Understanding the callstack

delphi package - automated exception handling
Post Reply
GeoffW
Posts: 7
Joined: Mon Sep 11, 2017 2:04 am

Understanding the callstack

Post by GeoffW »

Hi,

I've read enough to understand that the callstack provided may not be 100% correct, but I am wondering if there are any best guesses that can be made about things that seem wrong. Take this small part of a callstack.

Code: Select all

0353d446 +242 TestPackage.bpl  Testing.DataForm     1018 +53 TDataForm.DoRestartForm
776b068a +00a ntdll.dll                                      KiUserExceptionDispatcher
5006563c +20c rtl250.bpl       System                        DynArraySetLength
500656dd +005 rtl250.bpl       System                        @DynArraySetLength
50151e4a +02e rtl250.bpl       System                        Classes.TList.SetCapacity
50151bcb +02b rtl250.bpl       System                        Classes.TList.Grow
50151997 +013 rtl250.bpl       System                        Classes.TList.Add
0353d295 +091 TestPackage.bpl  Testing.DataForm      974  +9 TDataForm.DoRestartForm
0353cc0a +09a TestPackage.bpl  Testing.DataForm      778  +8 TDataForm.DoInitialise
The top line and the bottom two all make perfect sense (in themselves and across the rest of the callstack), and I can see how, if madExcept missed four or five steps, the code might have gotten from TDataForm.DoRestartForm to Classes.TList.Add. But this is just a guess.

Note: In this example the questionable items all happen to be in VCL or Windows (no line numbers) but this is not always the case.

My question is: Since the steps from Classes.TList.Add ... DynArraySetLength all seem to make sense in themselves (seems to be a valid and consistent chunk of callstack) is it reasonably safe to assume this chunk of the stack is probably correct (not noise); and so to also assume that madExcept did just miss the intervening steps from TDataForm.DoRestartForm to Classes.TList.Add ?

If it's not a safe assumption then there are so many other things it might have done that the particular callstack would seem to be not much use, and assuming it's good may just lead to me chasing the wrong problem. Is there any guidance you can offer?
madshi
Site Admin
Posts: 10753
Joined: Sun Mar 21, 2004 5:25 pm

Re: Understanding the callstack

Post by madshi »

The top item in the callstack is usually the most reliable one because that one is tied to the exception address, which is a known value. The rest of the callstack is educated guessing by madExcept (in 32bit more so than in 64bit where callstacks are more reliable).

Given the information above, the key question is: Which of the rest of the callstack might have been a precursor to "DoRestartForm" line 1018? Does DoRestartForm call itself recursively? If not, then the stack item with line 974 seems unlikely to be true.

Generally, madExcept disassembles a lot of your code to try to figure out which callstack items might form a "correct chunk of the stack", as you put it. Unfortunately, in 32bit the stack often has residual content from seconds, minutes or hours before, so there may be valid "callstack chunks" which are just outdated. So madExcept also tries to connect the dots where possible. Unfortunately sometimes the disassembly doesn't contain a clear CALL instruction with a clear target address, but instead does something like "mov eax, [edi + $20]", followed by "call eax". In such a case madExcept doesn't know which function the disassembly calls. In such a case it's all guess-work... :(
GeoffW
Posts: 7
Joined: Mon Sep 11, 2017 2:04 am

Re: Understanding the callstack

Post by GeoffW »

Thanks. What you say makes sense - even if it's not exactly what I want to hear ;-)

DoRestartForm line 1018 is inside an except block (the line where the exception is handled). All items above this on the stack are consistent with the exception handling code. All stack items below line 974 are consistent with what I would expect to get to DoRestartForm. Line 974 is about 6 lines into the try block, so it might be correct, but only if madExcept has missed out on finding at least 4 items off the stack. Nothing in the try block directly attempts to add anything to a TList, but line 974 does lead to functions that would access a TList - though it's probably not the only line - hence my confusion over where the code is going wrong.

This is one of those problems I can't reproduce in development and have had to rely on what madExcept can tell me. It looks like madExcept isn't the only one stuck with some guesswork. Thanks for your response.
madshi
Site Admin
Posts: 10753
Joined: Sun Mar 21, 2004 5:25 pm

Re: Understanding the callstack

Post by madshi »

So it seems that the code inside your try..except block crashes once more (line 1018). Of course that makes debugging so much more complicated. Do you really need that try..except block? If so, you should make an extra effort to avoid raising further crashes inside of your try..except block.

Everything in between 974 and 1018 might be incorrect, it's hard to say. A jump into a try..except block is impossible for madExcept's disassembler to follow, so madExcept can't properly connect the dots in situations like this.
GeoffW
Posts: 7
Joined: Mon Sep 11, 2017 2:04 am

Re: Understanding the callstack

Post by GeoffW »

It's possible I am misunderstanding, or misusing your tools. It's also possible I've misled you by only posting a part of the call stack and not explaining how it was captured. Here is the callstack again showing from the top down to include what you've seen already.

Code: Select all

03185301 +0b1 TestPackage.bpl  Testing.Trace          336 +16 CaptureExceptionTrace
031888d2 +032 TestPackage.bpl  Testing.Trace         1003  +4 TranslateException
0317628b +03f TestPackage.bpl  Testing.Msgs           498  +5 TMsgEnh.Assign
0224725f +03f TestPackage.bpl  Testing.MessageForm2   879  +5 TMessageForm2.ShowExc
0359f114 +03c TestPackage.bpl  Testing.MainForm       550  +3 TMainForm.ShowExc
035b57f3 +027 TestPackage.bpl  Testing.BaseForm      1017  +4 TBaseForm.ShowExc
035dd446 +242 TestPackage.bpl  Testing.DataForm      1018 +53 TDataForm.DoRestartForm
776b068a +00a ntdll.dll                                       KiUserExceptionDispatcher
5006563c +20c rtl250.bpl       System                         DynArraySetLength
500656dd +005 rtl250.bpl       System                         @DynArraySetLength
50151e4a +02e rtl250.bpl       System                         Classes.TList.SetCapacity
035dd295 +091 TestPackage.bpl  Testing.DataForm       974  +9 TDataForm.DoRestartForm
035dcc0a +09a TestPackage.bpl  Testing.DataForm       778  +8 TDataForm.DoInitialise
That top line - CaptureExceptionTrace - calls NewException to collect the bug report which is shown in my custom dialog (and also, eventually, logged to the database). Because of this situation I thought it should be normal to show the callstack to include the except block as part of the stack (it seems to be the case).

Damn, that seems to be a different exception to the one I posted earlier - same problem but slightly different stack in the middle there.
madshi
Site Admin
Posts: 10753
Joined: Sun Mar 21, 2004 5:25 pm

Re: Understanding the callstack

Post by madshi »

Well, the part between 974 and 1018 is different. But I already mentioned before that these lines are questionable, anyway. IMHO you should ignore those couple of lines and concentrate on figuring out how why line 974 seems to crash. That's the key problem. Everything "above" 974 is probably not too important.

Are you doing all this try..except stuff with the one and only purpose to replace the madExcept exception window with your own? Or is there another purpose of the try..except block? If you just want to replace the exception dialog, adding many many try..except blocks to all your code seems excessive to me. It would be much easier to replace the exception window in a simple "RegisterExceptionHandler()" exception callback. Then you could totally drop all those try..except blocks, shorten your code noticeably, and as a side effect also make the crash callstacks clearer.
GeoffW
Posts: 7
Joined: Mon Sep 11, 2017 2:04 am

Re: Understanding the callstack

Post by GeoffW »

This application is almost 20 years old. madExcept was (regrettably) only added 18 months ago (or there about). So a lot of the structures were already in place for the old (much less sophisticated - home grown) exception reporting system - madExcept sort of got squeezed in to work with what was already in place. Also, in this particular instance I really do want to capture and handle exceptions - not let them escape the function.
madshi
Site Admin
Posts: 10753
Joined: Sun Mar 21, 2004 5:25 pm

Re: Understanding the callstack

Post by madshi »

Alright, how does your NewException() call look like?
GeoffW
Posts: 7
Joined: Mon Sep 11, 2017 2:04 am

Re: Understanding the callstack

Post by GeoffW »

Just:

Code: Select all

    ime := NewException;
    ime.ListThreads := false;
    ime.BugReport; // force the report to complete
I then do a bit of post processing on the report sections before display and logging.
madshi
Site Admin
Posts: 10753
Joined: Sun Mar 21, 2004 5:25 pm

Re: Understanding the callstack

Post by madshi »

Try "NewException(etNormal)", or even "NewException(etNormal, ExceptObject, ExceptAddr)", but they will probably produce the same result.
GeoffW
Posts: 7
Joined: Mon Sep 11, 2017 2:04 am

Re: Understanding the callstack

Post by GeoffW »

Thanks, I will give it a try. I can experiment with some forced exceptions to see how it looks; the real problem is one I can't reproduce in development and only happens occasionally in production, so it may take some time, even after my change gets into production, to see if it helps with this particular situation.
GeoffW
Posts: 7
Joined: Mon Sep 11, 2017 2:04 am

Re: Understanding the callstack

Post by GeoffW »

I can offer some almost immediate (and positive) feedback. I used

Code: Select all

NewException(etNormal, ExceptObject, ExceptAddr)
and put a deliberate exception at line 975. The result placed this line right at the top of the reported stack:

Code: Select all

023fb564 +0a8 TestPackage.bpl Testing.DataForm     975  +10 TDataForm.DoRestartForm
Skipping all the unnecessary exception capture lines, and the confusing stuff.

So the resulting callstacks will not only be shorter, they will have less stuff in there to confuse me - and that's got to be a good thing. :-)

The original offending line 974 call is just a function call (to another method on the same form object), which has me a bit puzzled, but I will wait and see what I get from the changes so far before jumping to too many conclusions.
madshi
Site Admin
Posts: 10753
Joined: Sun Mar 21, 2004 5:25 pm

Re: Understanding the callstack

Post by madshi »

The "new" callstacks should definitely be better, because with "etNormal" the callstack tracer runs in a special exception handling mode, making use of the known ExceptAddr.
Post Reply