Page 1 of 1

Understanding the callstack

Posted: Mon Sep 11, 2017 2:36 am
by GeoffW
Hi,

I've read enough to understand that the callstack provided may not be 100% correct, but I am wondering if there are any best guesses that can be made about things that seem wrong. Take this small part of a callstack.

Code: Select all

0353d446 +242 TestPackage.bpl  Testing.DataForm     1018 +53 TDataForm.DoRestartForm
776b068a +00a ntdll.dll                                      KiUserExceptionDispatcher
5006563c +20c rtl250.bpl       System                        DynArraySetLength
500656dd +005 rtl250.bpl       System                        @DynArraySetLength
50151e4a +02e rtl250.bpl       System                        Classes.TList.SetCapacity
50151bcb +02b rtl250.bpl       System                        Classes.TList.Grow
50151997 +013 rtl250.bpl       System                        Classes.TList.Add
0353d295 +091 TestPackage.bpl  Testing.DataForm      974  +9 TDataForm.DoRestartForm
0353cc0a +09a TestPackage.bpl  Testing.DataForm      778  +8 TDataForm.DoInitialise
The top line and the bottom two all make perfect sense (in themselves and across the rest of the callstack), and I can see how, if madExcept missed four or five steps, the code might have gotten from TDataForm.DoRestartForm to Classes.TList.Add. But this is just a guess.

Note: In this example the questionable items all happen to be in VCL or Windows (no line numbers) but this is not always the case.

My question is: Since the steps from Classes.TList.Add ... DynArraySetLength all seem to make sense in themselves (seems to be a valid and consistent chunk of callstack) is it reasonably safe to assume this chunk of the stack is probably correct (not noise); and so to also assume that madExcept did just miss the intervening steps from TDataForm.DoRestartForm to Classes.TList.Add ?

If it's not a safe assumption then there are so many other things it might have done that the particular callstack would seem to be not much use, and assuming it's good may just lead to me chasing the wrong problem. Is there any guidance you can offer?

Re: Understanding the callstack

Posted: Mon Sep 11, 2017 7:17 am
by madshi
The top item in the callstack is usually the most reliable one because that one is tied to the exception address, which is a known value. The rest of the callstack is educated guessing by madExcept (in 32bit more so than in 64bit where callstacks are more reliable).

Given the information above, the key question is: Which of the rest of the callstack might have been a precursor to "DoRestartForm" line 1018? Does DoRestartForm call itself recursively? If not, then the stack item with line 974 seems unlikely to be true.

Generally, madExcept disassembles a lot of your code to try to figure out which callstack items might form a "correct chunk of the stack", as you put it. Unfortunately, in 32bit the stack often has residual content from seconds, minutes or hours before, so there may be valid "callstack chunks" which are just outdated. So madExcept also tries to connect the dots where possible. Unfortunately sometimes the disassembly doesn't contain a clear CALL instruction with a clear target address, but instead does something like "mov eax, [edi + $20]", followed by "call eax". In such a case madExcept doesn't know which function the disassembly calls. In such a case it's all guess-work... :(

Re: Understanding the callstack

Posted: Mon Sep 11, 2017 1:09 pm
by GeoffW
Thanks. What you say makes sense - even if it's not exactly what I want to hear ;-)

DoRestartForm line 1018 is inside an except block (the line where the exception is handled). All items above this on the stack are consistent with the exception handling code. All stack items below line 974 are consistent with what I would expect to get to DoRestartForm. Line 974 is about 6 lines into the try block, so it might be correct, but only if madExcept has missed out on finding at least 4 items off the stack. Nothing in the try block directly attempts to add anything to a TList, but line 974 does lead to functions that would access a TList - though it's probably not the only line - hence my confusion over where the code is going wrong.

This is one of those problems I can't reproduce in development and have had to rely on what madExcept can tell me. It looks like madExcept isn't the only one stuck with some guesswork. Thanks for your response.

Re: Understanding the callstack

Posted: Mon Sep 11, 2017 1:25 pm
by madshi
So it seems that the code inside your try..except block crashes once more (line 1018). Of course that makes debugging so much more complicated. Do you really need that try..except block? If so, you should make an extra effort to avoid raising further crashes inside of your try..except block.

Everything in between 974 and 1018 might be incorrect, it's hard to say. A jump into a try..except block is impossible for madExcept's disassembler to follow, so madExcept can't properly connect the dots in situations like this.

Re: Understanding the callstack

Posted: Mon Sep 11, 2017 2:16 pm
by GeoffW
It's possible I am misunderstanding, or misusing your tools. It's also possible I've misled you by only posting a part of the call stack and not explaining how it was captured. Here is the callstack again showing from the top down to include what you've seen already.

Code: Select all

03185301 +0b1 TestPackage.bpl  Testing.Trace          336 +16 CaptureExceptionTrace
031888d2 +032 TestPackage.bpl  Testing.Trace         1003  +4 TranslateException
0317628b +03f TestPackage.bpl  Testing.Msgs           498  +5 TMsgEnh.Assign
0224725f +03f TestPackage.bpl  Testing.MessageForm2   879  +5 TMessageForm2.ShowExc
0359f114 +03c TestPackage.bpl  Testing.MainForm       550  +3 TMainForm.ShowExc
035b57f3 +027 TestPackage.bpl  Testing.BaseForm      1017  +4 TBaseForm.ShowExc
035dd446 +242 TestPackage.bpl  Testing.DataForm      1018 +53 TDataForm.DoRestartForm
776b068a +00a ntdll.dll                                       KiUserExceptionDispatcher
5006563c +20c rtl250.bpl       System                         DynArraySetLength
500656dd +005 rtl250.bpl       System                         @DynArraySetLength
50151e4a +02e rtl250.bpl       System                         Classes.TList.SetCapacity
035dd295 +091 TestPackage.bpl  Testing.DataForm       974  +9 TDataForm.DoRestartForm
035dcc0a +09a TestPackage.bpl  Testing.DataForm       778  +8 TDataForm.DoInitialise
That top line - CaptureExceptionTrace - calls NewException to collect the bug report which is shown in my custom dialog (and also, eventually, logged to the database). Because of this situation I thought it should be normal to show the callstack to include the except block as part of the stack (it seems to be the case).

Damn, that seems to be a different exception to the one I posted earlier - same problem but slightly different stack in the middle there.

Re: Understanding the callstack

Posted: Mon Sep 11, 2017 2:27 pm
by madshi
Well, the part between 974 and 1018 is different. But I already mentioned before that these lines are questionable, anyway. IMHO you should ignore those couple of lines and concentrate on figuring out how why line 974 seems to crash. That's the key problem. Everything "above" 974 is probably not too important.

Are you doing all this try..except stuff with the one and only purpose to replace the madExcept exception window with your own? Or is there another purpose of the try..except block? If you just want to replace the exception dialog, adding many many try..except blocks to all your code seems excessive to me. It would be much easier to replace the exception window in a simple "RegisterExceptionHandler()" exception callback. Then you could totally drop all those try..except blocks, shorten your code noticeably, and as a side effect also make the crash callstacks clearer.

Re: Understanding the callstack

Posted: Mon Sep 11, 2017 2:36 pm
by GeoffW
This application is almost 20 years old. madExcept was (regrettably) only added 18 months ago (or there about). So a lot of the structures were already in place for the old (much less sophisticated - home grown) exception reporting system - madExcept sort of got squeezed in to work with what was already in place. Also, in this particular instance I really do want to capture and handle exceptions - not let them escape the function.

Re: Understanding the callstack

Posted: Mon Sep 11, 2017 2:42 pm
by madshi
Alright, how does your NewException() call look like?

Re: Understanding the callstack

Posted: Mon Sep 11, 2017 2:46 pm
by GeoffW
Just:

Code: Select all

    ime := NewException;
    ime.ListThreads := false;
    ime.BugReport; // force the report to complete
I then do a bit of post processing on the report sections before display and logging.

Re: Understanding the callstack

Posted: Mon Sep 11, 2017 2:47 pm
by madshi
Try "NewException(etNormal)", or even "NewException(etNormal, ExceptObject, ExceptAddr)", but they will probably produce the same result.

Re: Understanding the callstack

Posted: Mon Sep 11, 2017 2:53 pm
by GeoffW
Thanks, I will give it a try. I can experiment with some forced exceptions to see how it looks; the real problem is one I can't reproduce in development and only happens occasionally in production, so it may take some time, even after my change gets into production, to see if it helps with this particular situation.

Re: Understanding the callstack

Posted: Mon Sep 11, 2017 3:27 pm
by GeoffW
I can offer some almost immediate (and positive) feedback. I used

Code: Select all

NewException(etNormal, ExceptObject, ExceptAddr)
and put a deliberate exception at line 975. The result placed this line right at the top of the reported stack:

Code: Select all

023fb564 +0a8 TestPackage.bpl Testing.DataForm     975  +10 TDataForm.DoRestartForm
Skipping all the unnecessary exception capture lines, and the confusing stuff.

So the resulting callstacks will not only be shorter, they will have less stuff in there to confuse me - and that's got to be a good thing. :-)

The original offending line 974 call is just a function call (to another method on the same form object), which has me a bit puzzled, but I will wait and see what I get from the changes so far before jumping to too many conclusions.

Re: Understanding the callstack

Posted: Mon Sep 11, 2017 3:59 pm
by madshi
The "new" callstacks should definitely be better, because with "etNormal" the callstack tracer runs in a special exception handling mode, making use of the known ExceptAddr.