
Infinite loop in madStackTrace.CollectPossibleStackItems x64

Posted: Fri Jul 24, 2020 8:45 pm
by santiago
Hello madshi,

We have been able to trace a problem that had been driving us crazy for quite a while back to madExcept.

Problems were being reported by many customers, and oddly enough no exception logs were available.
All the problems involved inter process communication.

Our app is built with Delphi, but we use an ever-growing number of .NET plugins. We use Hydra from RemObjects for communication between the Delphi host and the .NET plugins.

The problem is that, under very specific conditions, madExcept gets stuck in an endless loop while handling an exception. The madExcept error report pops up and says that the callstack will be calculated soon.
[Attachment: madExcept_callstack.jpg]
Hitting the 'End program' or 'Restart program' buttons has no effect and the process must be killed.
Hitting the 'Continue' button makes the dialog go away but code is no longer executed and the process must be killed.

I have found a very specific set of steps that reproduces the problem 100% of the time.

What is very odd is that if I perform comparable steps in other parts of the program, the problem does not occur.

Now what is so special/different/unique about that specific part of the code/program?
After spending a LOT of time on this I was not able to identify anything that stands out about this part of the code.

My goal was to create a sample project for you so you could troubleshoot the issue more easily.
I fear this will not be possible. :-(

But there is some better news.

Here is what I do know.

A .NET plugin receives a TCP message from another process.
The .NET plugin uses Hydra to call some Delphi code in the Delphi host app.
An exception is thrown while the Delphi code is executing. As a test, I raise an exception as soon as the Delphi code starts executing.
madExcept handles the exception and gets stuck in an infinite loop while trying to obtain the callstack.

Strangely, not all TCP messages cause this problem. It is only with specific messages that the problem happens.
But it can be reproduced 100% reliably when following the 'proper' steps.

The problem only happens if the .NET plugin is compiled in release mode. The callstack is obtained just fine when the .NET plugin is compiled in debug mode.
The problem happens only with 64-bit builds.

Since I was getting nowhere trying to create a sample project for you, I decided to add the madExcept source files to our project instead of using the BPLs.
I defined debugMadExcept and I was then able to debug the madExcept code. :-)

What I found is that an endless loop occurs in CollectPossibleStackItems, inside the repeat-until loop at the following location:
[Attachment: infiniteLoop.jpg]
I see there is a count variable which is sometimes incremented during the loop.

As a test, I added a third condition to line 1307 so that it looks like this:

until (sf.AddrReturn.Offset = 0) or (rc > 1000) or (count > 1000);
This will break out of the loop when count is bigger than 1000.

With this change, the program no longer freezes when the madExcept dialog is shown.
The callstack is also displayed correctly.
It consists of only one entry, since the code calling the Delphi function was .NET code (via COM).

Please let me know if I can provide you with any other information.

If it helps, I would be happy to do a remote debugging session with you, so you can use the debugger and troubleshoot the problem on my computer.

In case it helps, I logged the current values of sf.AddrReturn.Offset and rc to a text file, just above the until statement:

DebugInfo.Add(Format('Offset: %d. rc: %d', [sf.AddrReturn.Offset, rc]));
Please refer to the contents of the attached file.

Thank you!

Re: Infinite loop in madStackTrace.CollectPossibleStackItems

Posted: Mon Jul 27, 2020 1:19 pm
by santiago
Hello madshi,

Some more info that could be useful to you.
I had forgotten to mention that I am using the latest madExcept version 5.1.

The following screenshot shows how the callstack is displayed in 64-bit, using the workaround I added to break out of the infinite loop.
[Attachment: callstack_64Bit.jpg]
The following image shows how the callstack is displayed in 32-bit.
[Attachment: callstack_32Bit.jpg]


Re: Infinite loop in madStackTrace.CollectPossibleStackItems

Posted: Tue Jul 28, 2020 2:47 pm
by madshi
Not sure what to do here. I guess adding the "count > 1000" check shouldn't really do any harm. Just for fun, have you checked how high it goes if you don't limit the loop? I assume the loop is never left in that situation?

I'm really wondering why this problem happens. I've no idea. But I think the easiest way out would be to simply add that "count > 1000" check, or maybe increase it to "count > 10000", just to feel better about it?

Re: Infinite loop in madStackTrace.CollectPossibleStackItems

Posted: Tue Jul 28, 2020 3:55 pm
by santiago
Hello madshi,

thanks for looking into this.

First, your question: I let it run for over a minute, and count was at 1109357. Eventually we would run out of memory because tmpArr keeps growing and growing...

Yesterday evening I took a closer look at the code in CollectPossibleStackItems and now better understand what is going on.

The repeat/until loop ends if either of two conditions is met:
  1. rc > 1000
I guess rc stands for recursion count. No recursion is involved in this case, so this is not relevant.
  2. sf.AddrReturn.Offset = 0
The loop is exited when the stackframe offset is 0. This means it is the first frame.
In this case the stackframe offset never reaches 0, but it does reach 1. You can see this if you examine the log file I attached. For me the key to the problem is the offset value of 1. I guess there must be some special meaning to that, but I was not able to find anything.
Maybe you have an idea. An offset value of 1 means that the stackframe pointer is offset by only 1 byte. In practice this means that no other stackframe could be 'behind' it, so this must be the first stackframe.
As a test, I changed the condition to:
(sf.AddrReturn.Offset <= 1)
This also solves the problem.
I guess one could assume that the first frame has been reached if offset < 8 (8 bytes being the size of a 64-bit pointer).

Yes, I think it would be a good idea to have some sort of emergency exit strategy in place in case something like this should ever happen again.
count > 10000 seems reasonable to me.
Ideally you could also log whenever an emergency exit took place, as that could help us get to the cause of the problem.
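To make the two exits discussed above concrete, here is a minimal Delphi sketch of a bounded frame walk. It is not madExcept's code: the real loop calls StackWalk64 and fills sf, whereas here a hypothetical callback type (TNextReturnAddress) supplies the value that would land in sf.AddrReturn.Offset, and the rc recursion guard is left out. The loop treats 0 or 1 as the end of the stack and otherwise stops after 10000 iterations, reporting that emergency exit so the caller can log it.

```pascal
program BoundedStackWalk;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

const
  MaxFrames = 10000; // emergency exit value suggested in this thread

type
  // Hypothetical stand-in for one StackWalk64 call: returns the value that
  // would end up in sf.AddrReturn.Offset for the given iteration.
  TNextReturnAddress = function(Index: Integer): UInt64;

// Collects frames until the walk reports a return address of 0 or 1,
// or until the MaxFrames cap is hit. EmergencyExit tells the caller that
// the cap (not a normal end of stack) stopped the loop.
function WalkFrames(Next: TNextReturnAddress; out EmergencyExit: Boolean): Integer;
var
  count: Integer;
  retAddr: UInt64;
begin
  count := 0;
  repeat
    retAddr := Next(count);
    Inc(count);
  until (retAddr <= 1) or (count > MaxFrames);
  // The loop has only two exits, so a return address above 1 means the cap fired.
  EmergencyExit := retAddr > 1;
  Result := count;
end;

// Simulated walk that ends normally with 0.
function NormalStack(Index: Integer): UInt64;
begin
  if Index < 3 then
    Result := $401000 + UInt64(Index) * $10
  else
    Result := 0;
end;

// Simulated walk that ends with 1, the value reported in this thread.
function StuckAtOne(Index: Integer): UInt64;
begin
  if Index < 3 then
    Result := $401000 + UInt64(Index) * $10
  else
    Result := 1;
end;

// Simulated corrupt walk that never reaches 0 or 1.
function NeverTerminates(Index: Integer): UInt64;
begin
  Result := $401000;
end;

var
  emergency: Boolean;
  n: Integer;
begin
  n := WalkFrames(NormalStack, emergency);
  WriteLn('ends with 0: ', n, ' iterations, emergency exit = ', emergency);
  n := WalkFrames(StuckAtOne, emergency);
  WriteLn('ends with 1: ', n, ' iterations, emergency exit = ', emergency);
  n := WalkFrames(NeverTerminates, emergency);
  if emergency then
    WriteLn(ErrOutput, 'stack walk aborted after ', n, ' iterations');
end.
```

Returning a flag instead of logging inside the loop keeps the walk free of side effects; the caller decides where the emergency-exit message goes (here, ErrOutput).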

Let me know what you think regarding the offset < 8 approach.

Could you please upload a new build that includes the emergency exit (count > 10000)?
If you plan on doing something regarding the offset (1 or 8), then it is OK for us to wait until all changes are complete before you trigger a new build.


Re: Infinite loop in madStackTrace.CollectPossibleStackItems

Posted: Wed Jul 29, 2020 5:15 pm
by madshi

Re: Infinite loop in madStackTrace.CollectPossibleStackItems

Posted: Wed Jul 29, 2020 10:48 pm
by santiago
Hello madshi,

thank you for providing the update so quickly!!

I reviewed the changes you did to madStackTrace.pas and they are just fine.

There is just one typo in the comment on line 1307. The last 0 should be a 1.

// Offset should be 0, but there was one reported case of it being 0
I have this special workspace set up where I can reproduce the problem. In this workspace I am using the madExcept source files (*.pas) rather than the *.dcp and *.bpl files.
I used the .pas files you provided with the update, recompiled, and everything works fine now. The problem is gone :-)

However, when I link to madExcept dynamically (using the .dcp and .bpl files), which is what we actually use, the problem persists.
I did this in another workspace.

I've double- and triple-checked that I did the update correctly on my end.
I even renamed my <ProgramFiles>/madCollection folder and ran madCollectionUpdate.exe again.

Do you provide precompiled binaries with madCollectionUpdate.exe or are the binaries compiled during installation?
To me it seems that the binaries (bpl, dcp) do not include the changes made to madStackTrace.pas.
Could this be the problem?

Re: Infinite loop in madStackTrace.CollectPossibleStackItems

Posted: Wed Jul 29, 2020 11:12 pm
by madshi
All the files should be automatically recompiled by my automated installer creation script. Can you please check if the bpl/dcp files have the same (or ever so slightly newer) file date/time as the madStackTrace.pas file?

Re: Infinite loop in madStackTrace.CollectPossibleStackItems

Posted: Wed Jul 29, 2020 11:54 pm
by santiago

After rereading your message, it seems you do provide precompiled binaries.
If that is the case, then yes, the binaries were created about 47 minutes after the madStackTrace.pas file was modified.
[Attachment: madExcept_binaries_time.jpg]

Re: Infinite loop in madStackTrace.CollectPossibleStackItems

Posted: Thu Jul 30, 2020 7:22 am
by madshi
Please try this update: (installer

Does it work?

Re: Infinite loop in madStackTrace.CollectPossibleStackItems

Posted: Thu Jul 30, 2020 2:44 pm
by santiago
Hello madshi,

thank you for the updated installer.
When using the .bpl/.dcp files, the message 'stack will be calculated soon' is displayed.
It seems to be the same problem as before.

But if I use the madExcept source files the problem does not happen.

At the moment I have no idea what is going on.

It seems as if the *.bpl/*.dcp files do not include the changes you made to madStackTrace.pas, but that is rather unlikely at this point.

I will compile the *.bpl/*.dcp locally and see if I can figure out what is happening.

Re: Infinite loop in madStackTrace.CollectPossibleStackItems

Posted: Thu Jul 30, 2020 3:19 pm
by madshi
Strange. I wonder if maybe in this specific situation there's some corruption going on, like a buffer overrun or similar? Just a wild thought, though...

Re: Infinite loop in madStackTrace.CollectPossibleStackItems

Posted: Thu Jul 30, 2020 5:34 pm
by santiago
Hello madshi,

I figured out what the problem was.

After some Windows 10 update, debugging in Delphi became really slow when using runtime packages.
We deploy our application using runtime packages, but for us developers we have put a system in place that can statically link third-party components (like madExcept, DevExpress, etc.), which speeds up debugging.

We rely on the version number of the third-party component to create the packages for static linking.

The problem here is that the updated files we received still use the same version number as the previous madExcept version, 5.1.0.

This led to the 'old' dcu files being used which of course did not contain the fix.

After disabling static linking I verified that everything is working fine.

Ideally, every published build of madExcept should have a unique version, e.g. 5.1.1.

I guess this would also make your life easier, as you would know for certain which changes a given customer actually has.

Please let me know if you could provide us with a madExcept build that has a new version number (e.g. 5.1.1).

Otherwise I will have to think about how to work around the problem on our end.

Thank you once again!!

Re: Infinite loop in madStackTrace.CollectPossibleStackItems

Posted: Thu Jul 30, 2020 5:37 pm
by madshi
I usually change version numbers only for official releases, not for "hotfixes". For hotfixes, I only change the version number of the installer.

Re: Infinite loop in madStackTrace.CollectPossibleStackItems

Posted: Thu Jul 30, 2020 8:20 pm
by santiago
THANK YOU for your quick help!

We added a feature to our internal tool chain that allows us to override the version number obtained from the DLL (BPL) by including a text file with the 'new' version number.
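The thread does not show how that tool chain works internally. As a minimal sketch, assuming a hypothetical convention where an optional text file named like the package but with a .version extension sits next to the .bpl, and its first non-empty line takes precedence over the version read from the binary, the lookup could look like this in Delphi. The function name and file naming are illustrative only, and reading the version resource from the .bpl is left to the caller (passed in as BinaryVersion).

```pascal
program VersionOverride;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// Hypothetical convention: "somepkg.bpl" may be accompanied by
// "somepkg.version" whose first line (e.g. "5.1.0-hotfix1") overrides
// the version number stored in the binary.
function ResolveComponentVersion(const PackagePath, BinaryVersion: string): string;
var
  overridePath, line: string;
  f: TextFile;
begin
  Result := BinaryVersion; // default: version supplied by the caller from the .bpl
  overridePath := ChangeFileExt(PackagePath, '.version');
  if not FileExists(overridePath) then
    Exit;
  AssignFile(f, overridePath);
  Reset(f);
  try
    if not Eof(f) then
    begin
      ReadLn(f, line);
      line := Trim(line);
      if line <> '' then
        Result := line; // the text file wins over the binary's version number
    end;
  finally
    CloseFile(f);
  end;
end;

var
  f: TextFile;
begin
  // Without an override file, the binary's version is used unchanged.
  WriteLn(ResolveComponentVersion('demo_pkg.bpl', '5.1.0'));

  // Simulate a hotfix that kept the binary version but needs its own key.
  AssignFile(f, 'demo_pkg.version');
  Rewrite(f);
  WriteLn(f, '5.1.0-hotfix-2020-07-30');
  CloseFile(f);
  WriteLn(ResolveComponentVersion('demo_pkg.bpl', '5.1.0'));
  DeleteFile('demo_pkg.version');
end.
```

Keeping the override in a separate file means the vendor's binaries stay untouched; two builds that report the same embedded version can still be cached under different keys.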

Now several third party components with the same version can happily coexist side by side.

Am soooo glad we got this problem sorted out. :-)