Threads getting "stuck"
Posted: Wed Nov 25, 2020 10:51 am
I've inherited a rather large WebBroker-based web service. It uses the TIdHTTPWebBrokerBridge with a TWebModule handling both SOAP and RESTful-like requests.
My first task has been to make it more reliable. My company was/is aware that it was having issues - to the degree that it's forcibly restarted every hour.
I've been knocking various issues off. One big issue I identified was that there was lots and lots of database locking and in the end I came to the conclusion that transactions were being left open. What's that got to do with madExcept?
After exhausting all of the obvious avenues and studying our internal logs I realised that there were threads (handling requests) that were never getting to end. The forced restart of the web service (above) released the locks. This added more evidence that threads weren't getting to end and so were holding locks.
Running the web service locally and purposefully causing a database issue (in this case a timeout) I've been able to recreate the scenario where the thread "hangs".
The web service is built using Delphi 10.1 Berlin (Update 1), FastDCC and madExcept 5.0. FireDAC is used for the database functionality.
I've attached a picture of where the debugging got me. To handle timeouts (as an example) FireDAC runs within a thread, so we've got a number of levels of nested threads from a request. FireDAC raises an exception that bubbles up through the layers. This is then caught by madExcept as a "hidden" exception. Its handling of this then "hangs", and the standard application logic to handle the exception doesn't occur (hence no rollback/commit etc.).
As per the diagram, an AV (access violation) is raised when trying to get the stack dump. This forces it into the local exception handler, which calls InternalError(), at which point it falls into _DoneExcept() and "hangs" on TObject.Free().
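For context, the application logic that gets skipped is the usual transaction pattern. The following is only a minimal sketch, assuming a plain TFDConnection; the unit name, procedure name and table are hypothetical and this is not the actual service code:

```delphi
unit RequestTxnSketch; // hypothetical unit name, for illustration only

interface

uses
  FireDAC.Comp.Client;

// Hypothetical helper: runs one request's database work in a transaction.
procedure HandleRequestInTransaction(AConnection: TFDConnection);

implementation

procedure HandleRequestInTransaction(AConnection: TFDConnection);
begin
  AConnection.StartTransaction;
  try
    // Request-specific database work goes here (table name is made up).
    AConnection.ExecSQL('UPDATE Orders SET Processed = 1 WHERE Processed = 0');
    AConnection.Commit;
  except
    // If the exception never reaches this handler (for example because
    // the thread hangs before control gets here), the rollback never runs
    // and the transaction, along with its locks, stays open.
    AConnection.Rollback;
    raise;
  end;
end;

end.
```

If the thread stalls before the except block is entered, neither Commit nor Rollback is called, which would match the open transactions and locks we were seeing.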
Any ideas? I know it's not much to go on. I've tried to produce a "cut down" repro, and although I have one, the scenario doesn't occur every time, so it's not a great repro. Obviously, in our LIVE environment the number of requests coming in means that this issue occurs quite regularly.
So, I disabled madExcept. So far we haven't seen any sign of threads not getting to end or the database related issues.