MCH 3x: Win10 Parallel DLL loading issue

Posted: Wed Apr 18, 2018 6:54 am
by EaSy
One customer has an issue with MCH and parallel DLL loading in their internal "K2.exe" app.

They start K2. It takes about 10 s to start, then it crashes in shcore.dll. (I can send you a dump with symbols if you want, ~100 MB.)

Code:
   0  Id: 1d04.394 Suspend: 1 Teb: 0035e000 Unfrozen
 # ChildEBP RetAddr  Args to Child             
00 0014fa28 76ffd8ba 76fba213 00000004 00000000 ntdll!KiFastSystemCallRet
01 0014fa2c 76fba213 00000004 00000000 00000000 ntdll!NtWaitForSingleObject+0xa
02 0014fa54 7703853d d002f9ef 00000000 00000000 ntdll!LdrpDrainWorkQueue+0x12c
03 0014fca8 76fec24e d002f847 00000000 00000000 ntdll!LdrpInitializeProcess+0x1b3f
04 0014fd00 76fec10c 00000000 bb541b5e 00000000 ntdll!_LdrpInitialize+0xec
05 0014fd10 00000000 0014fd24 76f70000 00000000 ntdll!LdrInitializeThunk+0x1c

#  1  Id: 1d04.262c Suspend: 1 Teb: 0035f000 Unfrozen
 # ChildEBP RetAddr  Args to Child             
> 00 017df80c 76788079 767629c5 76dc5f73 00000001 SHCore!Microsoft::WRL::Module<1,Microsoft::WRL::Details::DefaultModule<1> >::Create+0x12 <---- CRASH TEB TLS IS NULL
01 017df810 767629c5 76dc5f73 00000001 767c96b0 SHCore!Microsoft::WRL::Module<1,Microsoft::WRL::Details::DefaultModule<1> >::StaticInitialize+0x5
02 017df814 76dc5f73 00000001 767c96b0 00000002 SHCore!`dynamic initializer for 'Microsoft::WRL::Module<1,Microsoft::WRL::Details::DefaultModule<1> >::isInitialized''+0x5
03 017df830 76797438 767576d8 76757718 00000000 msvcrt!_initterm+0x43
04 017df860 7679759a 76750000 00000001 0014fd24 SHCore!_CRT_INIT+0x1a6
05 017df8c4 76ffd746 76750000 00000001 0014fd24 SHCore!__DllMainCRTStartup+0xc4
06 017df8e4 76fbf79a 767974a0 76750000 00000001 ntdll!LdrxCallInitRoutine+0x16
07 017df930 76fc418d 00000001 0014fd24 d16bfcf7 ntdll!LdrpCallInitRoutine+0x55
08 017df9b0 76fc42a2 015ecf10 015ed458 015f0930 ntdll!LdrpInitializeNode+0x10e
09 017df9d4 76fc42c1 017df9f3 016d9dd8 017f58e8 ntdll!LdrpInitializeGraphRecurse+0x5d
0a 017df9fc 76fc42c1 017dfa1b 0161c5d0 00000000 ntdll!LdrpInitializeGraphRecurse+0x7c
0b 017dfa24 76fc47db 017dfa3f 017dfac0 017dfabc ntdll!LdrpInitializeGraphRecurse+0x7c
0c 017dfa40 76fbdd71 d16bffcf 017dfc00 0015001d ntdll!LdrpPrepareModuleForExecution+0x8b
0d 017dfa88 76fc1dfb 00000600 00000004 00000000 ntdll!LdrpLoadDllInternal+0x128
0e 017dfbd4 76fc289e 00000000 00000001 017dfbf8 ntdll!LdrpLoadDll+0x93
0f 017dfc5c 0015040a 00000000 00000000 0015001d ntdll!LdrLoadDll+0x7e
WARNING: Frame IP not in any known module. Following frames may be wrong.
> 10 017dfca8 76fec1e4 d16bf847 00000000 00000000 0x15040a <---- MCH INJECTING ROUTINE
11 017dfd00 76fec10c 00000000 d16bf857 00000000 ntdll!_LdrpInitialize+0x82
12 017dfd10 00000000 017dfd24 76f70000 00000000 ntdll!LdrInitializeThunk+0x1c

   2  Id: 1d04.1634 Suspend: 1 Teb: 00360000 Unfrozen
 # ChildEBP RetAddr  Args to Child             
00 019dfc9c 770000aa 76fec219 00000000 019dfcc0 ntdll!KiFastSystemCallRet
01 019dfca0 76fec219 00000000 019dfcc0 d18bf847 ntdll!NtDelayExecution+0xa
02 019dfd00 76fec10c 00000000 d18bf857 00000000 ntdll!_LdrpInitialize+0xb7
03 019dfd10 00000000 019dfd24 76f70000 00000000 ntdll!LdrInitializeThunk+0x1c

The crash is caused by shcore.dll accessing TLS during CRT global variable initialization while the TLS pointer in the TEB is still NULL (fs:[0000002Ch] is NULL). This would be an MS issue, but it is really a collision between MCH and parallel DLL loading in Win10: our DLL is injected into the process by a secondary thread instead of the main thread, and only the main thread has TLS initialized in its TEB. The 10 s delay is not random either. It is the timeout value in your DLL loading mechanism that is meant to prevent this kind of situation. So the mechanism is not working as intended: the main thread is probably waiting in LdrpDrainWorkQueue for the second thread to finish DLL loading, so the 10 s timeout is hit.

You could somehow change the behaviour so that injection never happens in secondary threads (especially in those whose TLS is NOT initialized...).
According to alternative docs, there is a "Detour detection mechanism" which detects whether certain methods are hooked ("NtOpenFile", "NtCreateSection", "NtQueryAttributesFile", "NtOpenSection" and "NtMapViewOfSection"). I think you could use this to disable parallel DLL loading when MCH is injecting into apps.
Something else :)



Re: MCH 3x: Win10 Parallel DLL loading issue

Posted: Mon Apr 23, 2018 6:24 pm
by madshi
The latest 3.x and 4.x drivers should already contain extra code to make sure DLL injection only happens inside of the main thread. Is your madCodeHook build up-to-date?

Re: MCH 3x: Win10 Parallel DLL loading issue

Posted: Thu May 03, 2018 7:21 am
by EaSy
You mean this part of the code, I guess:
Code:
//   if (mtid <> 0) and (mtid <> ctid) then begin
//     // This is not the main thread! This usually doesn't happen, except sometimes in win10.
//     // We "solve" this by waiting until the main thread has completed executing our loader stub.
//     // Max wait time 1 second, just to be safe.
//     for c1 := 1 to 100 do begin
//       if (buf.pOldApi^.jmp = and (buf.pOldApi^.target = then
//         // Our loader stub patch was removed, so we assume that the main thread has completed running it.
//         break;
//       sleep := -100000;// 10 milliseconds
//       buf.nde(nil, sleep);
//     end;
//   end;

If so, then we have the up-to-date version of your code.

I am not 100% sure, because all I have right now is the crash dump, but I think the issue is this: the main thread is waiting for the secondary thread to finish (LdrpDrainWorkQueue), so NtTestAlert is NOT called from the main thread until the secondary thread is finished. The secondary thread then hits the timeout and starts the injection. Is this scenario possible?


Re: MCH 3x: Win10 Parallel DLL loading issue

Posted: Thu May 03, 2018 7:29 am
by madshi
Yes, that sounds possible. I've already modified my patch to simply not inject at all, in case the delay loop times out. Of course that's not a nice solution, either, but probably better than producing instability.
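
To make the scenario and the workaround easier to follow, here is a minimal sketch that models the bounded wait loop in Python. It is not madCodeHook code. The names `wait_for_main_thread`, `loader_stub`, `Outcome` and the `inject_on_timeout` switch are made up for illustration, the poll count and interval are taken from the commented snippet quoted above, and the "patch removed means the main thread already did the work" branch is an assumption.

```python
import time
from enum import Enum
from typing import Callable


class Outcome(Enum):
    INJECT_HERE = "inject from this thread"
    LEAVE_TO_MAIN = "main thread already ran the loader stub"
    SKIP = "timed out, do not inject at all"


def wait_for_main_thread(stub_patch_removed: Callable[[], bool],
                         max_polls: int = 100,
                         poll_interval_s: float = 0.010,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the loader stub patch is gone or the poll budget is used up."""
    for _ in range(max_polls):
        if stub_patch_removed():
            return True  # the main thread finished running the stub
        sleep(poll_interval_s)
    return False  # timed out


def loader_stub(is_main_thread: bool,
                stub_patch_removed: Callable[[], bool],
                inject_on_timeout: bool = False,
                sleep: Callable[[float], None] = time.sleep) -> Outcome:
    """Hypothetical decision logic of the stub, as described in this thread."""
    if is_main_thread:
        return Outcome.INJECT_HERE
    if wait_for_main_thread(stub_patch_removed, sleep=sleep):
        return Outcome.LEAVE_TO_MAIN  # assumption: nothing left to do here
    # Old behaviour: inject anyway from the secondary thread (the crash case).
    # Workaround: give up instead of injecting from the wrong thread.
    return Outcome.INJECT_HERE if inject_on_timeout else Outcome.SKIP


if __name__ == "__main__":
    # The main thread is stuck in LdrpDrainWorkQueue, so the patch never goes away.
    blocked_main = lambda: False
    no_sleep = lambda _seconds: None  # skip real delays in the demo
    print(loader_stub(False, blocked_main, inject_on_timeout=True, sleep=no_sleep))
    print(loader_stub(False, blocked_main, inject_on_timeout=False, sleep=no_sleep))
```

In the blocked-main-thread case the poll can never succeed, so the only real choice is what to do on timeout: the first call models the old behaviour, the second the "do not inject at all" workaround.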

I've 2 more changes in preparation to fully fix the issue:

1) Thanks to your stackoverflow link, I'm considering forcing the OS to disable parallel DLL loading for processes which get a hook DLL injected. Still not a perfect solution, but probably the best we can do using the current injection approach.

2) I'm planning to add a new DLL injection method soon, based on patching the IAT table of the newly created process in such a way that your hook dll appears to be statically linked by the EXE. This way the OS loader should take care of loading the DLL for us.


Re: MCH 3x: Win10 Parallel DLL loading issue

Posted: Thu May 03, 2018 7:51 am
by EaSy
Thx for fixing it. Can you make a beta build so we can test it?

1) MS evidently invested some time in detecting API hooking so that parallel loading is disabled when their Detours DLL injection is in use, so it is a good idea to follow their path.

2) Sounds nice, but we will wait until it is done and bug-free, since the current injection method works fine right now.

Keep up the good work.


Re: MCH 3x: Win10 Parallel DLL loading issue

Posted: Thu May 03, 2018 7:55 am
by madshi
Will let you know when I have the "Detours simulation" implemented. For now, the other workaround is already implemented in this beta build: (installer

Re: MCH 3x: Win10 Parallel DLL loading issue

Posted: Wed May 16, 2018 9:02 am
by madshi
Could you please check if the issue is fixed in this build? (installer

This build forcefully disables parallel loading for all processes into which we inject a DLL. I'm doing this by directly modifying the PEB structure of the newly created process. There's a "loaderThreads" field in the PEB which when set to 1 should disable parallel loading. For WOW64 processes I'm modifying both the PEB64 and PEB32 structures, just to be safe.