MCH 3x: Win10 Parallel DLL loading issue
Posted: Wed Apr 18, 2018 6:54 am
Hi,
One customer has an issue with MCH and parallel DLL loading in their internal "K2.exe" app.
Symptoms:
They start K2. It takes about 10 s to start, then it crashes in shcore.dll. (I can send you a dump with symbols if you want, ~100 MB.)
Code:
0 Id: 1d04.394 Suspend: 1 Teb: 0035e000 Unfrozen
# ChildEBP RetAddr Args to Child
00 0014fa28 76ffd8ba 76fba213 00000004 00000000 ntdll!KiFastSystemCallRet
01 0014fa2c 76fba213 00000004 00000000 00000000 ntdll!NtWaitForSingleObject+0xa
02 0014fa54 7703853d d002f9ef 00000000 00000000 ntdll!LdrpDrainWorkQueue+0x12c
03 0014fca8 76fec24e d002f847 00000000 00000000 ntdll!LdrpInitializeProcess+0x1b3f
04 0014fd00 76fec10c 00000000 bb541b5e 00000000 ntdll!_LdrpInitialize+0xec
05 0014fd10 00000000 0014fd24 76f70000 00000000 ntdll!LdrInitializeThunk+0x1c
# 1 Id: 1d04.262c Suspend: 1 Teb: 0035f000 Unfrozen
# ChildEBP RetAddr Args to Child
> 00 017df80c 76788079 767629c5 76dc5f73 00000001 SHCore!Microsoft::WRL::Module<1,Microsoft::WRL::Details::DefaultModule<1> >::Create+0x12 <---- CRASH TEB TLS IS NULL
01 017df810 767629c5 76dc5f73 00000001 767c96b0 SHCore!Microsoft::WRL::Module<1,Microsoft::WRL::Details::DefaultModule<1> >::StaticInitialize+0x5
02 017df814 76dc5f73 00000001 767c96b0 00000002 SHCore!`dynamic initializer for 'Microsoft::WRL::Module<1,Microsoft::WRL::Details::DefaultModule<1> >::isInitialized''+0x5
03 017df830 76797438 767576d8 76757718 00000000 msvcrt!_initterm+0x43
04 017df860 7679759a 76750000 00000001 0014fd24 SHCore!_CRT_INIT+0x1a6
05 017df8c4 76ffd746 76750000 00000001 0014fd24 SHCore!__DllMainCRTStartup+0xc4
06 017df8e4 76fbf79a 767974a0 76750000 00000001 ntdll!LdrxCallInitRoutine+0x16
07 017df930 76fc418d 00000001 0014fd24 d16bfcf7 ntdll!LdrpCallInitRoutine+0x55
08 017df9b0 76fc42a2 015ecf10 015ed458 015f0930 ntdll!LdrpInitializeNode+0x10e
09 017df9d4 76fc42c1 017df9f3 016d9dd8 017f58e8 ntdll!LdrpInitializeGraphRecurse+0x5d
0a 017df9fc 76fc42c1 017dfa1b 0161c5d0 00000000 ntdll!LdrpInitializeGraphRecurse+0x7c
0b 017dfa24 76fc47db 017dfa3f 017dfac0 017dfabc ntdll!LdrpInitializeGraphRecurse+0x7c
0c 017dfa40 76fbdd71 d16bffcf 017dfc00 0015001d ntdll!LdrpPrepareModuleForExecution+0x8b
0d 017dfa88 76fc1dfb 00000600 00000004 00000000 ntdll!LdrpLoadDllInternal+0x128
0e 017dfbd4 76fc289e 00000000 00000001 017dfbf8 ntdll!LdrpLoadDll+0x93
0f 017dfc5c 0015040a 00000000 00000000 0015001d ntdll!LdrLoadDll+0x7e
WARNING: Frame IP not in any known module. Following frames may be wrong.
> 10 017dfca8 76fec1e4 d16bf847 00000000 00000000 0x15040a <---- MCH INJECTING ROUTINE
11 017dfd00 76fec10c 00000000 d16bf857 00000000 ntdll!_LdrpInitialize+0x82
12 017dfd10 00000000 017dfd24 76f70000 00000000 ntdll!LdrInitializeThunk+0x1c
2 Id: 1d04.1634 Suspend: 1 Teb: 00360000 Unfrozen
# ChildEBP RetAddr Args to Child
00 019dfc9c 770000aa 76fec219 00000000 019dfcc0 ntdll!KiFastSystemCallRet
01 019dfca0 76fec219 00000000 019dfcc0 d18bf847 ntdll!NtDelayExecution+0xa
02 019dfd00 76fec10c 00000000 d18bf857 00000000 ntdll!_LdrpInitialize+0xb7
03 019dfd10 00000000 019dfd24 76f70000 00000000 ntdll!LdrInitializeThunk+0x1c
Issue:
The crash is caused by shcore.dll accessing TLS during CRT global variable initialization while the TLS pointer in the TEB is still NULL (fs:[0000002Ch], the ThreadLocalStoragePointer field of the x86 TEB, is NULL). Normally this would be a Microsoft issue, but here it is really a collision between MCH and parallel DLL loading in Windows 10: our DLL is injected into the process by a secondary thread (thread 1 in the dump) instead of the main thread, which has TLS initialized in its TEB. The 10 s delay is not random either; it is the timeout value in your DLL loading mechanism that is supposed to prevent this kind of situation. So that mechanism is not working as intended: the main thread is probably waiting in LdrpDrainWorkQueue for the second thread to finish DLL loading, so the 10 s timeout is hit.
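To make the fs:[0000002Ch] reference concrete, here is a minimal sketch that models the first fields of the 32-bit TEB and checks the TLS pointer. It is only an offline model in Python (ctypes), not MCH code and not something that reads a live process; the names `Teb32Head` and `tls_initialized` are made up for this illustration, and the buffer would have to come from a dump or debugger.

```python
import ctypes


class Teb32Head(ctypes.LittleEndianStructure):
    """First fields of the x86 (32-bit) TEB, modelled with 4-byte fields."""
    _fields_ = [
        # NT_TIB occupies 0x00..0x1B
        ("ExceptionList", ctypes.c_uint32),              # 0x00
        ("StackBase", ctypes.c_uint32),                  # 0x04
        ("StackLimit", ctypes.c_uint32),                 # 0x08
        ("SubSystemTib", ctypes.c_uint32),               # 0x0C
        ("FiberData", ctypes.c_uint32),                  # 0x10
        ("ArbitraryUserPointer", ctypes.c_uint32),       # 0x14
        ("Self", ctypes.c_uint32),                       # 0x18
        ("EnvironmentPointer", ctypes.c_uint32),         # 0x1C
        ("UniqueProcess", ctypes.c_uint32),              # 0x20 (ClientId)
        ("UniqueThread", ctypes.c_uint32),               # 0x24 (ClientId)
        ("ActiveRpcHandle", ctypes.c_uint32),            # 0x28
        ("ThreadLocalStoragePointer", ctypes.c_uint32),  # 0x2C
        ("ProcessEnvironmentBlock", ctypes.c_uint32),    # 0x30
    ]


# Offset that fs:[2Ch] refers to on x86.
TLS_OFFSET = Teb32Head.ThreadLocalStoragePointer.offset


def tls_initialized(teb_bytes: bytes) -> bool:
    """Return True if the TLS pointer in a raw 32-bit TEB image is non-NULL."""
    teb = Teb32Head.from_buffer_copy(teb_bytes)
    return teb.ThreadLocalStoragePointer != 0


if __name__ == "__main__":
    # Hypothetical TEB image: all zeroes, i.e. TLS pointer still NULL.
    raw = bytearray(ctypes.sizeof(Teb32Head))
    print(hex(TLS_OFFSET), tls_initialized(bytes(raw)))
```

A thread whose TEB image fails this check is in the state thread 1 was in when SHCore's static initializer ran.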
Fix:
You could change the behaviour so that MCH does not inject from secondary threads (especially those whose TLS is not yet initialized).
-or-
According to unofficial documentation, there is a "Detour detection mechanism" (https://stackoverflow.com/questions/427 ... lication-s and https://threatmatrix.cylance.com/en_us/ ... kdown.html) that checks whether certain functions ("NtOpenFile", "NtCreateSection", "NtQueryAttributesFile", "NtOpenSection" and "NtMapViewOfSection") are hooked. I think you could use this to disable parallel DLL loading when MCH injects into an app.
-or-
Something else
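To illustrate the second option: as I understand it, the detection boils down to comparing the first bytes of the five functions against known-good bytes. Below is a minimal sketch of that idea as an offline model. I am assuming a simple byte comparison; I have not verified how the loader really implements it. The helper names and the sample byte strings (including the syscall numbers) are made up for the example.

```python
from typing import Dict, List

# The five functions named above.
WATCHED = (
    "NtOpenFile",
    "NtCreateSection",
    "NtQueryAttributesFile",
    "NtOpenSection",
    "NtMapViewOfSection",
)


def detoured_functions(pristine: Dict[str, bytes],
                       current: Dict[str, bytes]) -> List[str]:
    """Return the watched functions whose prologue bytes differ from the pristine copy."""
    return [name for name in WATCHED if current[name] != pristine[name]]


def parallel_loading_allowed(pristine: Dict[str, bytes],
                             current: Dict[str, bytes]) -> bool:
    """Assumed policy: allow parallel loading only if no watched function is patched."""
    return not detoured_functions(pristine, current)


if __name__ == "__main__":
    # Made-up prologues: "mov eax, imm32" (B8 xx xx xx xx) with invented numbers.
    pristine = {name: b"\xB8" + i.to_bytes(4, "little")
                for i, name in enumerate(WATCHED, start=0x100)}
    current = dict(pristine)
    # Simulate a hook: overwrite one prologue with "jmp rel32" (E9 xx xx xx xx).
    current["NtMapViewOfSection"] = b"\xE9\x10\x20\x30\x00"
    print(detoured_functions(pristine, current))
    print(parallel_loading_allowed(pristine, current))
```

If the check works roughly like this, patching any one of the five functions before the loader performs the comparison would be enough to make it fall back to serial loading.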
Thx.
EaSy