Intel's CET Shadow Stack issue

c++ / delphi package - dll injection and api hooking
Bevan Collins
Posts: 42
Joined: Fri Jul 07, 2006 2:50 am

Re: Intel's CET Shadow Stack issue

Post by Bevan Collins »

FYI: I just tried using Detours in the test app instead, and it worked without crashing.
iconic
Site Admin
Posts: 1065
Joined: Wed Jun 08, 2005 5:08 am

Re: Intel's CET Shadow Stack issue

Post by iconic »

I'm very confident that the latest build of the collection that Madshi linked you to will solve your issue. If other users had still been having CET issues since March, they would have been flooding him with emails in the 6 months since the hotfix was released and the hook code stubs were updated. I would definitely recommend that you upgrade your subscription and try the updated code. Please let us know once you do, and kindly share your results with us.

--Iconic
madshi
Site Admin
Posts: 10753
Joined: Sun Mar 21, 2004 5:25 pm

Re: Intel's CET Shadow Stack issue

Post by madshi »

If the issue still occurs with the latest build, then please check whether using the NO_SAFE_UNHOOKING flag "fixes" it.

There's special logic in madCodeHook which counts how many threads are currently "inside" of your hook callback function. This logic can sort of overflow when a very high number of threads call it at the same time. In that situation there's a special code branch which handles the overflow. I've tested it and it works fine for me. However, I don't currently have a CET compatible CPU to test with. So there is a small chance that there could still be a CET problem in that code branch (although I did verify and rewrite it).

Iconic is correct in saying that if the latest madCodeHook builds were not CET compatible, I would have received a lot of complaints. However, this "overflow" mentioned above doesn't usually occur. So it could be something my other customers simply haven't run into (yet?).

Using the NO_SAFE_UNHOOKING flag disables all of this extra checking and the extra code branches, so it's a good way to test whether that is where the problem is coming from.
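
As a rough illustration of the counting idea described above, here is a minimal C++ sketch. It is not madCodeHook's actual implementation (which isn't shown in this thread); the names HookUsageCounter, InsideCallbackGuard and countWhileInside are hypothetical, and the real library does its bookkeeping in generated hook stubs rather than in C++ objects.

```cpp
#include <atomic>

// Hypothetical sketch: tracks how many threads are currently "inside"
// a hook callback. Not madCodeHook's real code.
struct HookUsageCounter {
    std::atomic<int> inside{0};

    // Number of threads currently executing the callback.
    int threadsInside() const { return inside.load(); }

    // Unhooking (and unloading the hook DLL) is only safe when no thread
    // is still running code that lives in the DLL.
    bool safeToUnhook() const { return threadsInside() == 0; }
};

// RAII guard: a thread holds one of these for the duration of the callback.
class InsideCallbackGuard {
public:
    explicit InsideCallbackGuard(HookUsageCounter& counter) : counter_(counter) {
        counter_.inside.fetch_add(1);
    }
    ~InsideCallbackGuard() { counter_.inside.fetch_sub(1); }

    InsideCallbackGuard(const InsideCallbackGuard&) = delete;
    InsideCallbackGuard& operator=(const InsideCallbackGuard&) = delete;

private:
    HookUsageCounter& counter_;
};

// Stand-in for a hook callback: counts itself in on entry and out on exit,
// and reports how many threads were inside while it ran.
int countWhileInside(HookUsageCounter& counter) {
    InsideCallbackGuard guard(counter);
    return counter.threadsInside();
}
```

With a counter like this, an uninject routine would wait for safeToUnhook() before removing the hook. Skipping the counter removes that bookkeeping from every call, but also removes the ability to tell when unloading is safe.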
Bevan Collins
Posts: 42
Joined: Fri Jul 07, 2006 2:50 am

Re: Intel's CET Shadow Stack issue

Post by Bevan Collins »

I'm sorry to report that this is still a problem with MCH 4.20.0. It can easily be replicated by repeatedly calling a hooked API from a CET-enabled process until the shadow stack overflows. There is an in-depth analysis in the Chromium bug report that should help find a solution.

Thanks
Bevan Collins
Posts: 42
Joined: Fri Jul 07, 2006 2:50 am

Re: Intel's CET Shadow Stack issue

Post by Bevan Collins »

I just tested with NO_SAFE_UNHOOKING and it no longer crashes. Thanks madshi, I will use this workaround. Let me know if you want me to do any tests.
madshi
Site Admin
Posts: 10753
Joined: Sun Mar 21, 2004 5:25 pm

Re: Intel's CET Shadow Stack issue

Post by madshi »

Using NO_SAFE_UNHOOKING should be a perfect solution as long as you don't need to uninject your hook dll. However, if you do want to uninject your hook dll at some point, then using NO_SAFE_UNHOOKING is bad.

I'll have a look at this, hopefully some time next week.
iconic
Site Admin
Posts: 1065
Joined: Wed Jun 08, 2005 5:08 am

Re: Intel's CET Shadow Stack issue

Post by iconic »

NO_SAFE_UNHOOKING makes a lot of sense when you're dealing with an API that may have a high call volume, such as PeekMessageW().

Actually, Madshi recently suggested this to someone else, and it solved their separate issue, linked below:

viewtopic.php?f=7&t=28915&p=54028&hilit ... ing#p54028

I think we can safely chalk this up to a non-CET issue now that you have reported back your results. As a reminder, you should leave your DLL module(s) loaded when using the NO_SAFE_UNHOOKING flag, as Madshi has said.


--Iconic
Bevan Collins
Posts: 42
Joined: Fri Jul 07, 2006 2:50 am

Re: Intel's CET Shadow Stack issue

Post by Bevan Collins »

Thanks iconic. It's definitely a CET issue, as it can only be replicated in a CET-enabled process. See https://bugs.chromium.org/p/chromium/is ... 245815#c15:

"First conclusion - the shadow stack has overflowed but the real stack has not. Likely culprit - epclient64 doing some hooking, adjusting the real stack, but failing to adjust the shadow stack, so, eventually, boom!"

I'm thankful that using NO_SAFE_UNHOOKING avoids this.
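
To make the quoted conclusion concrete, here is a toy C++ model of one way a shadow stack can fill up while the real stack stays balanced. It is only a sketch under assumptions: ToyCpu, popAndJump and runLoop are hypothetical names, the model tracks return addresses only, and nothing in this thread confirms that this particular pattern is what happens inside the hook stubs. The idea is that CALL pushes the return address on both stacks, but if the callee gets back to the caller by popping the return address and jumping (instead of executing RET), the shadow stack entry is never popped. No mismatch is detected because no RET ever compares the two stacks.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a CPU with a CET-style shadow stack (hypothetical, simplified).
struct ToyCpu {
    std::vector<std::uint64_t> stack;        // normal stack, return addresses only
    std::vector<std::uint64_t> shadowStack;  // hardware-maintained copy
    std::size_t shadowCapacity;              // entries that fit before overflow
    bool shadowOverflow = false;
    bool mismatchFault = false;

    explicit ToyCpu(std::size_t capacity) : shadowCapacity(capacity) {}

    // CALL: push the return address on both stacks.
    void call(std::uint64_t returnAddress) {
        stack.push_back(returnAddress);
        if (shadowStack.size() == shadowCapacity) {
            shadowOverflow = true;  // no room left on the shadow stack
            return;
        }
        shadowStack.push_back(returnAddress);
    }

    // RET: pop both stacks and compare; a difference is a control-flow fault.
    void ret() {
        if (stack.empty() || shadowStack.empty()) {
            mismatchFault = true;
            return;
        }
        const std::uint64_t normal = stack.back();
        const std::uint64_t shadow = shadowStack.back();
        stack.pop_back();
        shadowStack.pop_back();
        if (normal != shadow) mismatchFault = true;
    }

    // "pop reg; jmp reg": returns to the caller without RET, so only the
    // normal stack is adjusted and the shadow stack entry is left behind.
    void popAndJump() {
        if (!stack.empty()) stack.pop_back();
    }
};

// Calls a (hooked) function repeatedly from a loop that never returns itself.
ToyCpu runLoop(std::size_t iterations, bool leaky, std::size_t capacity) {
    ToyCpu cpu(capacity);
    for (std::size_t i = 0; i < iterations && !cpu.shadowOverflow; ++i) {
        cpu.call(0x1000);  // arbitrary return address inside the loop
        if (leaky) {
            cpu.popAndJump();
        } else {
            cpu.ret();
        }
    }
    return cpu;
}
```

In this model, runLoop(1000, true, 16) ends with an empty normal stack, a full shadow stack and the overflow flag set, without ever raising a mismatch, while runLoop(1000, false, 16) stays balanced.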
iconic
Site Admin
Posts: 1065
Joined: Wed Jun 08, 2005 5:08 am

Re: Intel's CET Shadow Stack issue

Post by iconic »

Hmmm, so you can also reproduce this in any process other than the likes of MS Edge and Chrome, once CET is enabled and the same module is injected?

--Iconic
Bevan Collins
Posts: 42
Joined: Fri Jul 07, 2006 2:50 am

Re: Intel's CET Shadow Stack issue

Post by Bevan Collins »

I believe so. The test app at https://www.dropbox.com/s/34fnbg8vbx9o2 ... 3.zip?dl=0 is built with /CETCOMPAT.
iconic
Site Admin
Posts: 1065
Joined: Wed Jun 08, 2005 5:08 am

Re: Intel's CET Shadow Stack issue

Post by iconic »

Unfortunately, like Madshi, I don't have one of the Intel CPUs with CET built into the hardware (11th Gen "Tiger Lake" or newer), so testing on my end isn't physically possible. As Madshi said in a recent post, safe unhooking uses different code branches, whereas NO_SAFE_UNHOOKING is simple and straightforward: a JMP instruction to the hook callback. So there may still be something going on within the code paths that internally keep track of how many threads are inside your hook callback. It's just very strange, because we haven't run into CET issues since he released the hotfix back in March, but further investigation will definitely be needed. For now I'm glad that the NO_SAFE_UNHOOKING workaround exists for you; it helps rule out some things.

P.S: Your code looks fine to me, I just loaded it into VS 2019.


--Iconic
madshi
Site Admin
Posts: 10753
Joined: Sun Mar 21, 2004 5:25 pm

Re: Intel's CET Shadow Stack issue

Post by madshi »

Hmmmmm... Looking at your test project, it seems to simply call PeekMessage() in a loop, with no threads being involved (unless I'm missing something). I can't really see how this could possibly result in the problem I was thinking about. So this must be a completely different issue than I thought. What I was thinking about was an insane amount of threads calling PeekMessage() simultaneously, which doesn't seem to be the case here at all. Also, the crash dump talks about a "simple" stack overflow, not about a CET shadow stack position mismatch, which is a completely different thing.

I'm wondering if it might be something simple we haven't thought of yet. E.g. "C/C++ -> Code Generation -> Security Check" is activated in your test project, which adds some stack checking stuff. I don't suppose disabling this fixes the issue? Or can you think of any other reason why the (shadow) stack might actually run out of space? Maybe for some reason there's really an extraordinary amount of stack space being consumed somewhere, and the CET shadow stack runs out of space earlier than the normal stack (for whatever reason)?

Also, I assume the issue occurs in both debug and release builds, and in 32-bit as well as 64-bit?

One thing worth noting is that nobody else seems to have CET problems with the latest madCodeHook build, or at least not that I'm aware of. So if this is so easily reproduced with a simple PeekMessage() loop, why has nobody else run into this problem yet?
Bevan Collins
Posts: 42
Joined: Fri Jul 07, 2006 2:50 am

Re: Intel's CET Shadow Stack issue

Post by Bevan Collins »

Yes, the test project just continuously calls the hooked API. I have also tested hooking APIs other than PeekMessageW, with the same result.

"Also, I assume the issue occurs in both debug vs release builds, and in 32bit as well as in 64bit?"
I think it was still occurring in debug builds. I don't have 32-bit hardware for testing.

"why has nobody else run into this problem yet?"
Maybe the hooked API isn't called enough times to overflow the shadow stack.
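
For a feel of the numbers, here is a back-of-the-envelope helper in C++. The function name callsUntilOverflow is hypothetical, and the 1 MiB shadow stack size used in the note below is only an assumed figure for illustration; the real size isn't stated anywhere in this thread. The 8 bytes per call corresponds to one leaked 64-bit return address.

```cpp
#include <cstdint>

// Hypothetical helper: how many calls fit before the shadow stack is full,
// assuming every call leaves a fixed number of bytes behind that are never
// popped. Returns UINT64_MAX when nothing is leaked (no overflow from leaks).
constexpr std::uint64_t callsUntilOverflow(std::uint64_t shadowStackBytes,
                                           std::uint64_t leakedBytesPerCall) {
    return leakedBytesPerCall == 0 ? UINT64_MAX
                                   : shadowStackBytes / leakedBytesPerCall;
}
```

Under those assumptions, callsUntilOverflow(1024 * 1024, 8) gives 131072 calls. A tight PeekMessageW loop gets there almost immediately, whereas an application that only calls the hooked API occasionally might not reach that count within a normal session.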
madshi
Site Admin
Posts: 10753
Joined: Sun Mar 21, 2004 5:25 pm

Re: Intel's CET Shadow Stack issue

Post by madshi »

I've verified in the Visual Studio debugger that the normal stack usage is not increased at all if you call the hooked API. Sure, it's temporarily increased while the hook callback function is in use. But once the PeekMessage call returns, stack use is back to normal. So I wasn't able to see any wasted stack space. I don't know how to check the shadow stack, but if it doesn't match the normal stack, there should be an exception right away, so I think we can assume that the shadow stack and the normal stack match at all times. The exception in your crash dump is not about a shadow stack mismatch, it's about the shadow stack simply running out of space. To be honest, I have no idea how madCodeHook could be responsible for an issue like that.

Right now, I'm not sure what else I could do, except maybe getting a CET compatible PC to test with. I do plan to update my development PC soon, I'm mostly just waiting for Zen 3 Threadripper CPUs to become available, which should be CET compatible.

In any case, no other user is currently reporting any CET issues. So if you're fine with the NO_SAFE_UNHOOKING option, then I guess I'll just wait until my new development PC is ready.
iconic
Site Admin
Posts: 1065
Joined: Wed Jun 08, 2005 5:08 am

Re: Intel's CET Shadow Stack issue

Post by iconic »

Bevan,

Can you please let us know whether this happens with any process, or whether it is specific to a process such as Chrome or Edge?

--Iconic