A better Windows kernel Rust bug

Last time it was seemingly entirely click-bait, but it seems we finally have an actual reported security issue in Windows due to Rust, at least in the sense that it's an instant BSOD instead of a possible RCE!

Release: Windows Rust-based Kernel GDI Vulnerability Leads to Crash and Blue Screen of Death Error

Details: Denial of Fuzzing: Rust in the Windows kernel - Check Point Research

Didn't see a CVE anywhere? Not sure what's up with that.

Notably, they have the following quote in their conclusion:

A fitting analogy might be a home alarm system that stops a burglar by blowing up the house.

referring to their belief that Rust out-of-bounds errors shouldn't bring down the system. While understandable, it's not clear what they would prefer: the only alternatives I can think of in general are:

  • never writing a bug :face_savoring_food:
  • never using a panicking API
  • unwinding the kernel thread

Though for this specific case it seems a bit weird that it's running geometry parsing code in the kernel: perhaps in its original form GDI needed direct access to the display buffer or something, and it's a pain to lift it out now.

Panic-free enforcement has been talked about quite a few times, but my understanding is that it's pretty tricky to add now? It's generally not what you'd want, but it seems like it might be worth it for kernel use.
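
For what it's worth, the closest thing I know of today is the link-time trick used by crates like no-panic. A rough sketch of the idea (only reliable with optimizations enabled, and the symbol name here is made up):

extern "C" {
    // Deliberately never defined: if this symbol survives into the final
    // binary, linking fails.
    fn possible_panic_detected();
}

struct Guard;
impl Drop for Guard {
    fn drop(&mut self) {
        // Only reachable while unwinding from a panic.
        unsafe { possible_panic_detected() }
    }
}

// If the optimizer can prove the body never panics, the Guard's drop glue
// (and the undefined symbol) is dead code and the build links; otherwise
// the build breaks, enforcing panic-freedom at link time.
fn lookup(data: &[u8], i: usize) -> Option<u8> {
    let guard = Guard;
    let out = data.get(i).copied(); // .get() cannot panic
    std::mem::forget(guard); // success path: the guard never drops
    out
}

fn main() {
    println!("{:?}", lookup(&[1, 2, 3], 1));
}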

I don't think anyone unwinds in the kernel, either? Strictly it should be possible from what I understand, but it probably makes some people's skin crawl.

I guess it's possible that this is just a security research team getting salty that they are going to get less juicy RCE bugs and just boring BSODs instead :grinning_face_with_smiling_eyes:

Also, the analysis hints at some interesting details of how Rust is being used; I don't think someone would implement even a singly linked list from scratch for this in Rust, but, equally, standard Rust bounds checks caught the issue. I'm curious how this port was done; probably at least some "copilot please rewrite this C++ in Rust", but that doesn't scale....

I also didn't see any mention of whether the bug reproduced in the original C++ code; it would be a pretty clear win if the port simply preserved the original bug!

2 Likes

This was done to improve performance in Windows NT 4.0, nearly 30 years ago, moving window management and graphics from IPC to csrss.exe into a kernel driver, win32k.sys. It’s been a rich source of security issues like local privilege escalation for years. I’m not sure if it’s fair to say security researchers are salty about some of this finally being addressed, but they must be feeling a bit miffed.

1 Like

Yeah, I did see that at some point, but I guess I'm just surprised that even all this vector stuff, which presumably got bolted on well after, was also done in kernel space: plenty of Windows APIs are actually largely implemented in user space, and it's not like it's a performance or security issue to have a user-space-writable pixel buffer to copy from.

SelectClipPath goes all the way back to Windows 95, so I assume it was one of the APIs that was initially moved into win32k.sys. If any of the C++ code goes that far back then it’s not surprising that it wasn’t written with security in mind.

1 Like

Huh, so I guess all the EMF stuff is not actually relevant to reproducing the issue? I'll need to dig into it a bit further to see where all this stuff actually lives.

Ugh... when can they stop writing like that?

I'm pretty sure that certain parties in this world would praise such a system when considering secret-clearance documents. It's not outlandish to think this may even be physically implemented policy somewhere. And on top of that, the comparison fails on both sides:

  • You're not being burgled: advanced persistent threats move in as permanent tenants, maybe write themselves into the deed, and then rent your house out while still having you pay the property tax.
  • The house isn't blown up, it's evacuated. Or have they seen a kernel panic lead to halt-and-catch-fire? That and permanent data corruption are more likely to happen when the system continues running in a corrupt state. Maybe that's my database background coming through, but a computer that has stopped doing IO is a planned-for scenario; the other is not.

Security-relevant, real-time liveness criteria are quite uncommon. Should the system halt on a bug? No, but a reboot is a decent recovery strategy that is compatible with panicking. Automatically reinitializing into a good state is basic practice in the cloud age anyway, if you've even remotely touched containerizing your ops. If anything, we should be decrying that most software components smaller than containers rarely have state boundaries good enough to allow rebooting only the part of the system whose in-RAM data structures got into a non-progressable state.
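
As a toy sketch of that "reboot only the failed part" idea in Rust, using catch_unwind as the state boundary (Component and its bug are made up for illustration, and this requires the default panic = "unwind"):

use std::panic::{self, AssertUnwindSafe};

struct Component { counter: u32 }

impl Component {
    fn new() -> Self { Component { counter: 0 } }
    fn run(&mut self) {
        self.counter += 1;
        let buffer = vec![0u8; 256];
        // Simulated bug: an out-of-bounds index panics on the third run.
        let index = if self.counter == 3 { 300 } else { 0 };
        let _ = buffer[index];
    }
}

fn main() {
    let mut component = Component::new();
    for _ in 0..5 {
        let result = panic::catch_unwind(AssertUnwindSafe(|| component.run()));
        if result.is_err() {
            // "Reboot" only this component: throw away its in-RAM state
            // and reinitialize from scratch.
            eprintln!("component panicked; reinitializing");
            component = Component::new();
        }
    }
}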

4 Likes

I'd hope, but at the same time, I've seen several issues on open-source Microsoft repositories where the maintainers seemed to just start having copilot generate PRs. Which is their choice, of course, but I choose to not be happy about it anyway.

Edit for clarity: "I'd hope" as in "I'd hope it doesn't scale".

1 Like

I’m guessing that loading EMF+ files is a very good way to do fuzz testing of GDI and GDI+ API calls.

From the article:

This flaw, which could trigger a system-wide crash via a Blue Screen of Death (BSOD), highlights the challenges of integrating memory-safe languages into critical OS components.

That is totally the wrong conclusion.

What they are saying is that the "challenge" is that Rust immediately flagged a critical security flaw (an out-of-bounds array access in a kernel-level process) rather than ignoring it, which would likely have shown up later as a bug/security problem elsewhere whose cause is incredibly hard to find.

The "challenge" then is that Rust found a problem somebody has to do the work to fix. Really?

No, the challenge is the same as ever: achieving bug-free code in complex systems. Which, as this story demonstrates, Rust is a great help with.

6 Likes

Yeah, if you want to say that it's bad, you have to explain what the better alternative is: as near as I can tell, for them it's my first option, "don't write bugs":

rigorous security testing and thoughtful software design remain essential, as issues can still arise.

this example must also serve as a reminder of the difficulties involved and the necessity of using extremely thorough engineering standards and principles. Even rigorous standards will not guarantee smooth sailing. We should still anticipate encountering unexpected bugs and vulnerabilities.

(That whole conclusion is extremely repetitive, and would be extremely AI-feeling if not for the exploding house metaphor.)

Indeed. And this is exactly what happened here. Rust anticipates that out-of-bounds array accesses will happen and responds accordingly when they do.

We could debate what the best response to such an error is, but that is another story; it depends on the situation and the desired outcome.

I noticed an argument made repeatedly all over the place that basically amounts to "See, the memory safety guarantees of Rust are not worth it because all kinds of bugs can still bite you." This article is an example of it.

their belief that Rust out of bounds errors shouldn't bring down the system

I have seen many cases where people were surprised by a Rust program's unexpected behavior. I think we should add more explanation to two very important chapters in The Rust Reference, Behavior considered undefined and Behavior not considered unsafe, mapping those low-level details to high-level vulnerability descriptions that normal people can understand.

For example:

(all below are discussed in safe Rust context)

  • Rust programs may abort and unwind
  • Rust programs may contain RCE vulnerabilities caused by logic errors
    • For example, command injection if the developer simply concatenates commands and throws them at /bin/sh to execute (see the sketch after this list)
    • For example, developers use an arena to simulate a "heap" and indices to simulate addresses; buggy logic can then lead to situations like heap overflow, double free, or use-after-free
  • Rust programs can never contain RCE vulnerabilities caused by control-flow hijacking
    • For example, a stack overflow overwriting the return address, or a heap overflow overwriting function pointers
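
A minimal sketch of that command-injection case in safe Rust; the "user input" here is a hypothetical attacker-controlled string:

use std::process::Command;

fn main() {
    // Hypothetical attacker-controlled input.
    let user_input = "photo.png; echo pwned";

    // Logic-error "RCE" in 100% safe Rust: concatenating input into a
    // shell string lets the attacker inject arbitrary commands.
    let vulnerable = format!("ls -l {}", user_input);
    let _ = Command::new("/bin/sh").arg("-c").arg(&vulnerable).status();

    // Safer: pass the input as a single argument, never through a shell.
    let _ = Command::new("ls").arg("-l").arg(user_input).status();
}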

That is quite an assumption. I would wager that "normal people" have very little understanding of what causes security vulnerabilities. Historically programmers never thought about it much. Although general awareness of such things among programmers has increased in recent years.

It's not clear to me that a language reference should get into a discussion of security vulnerabilities. The scope of mistakes programmers can make, and the ways those mistakes can be exploited, is vast. New exploits are devised all the time.

My main worry, though, is that it confuses the type and memory safety that Rust offers with security. These are two different things. There are an infinite number of ways to write an insecure program despite Rust's memory assurances. We already have enough people shouting "bah, there's no point to using Rust because it does not make code secure".

I'm wondering what unexpected behaviours you think people are surprised by? My experience so far is that Rust has a lot fewer of them than, say, C++ or JavaScript. If I find something Rust does that is "unexpected", it is always (so far) down to my ill-informed expectations, not Rust or its docs.

I think that this still applies - i.e. Rust's memory safety covers the ~70% of security issues that are memory bugs, and it's no surprise when something in the other 30% is found:

The article that OP links to also states:

... highlights the challenges of integrating memory-safe languages...

I see no evidence that the issue is integration-related. Is the article jumping to conclusions? Lacking such evidence, it could very well be a memory-bounds issue that doesn't relate to alloc/free (Rust's specialty) but rather to those code sections (isolated, given a proper coding strategy) where raw memory must be accessed. Graphics buffers are memory.

I was trying to help a friend convince their firmware department to switch to Rust from their custom C++ toolchain. The trick there was that they'd heavily invested in automation (via custom allocator logic and magic) so that unknowing developers couldn't invoke what happened in the Windows kernel: if there was the potential for an OOB access, it would be caught and the build would fail at compile time (as best they could manage).

The main example he provided me was something like this: the first access returns None in _result1, while the second panics (crashes the process) at _result2. Yes, developers worth their salt should know to do bounds checking, but there isn't a way to force it in Rust (that I'm aware of today, please correct me!)

fn main() {
    let buffer = vec![0; 256];

    let _result1 = buffer.get(257);
    let _result2 = buffer[257]; // ← Crash here
    println!("Am I alive?")
}

Clippy doesn't seem to complain about it either:

MyMachine:blah me$ cargo clippy
    Checking blah v0.1.0 (/private/tmp/blah)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.03s

Update: On mulling this over, can a clippy rule exist that would trip if a direct array access was done without a bounds check in the current function? Would that have helped Microsoft not footgun themselves?

There are clippy lints for that, e.g.
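
For example, assuming the lint meant here is clippy::indexing_slicing (an allow-by-default restriction lint), opting in turns the earlier example into a clippy error:

#![deny(clippy::indexing_slicing)]

fn main() {
    let buffer = vec![0; 256];
    // This line now fails `cargo clippy` instead of panicking at runtime:
    // let _result2 = buffer[257];
    let _result1 = buffer.get(257); // the checked API passes the lint
    println!("Am I alive?")
}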

1 Like

Neat, though it's kind of one down, several hundred to go I suppose, even if it is a pretty big one!

I'm starting to think that direct array indexing should have to be wrapped in unsafe. It seems just as error-prone as using a raw pointer, after all.

fn main() {
    let buffer = vec![0; 256];

    unsafe {
        let _result1 = buffer.get(257);
        let _result2 = buffer[257]; // ← Crash here
    }
    println!("Am I alive?")
}

I guess it is not, because it will panic rather than cause some mysterious UB. And the ergonomics of that might not go down well with people.

Idea:

How about this: wrapping direct array indexing in unsafe tells the compiler not to do any bounds checking?

After all, unsafe is a message to the programmer that they have to check things.
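
Something close to that already exists, in the form of the checked operator plus an unsafe unchecked variant. A sketch using slice::get_unchecked:

fn main() {
    let buffer = vec![0u8; 256];
    let i = 100;

    // Today's default: bounds-checked indexing, panics if out of range.
    let a = buffer[i];

    // The "no bounds check" version already requires unsafe: the
    // programmer promises that i < buffer.len(), and an out-of-range
    // index here would be UB, not a panic.
    let b = unsafe { *buffer.get_unchecked(i) };

    assert_eq!(a, b);
}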

2 Likes

The problem, as I alluded to, is that there's a huge number of APIs that can panic in theory; you'd probably end up with over half your code wrapped in "may panic" blocks (and what about the people who call those functions? they could get a panic too!).

Getting panic-free does have some similarities to the safety requirements of unsafe, but I expect an ergonomic solution would be a capability system at best and some sort of dependent-type nightmare at worst, where you're carrying around "proofs" that you're meeting the safety/non-panic requirements of later calls.
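
A tiny sketch of the "carry a proof" flavor, with a made-up InBounds witness type: the bounds check happens once up front, so the later access can't panic at its call site:

struct InBounds<'a, T> {
    slice: &'a [T],
    index: usize, // invariant: index < slice.len()
}

impl<'a, T> InBounds<'a, T> {
    fn new(slice: &'a [T], index: usize) -> Option<Self> {
        // The one and only bounds check.
        (index < slice.len()).then(|| InBounds { slice, index })
    }
    fn get(&self) -> &'a T {
        // The invariant was established in `new`, so this cannot fail;
        // a real design might use get_unchecked here after auditing.
        &self.slice[self.index]
    }
}

fn main() {
    let data = [10, 20, 30];
    if let Some(proof) = InBounds::new(&data, 2) {
        println!("{}", proof.get()); // no panic possible here
    }
}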

1 Like

By the way, multiple people now have talked about "array access/indexing" and gave examples like

    let _result2 = buffer[257]; // ← Crash here

when they were really talking about Vec<_>s, not arrays.

Rust arrays are primitive types with a compile-time known length and built-in usize indexing. If you try to index a Rust array with a literal usize that would panic, a built-in deny-by-default lint fires.

fn main() {
    let buffer = [0; 256];
    let _ = buffer[257];
}
error: this operation will panic at runtime
 --> src/main.rs:3:13
  |
3 |     let _ = buffer[257];
  |             ^^^^^^^^^^^ index out of bounds: the length is 256 but the index is 257
  |
  = note: `#[deny(unconditional_panic)]` on by default

(The lint is relatively limited, and all the points stand for dynamic values anyway, but the inaccurate phrasing got to me.)

7 Likes