Negative views on Rust: language-based operating systems

Strangely enough, Rust also shows that one does not need the hardware memory protection support of modern processors to get process memory isolation, along with potential savings in kernel code complexity and performance overhead: GitHub - theseus-os/Theseus: Theseus is a modern OS written from scratch in Rust that explores intralingual design, novel OS structure, and state management. It strives to close the semantic gap between compiler and hardware in order to maximally leverage the power of language safety, and thus shift OS responsibilities (resource management) into the compiler.

That one is not proven. Yet. There were similar designs around Java when it was new and exciting. And Sun even tried to push the idea that the JVM can offer enough isolation that you don't need that silly CPU hardware to do it for you in the browser. Imagine that!

They struggled for years and finally gave up eight years ago (by turning Java applets into a weird version of ActiveX).

Thus… no, I don't think it'll work. Rust would fail for the same reason Java failed, if someone attempted it.

The issue here lies in an inherent conflict of interest: if you use the ownership/borrow checker just to write robust code, then it's natural to loosen the screws in a controlled fashion over time (Rust has changed the rules and added more special cases to them over time). But every time you do this you risk introducing changes which make code unsound, and every single case where you make something just a tiny bit unsound potentially lets attackers get a foot in the door and break the security barrier!
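For a concrete, well-documented case of the rules being loosened: when non-lexical lifetimes (NLL) shipped with the Rust 2018 edition, the borrow checker started accepting code like the sketch below, which the older lexical borrow checker rejected because it treated the borrow `first` as live until the end of its block. This only illustrates a rule change; it is not an example of unsoundness. The function name is made up for illustration.

```rust
/// Reads the first element, then appends to the vector.
/// Accepted by today's NLL borrow checker. The pre-NLL (lexical) checker
/// rejected it, because the shared borrow `first` counted as live until
/// the end of the enclosing block, overlapping the mutable borrow in `push`.
fn read_then_push(mut v: Vec<i32>) -> (i32, Vec<i32>) {
    let first = &v[0];   // shared borrow of `v` starts here
    let copied = *first; // last use of `first`: under NLL the borrow ends here
    v.push(4);           // mutable borrow is allowed because `first` is dead
    (copied, v)
}

fn main() {
    let (first, v) = read_then_push(vec![1, 2, 3]);
    println!("first = {first}, v = {v:?}"); // first = 1, v = [1, 2, 3, 4]
}
```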

I'm not sure if, after the many failed attempts to commercialize the JVM/CLR as an OS runtime, anyone would try to do that with Rust… but I, for one, wouldn't expect such attempts to produce something usable.

Well… maybe for specialized OSes where all the software is controlled by one authority and where there is no need to run third-party, potentially hostile, software… but do these even need that memory isolation? I mean: less than 10 years ago such needs were fulfilled by an IBM DOS variant. Rust can most likely offer a similar level of protection. But that's a far cry from “hardware is not needed for memory protection”.

I'm not sure how proven or otherwise it is.

The idea is simple: if the language has total control of what goes on with application memory at compile time, then there is no need for hardware memory isolation. That seems watertight to me, theoretically.

Whether that can be made to work in practice is another question.

However, Kevin Boos proposed the hypothesis that a memory safe language like Rust could be used to provide process memory isolation in operating systems for his PhD thesis. He then spent four years or so on building the Theseus operating system to test out the hypothesis.

You can see his PhD defence here: PhD Defense -- Theseus: Rethinking OS Structure and State Management - YouTube and read his dissertation here: YouTube

He has a bunch of other presentations on YouTube about this effort.

Like all good researchers he discusses previous work in the field, like the Java attempt you mentioned.

That is the impression I get. But that is OK. I look at it like this:

The authority is the Rust OS and the compiler it runs. Any software, from any untrusted party, could be run, provided it was delivered as source code. The compiler then gets to build trusted executables from it or reject it. The compiler is the gatekeeper, not the processor hardware. That source would have to be Rust, of course.

I have a couple of questions about that:

  1. What about Rust source that uses unsafe? Either unsafe would have to be disallowed, or some means of verifying that it does not break OS memory barriers would have to be in place: either further compile-time analysis or run-time checks. Kevin mentions compiler "plug-ins" to do something like that, but I have no idea what exactly.

  2. What about memory leaks? As I understand it, leaking memory is not unsafe in Rust. So it seems a leaking Theseus application could eventually bring down the OS. Or perhaps that is obvious and Kevin has limited the amount of memory processes can have.

  3. What about that compiler? Currently Rust depends on C++, namely the LLVM compiler backend. That is certainly not safe, so presumably it could not be run on Theseus. That's not so bad: I could grab your untrusted source and compile it on my PC for running on my Theseus machine.
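Questions 1 and 2 meet in a small way: Rust can already refuse `unsafe` for a given module or crate with the `forbid(unsafe_code)` lint, yet code under that lint can still leak memory, because `Box::leak` and `std::mem::forget` are safe functions. The sketch below models the "untrusted application" as a module; the module and function names are hypothetical and only for illustration.

```rust
// The "untrusted application", modeled as a module that forbids `unsafe`.
// Unlike `deny`, `forbid` cannot be overridden by an inner `#[allow(unsafe_code)]`.
mod untrusted_app {
    #![forbid(unsafe_code)]

    /// Leaks `n` heap buffers of `size` bytes each, using only safe Rust:
    /// `Box::leak` turns the box into a `&'static mut` that is never freed.
    pub fn leak_buffers(n: usize, size: usize) -> usize {
        let mut leaked = 0;
        for _ in 0..n {
            let buf: &'static mut [u8] = Box::leak(vec![0u8; size].into_boxed_slice());
            leaked += buf.len();
        }
        leaked
    }

    /// `std::mem::forget` is safe too: the `String`'s destructor never runs.
    pub fn forget_string() {
        let s = String::from("never freed");
        std::mem::forget(s);
    }
}

fn main() {
    let bytes = untrusted_app::leak_buffers(4, 1024);
    untrusted_app::forget_string();
    println!("leaked {bytes} bytes, no unsafe code involved");
}
```

So disallowing `unsafe` answers question 1 only; something like a per-process memory limit would still be needed for question 2.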

Anyway, Theseus has other attractive features enabled by its software memory management, like the ability to hot-swap parts of the kernel and OS at run time, which would be cool for live updates of all kinds of things.

Perhaps... developments like Theseus could feed back into Rust development. They might suggest how Rust could be modified/improved to make such an OS practical.

It's the same status JVM and CLR had for decades. Yet in practice it doesn't work.

But that's the only question that matters. Because, as I have said, we had languages which guarantee the ability to isolate programs from each other without help from hardware for decades, yet that never worked in practice. What makes you think Rust would change that?

Yes. And there are dozens (hundreds?) of similar PhD theses for JVM/CLR and many, many, MANY other models. Lisp was popular last century, but seems to have fallen out of favor this century.

They all work in theory and fail in practice so what makes that one special?

This sounds, basically, like where JavaScript started. Yet today all browsers use OS-level security to isolate renderers, because, in practice, you can make either a fast JIT compiler for JS or a safe, trustworthy JIT compiler, but not both at the same time.

Most C++ code can be compiled to WebAssembly and then run on that Rust machine. Including LLVM, of course.

No, they couldn't. That particular OS would never be practical. That doesn't mean they wouldn't be able to suggest some improvements. Just not with that goal in mind.

Also: it's actually highly unlikely that they would suggest anything. The output of a researcher is papers. Papers about that OS are published. Work is done.

Maybe someone would use this thesis as part of something practical (as happened with many other theoretical developments), but I wouldn't expect anything practical to come out of that research directly. That's not how academia works.

Consider it… theoretical curiosity, IDK. Practical impact of that would be very low. Fortunately or unfortunately depending on what you expect.

No idea really, and maybe you are right. However, to my mind Theseus has a very significant difference from anything JVM/CLR etc. In the case of byte code interpreter systems, the memory safety is supposed to be ensured by that run-time byte code interpreter. Same as running any other processor emulation. But in the Theseus case the safety is baked into the syntax and semantics of the very language. Perhaps, as you say, it all works out the same, but it is an idea worth investigating.

If you ignore that significant difference and claim it will never work then you are also saying that the memory safety of Rust will never work, that it will always be buggy and failure prone. In which case why are we here? Might as well continue using C or C++. I'm not prepared to accept that. If Rust turns out to have such bugs they need fixing, and I'm sure they would be.

Also don't forget we can say the same about the memory isolation provided by hardware. Back in the day we had a big thick document, under NDA from Intel, detailing all the known bugs in the Intel 286 processor, most of them were failures of the memory protection. In modern times we have Spectre and Meltdown. Who knows what else is lurking in there?

Bugs are bugs, hardware or software they have to be fixed.

I can imagine many ways it would be very practical.

Actually, I gather from the presentations of Kevin Boos that this is not a case of write paper, get PhD then forget it. He wants to continue development of Theseus.

I'm encouraged by the fact that Theseus has orders of magnitude less code in it than the Linux kernel or the Java or JavaScript run-times. That means orders of magnitude less chance for bugs that would break memory safety.

Because there's a huge difference between protecting against malicious and well-intentioned actors.

Safe rust is incredibly helpful for avoiding accidental mistakes. But there are dozens of reasons why it's currently useless for protection against a malicious actor.

I wouldn't be surprised if there's always at least one of those lying around. But that won't stop me from using Rust, since it still helps me.

Not so huge to my mind. A malicious actor will exploit your buffer overruns, for example, for his own purposes; a well-intentioned actor will do it by mistake.

I guess they are bugs that will get fixed. Same as all those bugs in hardware protection schemes.

Quite so.

What makes it different???

Indeed. The idea was: making a full-blown compiler part of the TCB (trusted computing base) is insanity. Let's make something simpler (namely, a bytecode interpreter) part of it instead.

A bytecode interpreter is simpler than a full-blown compiler (especially one of Rust's caliber), so it's obviously harder to make “syntax and semantics of the very language” a security boundary.

It kinda worked for some time. But then people complained about speed…

To write correct programs, obviously.

It's one thing to create something which catches accidental mistakes. JVM does that, CLR does that, Haskell does that, Swift does that, lots of other languages do that.

Rust is just one of many which achieves that with some nice additional bonuses.

It's a completely different thing when a language runtime is used as a security barrier designed to contain a determined and resourceful adversary. The JVM fails at that, the CLR was never considered for it (as far as I know), and JavaScript failed, too.

Why would Rust succeed?

Impossible. To achieve that you would need to stop development of the language and improvements in the optimizers. That is where both Java and JavaScript failed.

When you use the compiler as a barrier against accidental bugs, it's OK if your compiler has dozens or even thousands of bugs. As long as they are in crazy corner cases, they won't bother anyone. The goals of the compiler and of language users are aligned, so almost-but-not-100% is more than good enough.

But if that's a security boundary, then even one single bug is too much. An adversary needs only one to crack the security, after all. The goals of the compiler and the “language users” (attackers, here) are not aligned, so almost-but-not-100% is not good enough.

Yet no one was ever able to switch from protected mode to real mode despite all these bugs.

That big thick document is still much smaller than the Bugzilla bug list for any modern compiler.

Nah. That just means that the extra order of magnitude of code hasn't been written yet.

All kernels start small and lean. Before they become practical, they add enough features to, well, make them practical.

It depends how hard it is to do by mistake.

In C++ it's trivial to buffer overflow by mistake. In Safe Rust it's quite difficult to do it accidentally, despite the bugs.

That's why the difference is huge.
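To make the buffer-overflow point concrete, here is a minimal sketch using only the standard library: safe Rust bounds-checks slice access, so an out-of-range index either yields `None` via `get` or panics, instead of touching neighbouring memory. The helper names are made up for illustration, and the panic check assumes the default `panic = "unwind"` setting.

```rust
use std::panic;

/// `get` is bounds-checked and returns `None` instead of reading past the end.
fn checked_read(buf: &[u8], i: usize) -> Option<u8> {
    buf.get(i).copied()
}

/// Plain indexing is bounds-checked too: out of range means a panic, not a
/// silent read of whatever sits next to the buffer. (Assumes panic = "unwind".)
fn indexed_read_panics(buf: &[u8], i: usize) -> bool {
    panic::catch_unwind(|| buf[i]).is_err()
}

fn main() {
    let buf = [1u8, 2, 3, 4];
    println!("{:?}", checked_read(&buf, 2));       // Some(3)
    println!("{:?}", checked_read(&buf, 10));      // None
    println!("{}", indexed_read_panics(&buf, 10)); // true (panic message goes to stderr first)
}
```

The same out-of-range access in C or C++ compiles silently and is undefined behaviour, which is the gap being described.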

Well, clearly running processes from actual pre-compiled instructions built from pre-validated source code is very different from interpreting byte codes from some unknown source. Doesn't that step up the game a bit?

And that. Precompiled code will be faster than interpreting byte codes. And does not require all that complicated JIT.

Looks like we are never going to agree on this. And I do appreciate your concerns.

In the meantime, I think Theseus has interesting use cases. For example, what if all the code I want to run is actually mine. I trust myself not to try and break my own system. But I would like all that code to run in different processes. So I can start, stop, update them independently. So that a failure of one does not bring down the others.

All the while saving the overheads of traditional OS processes and kernel calls etc.

Also, Rust makes no attempt to address timing attacks. It mostly seems like Spectre-hardened CPUs and OSes are going to treat the process boundary as the trust boundary, and in-process protection won’t mean anything against a determined attacker.

Firefox ran multiple websites in shared processes for a while. Not any more.

Kevin Boos discusses Spectre/Meltdown-style attacks in his PhD defence or one of his presentations on YouTube. My takeaway was that the situation was not hopeless.

Yeah. But that is a whole other world of millions of lines of unsafe code which needs protecting from itself!

Made simple correction to fix the obvious error.

It makes it harder. A compiler is a very big and complex program. And with things like proc macros you load hostile code directly into it. It's not designed to deal with hostile code at all (there are thousands of open bugs in LLVM's Bugzilla, many of which may be usable to circumvent protections, and, in fact, the case “incorrect code can produce incorrect output” is not usually even considered a bug by LLVM developers).

Yup. It only requires a unicorn which doesn't exist and would never exist: a compiler which is ready to accept code from a hostile adversary and promises to work correctly if that happens.

No one has even attempted to create such a beast, so it's completely unknown whether such a thing can even exist in principle.

Then you don't need protection at all. If there is no [potential] attacker, then what are you protecting against?

Just compile all that code as one big Rust program. And if it's too big, then just split it (that certainly can be done, since proc macros already do that). Just don't call that an operating system with protection, because the protection is not there.

If you plan to use Rust (where leaking memory is not considered unsafe), then you would have to have separate memory for these processes, and at that point you are doing what people have been doing for the last half-century.
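A toy sketch of the "separate memory for these processes" idea done in software: a hypothetical per-process budget, so that a process that leaks (which safe Rust permits) can only exhaust its own allowance. `ProcessHeap` and `OutOfBudget` are invented names for illustration; this is not Theseus's API, and a real kernel would enforce such a limit in its allocator rather than in a wrapper like this.

```rust
/// Hypothetical per-process memory budget, for illustration only.
struct ProcessHeap {
    limit: usize,
    used: usize,
}

#[derive(Debug, PartialEq)]
struct OutOfBudget {
    requested: usize,
    available: usize,
}

impl ProcessHeap {
    fn new(limit: usize) -> Self {
        ProcessHeap { limit, used: 0 }
    }

    /// Hands out a zeroed buffer only if it fits in this process's remaining budget.
    /// Nothing is ever credited back, so leaked buffers keep counting against it.
    fn alloc(&mut self, size: usize) -> Result<Vec<u8>, OutOfBudget> {
        let available = self.limit - self.used;
        if size > available {
            return Err(OutOfBudget { requested: size, available });
        }
        self.used += size;
        Ok(vec![0; size])
    }
}

fn main() {
    let mut leaky = ProcessHeap::new(4096);
    let mut well_behaved = ProcessHeap::new(4096);
    // The leaky process forgets every buffer and soon hits its own limit...
    while let Ok(buf) = leaky.alloc(1024) {
        std::mem::forget(buf);
    }
    // ...while the other process still has its full budget.
    assert!(well_behaved.alloc(4096).is_ok());
    println!("leaky process used {} of {} bytes", leaky.used, leaky.limit);
}
```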

OS processes and kernel calls are very fast if you use well-designed kernel (Linux, e.g.) and don't need protection against Spectre/Meltdown/etc.

If you do need protection against Spectre/Meltdown/etc then excluding overhead of switching between protection rings buys you even less.

Please don't modify what I said when quoting me. I meant what I said and it was not a mistake to be fixed.

What I meant was that if I take "hostile" source from some unknown, untrusted vendor, then I could trust the Rust compiler to ensure that the hostile code does not access anything it should not. A Rust function can only modify memory it is given and allowed to modify. A Rust program is just a big function starting at main. Ergo, that hostile program cannot mess with the rest of the code in my system.
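A minimal sketch of the claim that a safe Rust function can only modify memory it is given. `Region`, `untrusted_program` and `isolate_demo` are hypothetical names, not Theseus code. One caveat: safe code can still reach `static` items (atomics, a `static Mutex`) without being handed them, so a loader built on this idea would also have to control which globals untrusted code can link against. And, as others point out in this thread, the guarantee is only as good as the compiler.

```rust
/// Hypothetical stand-in for the memory a loader hands one program.
#[derive(Debug, PartialEq)]
struct Region {
    data: Vec<u8>,
}

/// An "untrusted program" modeled as a plain function. In safe Rust the only
/// memory it can reach is what it is passed: a mutable borrow of its own region.
fn untrusted_program(mine: &mut Region) {
    for byte in mine.data.iter_mut() {
        *byte = 0xEE; // scribble over everything it was given
    }
}

/// The "loader" owns both regions and lends the program only one of them.
fn isolate_demo() -> (Region, Region) {
    let alice = Region { data: vec![1, 2, 3] };
    let mut eve = Region { data: vec![0; 3] };
    untrusted_program(&mut eve); // it never receives `alice`, so it cannot name it
    (alice, eve)
}

fn main() {
    let (alice, eve) = isolate_demo();
    println!("alice: {alice:?}, eve: {eve:?}");
}
```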

Contrast to running some hostile byte code from some unknown and untrusted source. In this case I have no idea what that byte code does or even if it was generated by a good compiler. It could have been hand crafted to exploit holes in the run time.

Now, as you and others point out, this is not going to be 100% reliable unless the Rust compiler is bug-free. I agree. However, it may be plenty good enough in many cases, and as history shows, even hardware protection turned out to be buggy. The world has not melted down as a result, though.

In the proposed scenario I am protecting myself from myself. Or perhaps the rest of the team that I presumably trust.

That is exactly what Theseus does. It's not as if every program was linked into one huge binary with a single stack and heap. No, Theseus loads programs into memory it has allocated for them.

That is contrary to the entire motivation behind introducing async to Rust. Kernel calls and context switches are slow, which is what async is all about alleviating.

Whether compiler-enforced memory protection turns out to realise the supposed gains is yet to be seen, I think. But Kevin is set on finding out.

Apparently many do need protection from Spectre/Meltdown. Hence the extensive modifications to memory mapping in the Linux kernel to work around it, with a performance hit of 5 to 40 percent depending on who you listen to. The kernel devs really don't like that workaround.

No, you can't. Because untrusted vendor is not obliged to supply you with code which doesn't trigger compiler bugs and doesn't try to circumvent protection using compiler errors.

If and only if we can trust the compiler. We can't. LLVM compiler is not designed to deal with untrusted sources thus Rust compiler is not designed to deal with untrusted sources either.

Sure, you can, probably, make some other compiler, but then generated code would be slower which would also make the whole thing quite useless.

Indeed. But in that case we, at least, have a component which is designed to deal with untrusted input.

In your design you are trying to use something which was never designed to deal with untrusted input as a security barrier.

This may only end up in tears.

Yes, but at least in that case you are dealing with a component designed to deal with the hostile adversary.

The compiler is so far from being designed to deal with it that I know guys who were specifically forbidden to file bugs found by fuzzers, so as not to fill the bug tracker with “useless bugs”. Sure, that was the Intel Fortran Compiler and not the Rust compiler, but I wouldn't expect LLVM developers to react any differently.

They have too many bugs they have to fix generated by legitimate non-hostile inputs, they don't really have time and manpower to deal with hostile inputs.

No.

Kernel calls are cheap (at least on Linux; Windows is another story, its roots in the microkernel craze are still felt even after all these years), and you don't need to exclude the kernel to make context switches fast. You just need to add a couple of syscalls (that's what Google did to its own kernel a decade ago).

The only reason for async existence is the fact that you don't always have the luxury of rolling out your own kernel.

But if you can not do that then it's unclear how you can use that strange beast with Rust kernel and Rust userspace.

They don't like that workaround because it makes kernel syscalls slow. But they can only worry about that because without these mitigations they were extremely fast.

And if your OS also needed these mitigations, then the difference between the existing, tried and tested solution and the newfangled, shiny Rust-based one would be even smaller.

From what I remember from this talk, it's not actually the kernel calls and context switches that are expensive anymore, but the disruption they cause in the scheduler.

Absolutely. But that can be fixed with changes to the kernel (like Google did).

Lots of Rust users don't have the luxury of changing the kernel (maybe they are dealing with Windows, or maybe they are just too small to roll out their own kernel), ergo they need async.

But if they couldn't change the kernel, then how would new-kernel-written-in-Rust help them?

I'm not disagreeing with you. I'm just pointing out that it's misattributing the source of the lost performance.

...

You know, I do agree with you. A 100% bulletproof compiler-checked OS like Theseus is predicated on having 100% bulletproof language syntax and semantics that enforce the protection, and of course a 100% correct compiler implementation. And I agree that nothing is 100% correct, so this may end in tears.

Hmm... The offer on the table with Rust is memory safety. That is all an OS like Theseus is asking for. If Rust has a failure in providing memory safety then that is a bug that needs fixing.

I have a challenge then:

Write a Rust program that endlessly runs two different functions in two threads, async or sync. Call them alice and eve. Arrange that eve can read and/or write data belonging to alice, that is, data that the caller did not give to both Alice and Eve for shared access, and do it without the use of unsafe.
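For reference, here is a bounded sketch of the setup the challenge describes (it runs a fixed number of rounds rather than endlessly, so it terminates). Alice's `secret` is moved into her thread; the only thing both threads can reach is the `Arc<Mutex<u64>>` the caller deliberately shares. The `run` helper and the round count are assumptions for the sketch. It shows what the type system is meant to enforce; it does not prove that no escape exists, since a compiler soundness bug is exactly the kind of hole the other side of the argument has in mind.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Alice owns `secret` outright; she also bumps a counter she chose to share.
fn alice(mut secret: Vec<u64>, shared: Arc<Mutex<u64>>, rounds: u64) -> Vec<u64> {
    for i in 0..rounds {
        secret.push(i);
        *shared.lock().unwrap() += 1;
    }
    secret
}

/// Eve only ever receives the shared counter, so that is all she can read or write.
fn eve(shared: Arc<Mutex<u64>>, rounds: u64) -> u64 {
    let mut last_seen = 0;
    for _ in 0..rounds {
        last_seen = *shared.lock().unwrap();
    }
    last_seen
}

/// The "caller" decides what each thread gets.
/// Returns Alice's data and the final value of the shared counter.
fn run(rounds: u64) -> (Vec<u64>, u64) {
    let shared = Arc::new(Mutex::new(0));
    let secret = vec![42];

    let for_alice = Arc::clone(&shared);
    let a = thread::spawn(move || alice(secret, for_alice, rounds));
    // `secret` has been moved into Alice's closure; mentioning it in Eve's
    // closure below is rejected at compile time (E0382, use of moved value).
    let for_eve = Arc::clone(&shared);
    let e = thread::spawn(move || eve(for_eve, rounds));

    let secret = a.join().unwrap();
    e.join().unwrap();
    let total = *shared.lock().unwrap();
    (secret, total)
}

fn main() {
    let (secret, total) = run(1_000);
    println!("alice kept {} values; shared counter = {total}", secret.len());
}
```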

Re async: my understanding is that async was introduced into languages like Rust and C++ as a consequence of the C10K problem: C10k problem - Wikipedia

Historically we forked processes in order to get work done in parallel. For example, Apache forked a process for every request. This of course is horribly slow and memory hungry. So later we moved to using threads, which are not as heavy as processes. But things like Apache still could not handle a lot of connections: memory piled up with the thousands of connections of modern web services, most of them with their threads just waiting for some input. The solution then was to move to async, which relieves even more of the time and memory overhead of thread switching. Async is good for waiting on lots of things at the same time. There is a reason nginx is renowned for its performance: it does things asynchronously.
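A rough illustration of why async is cheap for "waiting on lots of things": each async task is a state machine polled by an executor, so thousands of waiting tasks can share one OS thread. This is a deliberately naive, std-only executor (it re-polls every pending task round-robin, and its waker does nothing); real runtimes only re-poll tasks whose wakers fired, driven by the OS's readiness notifications. `YieldOnce` stands in for waiting on I/O and, like `run_all` and `demo`, is a made-up name for this sketch.

```rust
use std::cell::Cell;
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Stand-in for "waiting on I/O": pending on the first poll, ready on the second.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            cx.waker().wake_by_ref(); // a real future would arrange a wake-up when I/O is ready
            Poll::Pending
        }
    }
}

/// A waker that does nothing; this naive executor re-polls every pending task anyway.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

type Task = Pin<Box<dyn Future<Output = ()>>>;

/// Polls all tasks round-robin on the current thread until every one completes.
/// Returns the total number of polls performed.
fn run_all(tasks: Vec<Task>) -> usize {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut queue: VecDeque<Task> = tasks.into();
    let mut polls = 0;
    while let Some(mut task) = queue.pop_front() {
        polls += 1;
        if task.as_mut().poll(&mut cx).is_pending() {
            queue.push_back(task); // still waiting: back of the line
        }
    }
    polls
}

/// Creates `n` tasks that each "wait" once, runs them all on one thread,
/// and returns (tasks finished, total polls).
fn demo(n: usize) -> (usize, usize) {
    let done = Rc::new(Cell::new(0));
    let tasks: Vec<Task> = (0..n)
        .map(|_| -> Task {
            let done = Rc::clone(&done);
            Box::pin(async move {
                YieldOnce { yielded: false }.await;
                done.set(done.get() + 1);
            })
        })
        .collect();
    let polls = run_all(tasks);
    (done.get(), polls)
}

fn main() {
    let (finished, polls) = demo(10_000);
    println!("{finished} tasks finished after {polls} polls on a single thread");
}
```

Ten thousand waiting tasks here cost ten thousand small heap-allocated state machines, not ten thousand thread stacks.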

It's worse than that. It's like a “100% bulletproof plan to colonize the solar system”, where everything is predicated on having that nice cheap rocket. Only someone else has to provide the rocket.

But isn't that putting the cart before the horse? It may make sense as some kind of theoretical plan, but before you invest heavily in that thing you need to know whether, and how, you can make that nice cheap rocket!

But who would fix these? For compiler developers, bugs which lead to miscompilation (and violation of Rust's guarantees) are only high priority if they are triggered by some common idiom found in ordinary C++ code. The Rust compiler has a nice collection of workarounds, carried for years, for LLVM bugs that LLVM developers don't consider “important enough”.

But Rust developers, in turn, have a similar attitude to Rust input! They wouldn't just go and fix the compilation of some crazy-obscure code just because it can be used to violate some invariants. They may keep such bugs unfixed for a long time: the fact that they don't affect many people means they get low priority.

Why would I write such a program before I have a chance to get some money from your system? If you want to see how that could have been done in older versions of the compiler, just look at the issue tracker. There are dozens of such bugs. If you want something like that to crack that OS, then you can build such code in a few days by playing with one fuzzer or another.

One of the solutions was to move to async.

Sure. But async code is hard to modify; it's very hard to read and edit. And, if you can change the kernel, there is another option: change the scheduler.

Google did it, as I pointed out above. Google Fibers are threads; they don't need all that async hoopla! And if you plan to handle more traffic than Google… I would be very interested in knowing what you plan to build and who would finance the whole thing.