High-level advice on an atypical use case? (multiple rust executables running in a shared memory space)

Hey friendly Rust folks! I hope this is an appropriate question for this forum, first time checking it out.

Context: I've been playing around with building a hobby OS in my free time, and after laying out the basics (64-bit mode, virtual memory / physical memory allocation, kernel heap, multitasking with preemptive scheduling) I'm starting to think about building a user space. Rather than poorly reinventing the POSIX wheel, I decided to try out a pretty atypical architecture.

Specifically, I'm imagining all user-mode processes actually run in one shared virtual memory space. Usually this is a big no-no, because any old process would happily be able to interfere with any other, but that's where Rust comes in. If I understand right, any purely safe-Rust code cannot "break the memory sandbox" so to speak. So if we imagine a deployment system in which applications are distributed via a trusted build server that guarantees building with something along the lines of --cap-lints forbid -Funsafe-code and produces PIE output, then these applications could be loaded at non-overlapping locations in virtual memory and trusted to run side-by-side. Of course eventually they would need to call into kernel syscalls, which must use unsafe C ABI calls, but these could be wrapped inside a safe syscall library linked in by the trusted build system.
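To make that last part concrete, here's a rough sketch of what I mean by a safe syscall library (all names are made up, and the raw syscall is stubbed out; in the real kernel it would be inline asm or an extern "C" symbol that only the trusted build system links in):

```rust
/// Hypothetical raw syscall stub. In the real kernel this would be inline
/// asm or an `extern "C"` symbol provided by the trusted build system; here
/// it is stubbed to "succeed" by reporting the length it consumed.
unsafe fn raw_sys_write(_fd: usize, _buf: *const u8, len: usize) -> isize {
    len as isize
}

/// Safe wrapper exported by the trusted "libkernel" crate. The only unsafe
/// block lives here, and taking a slice guarantees the pointer/length pair
/// handed to the kernel is valid for the whole call.
pub fn sys_write(fd: usize, data: &[u8]) -> Result<usize, i32> {
    let ret = unsafe { raw_sys_write(fd, data.as_ptr(), data.len()) };
    if ret < 0 {
        Err(ret as i32)
    } else {
        Ok(ret as usize)
    }
}

fn main() {
    // Application code only ever sees the safe API.
    assert_eq!(sys_write(1, b"hello"), Ok(5));
}
```

The point is that applications would compile against this crate's safe surface only, so forbidding unsafe code in application crates doesn't cut them off from the kernel.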

So question 1: does this seem like a tenable security model? Does it seem likely that one would need to safeguard against so many edge cases, weird build requirements, unexpected compiler flags, etc. that it quickly becomes intractable? It's easy to buy that rustc can sandbox user-space applications if they can be guaranteed to use only safe code, but it's not so obvious to me whether this "safe-only" guarantee is enforceable across the board in a nice way.

The other half of this question has to do with getting these processes to talk to one another. Working in a single shared memory space doesn't seem particularly useful if processes basically just communicate through IPC or not at all anyway (sure, maybe we slightly reduce context-switching overhead, but that hardly seems worth the pain). But my gut feeling is that one shared memory space could open up a more natural "thread-like" model for processes, in which the kernel is only required to help processes pass pointers to establish a shared channel, and the default mode of communication could then look more like the high-level thread communication available in Rust. E.g. once an appropriate channel is established, objects of complex types could be passed across without serialization over internal sockets or complicated memory-mapping procedures.

The difficulty then becomes ensuring that the typechecking of two distinct Rust programs cooperates so that only objects understood as the correct type on both ends are passed (preserving the safe-Rust guarantees). I think something like #[repr(C)] can ensure that structs are arranged in a common format, but more basic than that, it seems to me that the "libkernel" API (safe Rust wrappers around syscalls) would either have to specify fixed formats for the objects it knows how to set up channels for, or there would need to be some runtime way(?) to check that the types match on either side.
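For illustration, here's what I mean by the #[repr(C)] part (the Message struct is just an example wire format I made up): with repr(C) the field order and padding follow C rules, so two separately compiled programs that declare the same struct agree on its layout, whereas the default Rust repr leaves the compiler free to reorder fields.

```rust
use std::mem::{align_of, size_of};

// Without #[repr(C)], rustc may reorder fields and the layout is
// unspecified; it can even differ between compilations. With it, fields
// are laid out in declaration order with C-compatible padding, so two
// separately built programs sharing this declaration agree on the bytes.
#[repr(C)]
#[derive(Debug, PartialEq)]
pub struct Message {
    pub tag: u32,
    pub len: u32,
    pub payload: [u8; 8],
}

fn main() {
    // 4 (tag) + 4 (len) + 8 (payload), alignment from the u32 fields.
    assert_eq!(size_of::<Message>(), 16);
    assert_eq!(align_of::<Message>(), 4);
}
```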

Question 2 then is do you have suggestions for designing the type system to make sure we are "impedance matching" the object types at either end of a kernel-established channel (or other mode of communication)? It seems not too difficult to offer libkernel functionality to construct channels for specific types, and maybe just providing a few building-block types would be sufficient, but I wonder if there's a nice generics approach here. I guess fundamentally the processes at either end are built with different invocations of rustc, so maybe this is impossible?
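One runtime-check shape I can imagine (purely a sketch; TypeFingerprint, fingerprint, and connect_checked are hypothetical libkernel names): each endpoint records a fingerprint of its element type when it opens a channel, and the kernel refuses to connect endpoints whose fingerprints disagree. Note that std::any::type_name is explicitly not guaranteed stable across compiler versions, so a real system would probably hash a declared wire-format identifier instead.

```rust
use std::any::type_name;
use std::mem::{align_of, size_of};

/// Hypothetical fingerprint libkernel could record when a channel endpoint
/// is created, and compare when the peer tries to connect.
/// CAVEAT: type_name() output is not a stable identifier across compiler
/// versions; a real design would hash an explicitly declared schema instead.
#[derive(Debug, PartialEq, Eq)]
pub struct TypeFingerprint {
    name: &'static str,
    size: usize,
    align: usize,
}

pub fn fingerprint<T: 'static>() -> TypeFingerprint {
    TypeFingerprint {
        name: type_name::<T>(),
        size: size_of::<T>(),
        align: align_of::<T>(),
    }
}

/// Sketch of the handshake: refuse the connection unless both ends
/// computed the same fingerprint for the channel's element type.
pub fn connect_checked(
    ours: &TypeFingerprint,
    theirs: &TypeFingerprint,
) -> Result<(), &'static str> {
    if ours == theirs {
        Ok(())
    } else {
        Err("channel element types do not match")
    }
}

fn main() {
    let a = fingerprint::<(u32, u64)>();
    let b = fingerprint::<(u32, u64)>();
    assert!(connect_checked(&a, &b).is_ok());

    let c = fingerprint::<u8>();
    assert!(connect_checked(&a, &c).is_err());
}
```

This obviously only checks layout-level compatibility, not semantic invariants, which is part of why I suspect a fixed menu of libkernel-blessed types might be the safer starting point.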

The final issue (that I've thought of so far...) is that in the usual process model, there is a process-local heap available to allocate from for each process. For example the global Rust allocator that is used in std should somehow talk to the OS to get at this heap. This heap is also conveniently cleaned up when a process dies, so even if the process happily leaks a big pile of memory, it doesn't endanger the OS. In this shared virtual memory model, one could do the same, but then we run into allocation lifetime issues: if an object was created by process 1 and ownership was transferred over a channel to process 2, then we run into a problem if process 1 dies while process 2 continues to try to access the object. On the other hand, if we keep one big heap for all processes (which could work in a single memory space), then lifetimes are safely managed through Rust's lifetime analysis, I think; but now, any application-level memory leaks do not get cleaned up even when the process dies, and instead become OS-level memory leaks.
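The ownership-transfer scenario is exactly what already works between Rust threads today, which is what makes me hopeful; here's the analogy, with threads standing in for processes sharing one heap:

```rust
use std::sync::mpsc;
use std::thread;

/// Threads stand in for processes sharing one address space and one heap.
/// "Process 1" allocates, sends ownership over a channel, then exits;
/// "process 2" keeps using the allocation because the heap outlives the
/// sender. With per-process heaps, the allocation could be reclaimed when
/// the sender dies, leaving the receiver with a dangling pointer.
pub fn transfer() -> Vec<u64> {
    let (tx, rx) = mpsc::channel();
    let producer = thread::spawn(move || {
        let data: Box<Vec<u64>> = Box::new((0..4).collect());
        tx.send(data).unwrap();
        // "Process 1" exits here; the allocation survives it.
    });
    producer.join().unwrap();
    *rx.recv().unwrap()
}

fn main() {
    assert_eq!(transfer(), vec![0, 1, 2, 3]);
}
```

The question is whether this picture survives once the "threads" are independently built programs with independently torn-down heaps.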

Then Question 3: Could this issue be mitigated by defining custom Box types that use an allocator drawing from persistent OS-global heap memory? E.g. only objects needing to be shared between processes would be constructed with this Box type, and the usual ownership rules would apply to ensure this sort of memory is cleaned up. The same memory leak worries persist, but become localized to code that attempts to share things between processes, which might be rarer. Alternatively, is there perhaps a simpler solution to managing this memory across multiple Rust execution units that I'm missing?
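Here's a minimal sketch of what I have in mind for that Box type (SharedBox is a made-up name, and the OS-global heap is modeled by the ordinary allocator plus a live-allocation counter standing in for OS-side accounting):

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::ops::Deref;
use std::ptr::NonNull;
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-in for accounting on a persistent, OS-global "exchange heap".
static LIVE_SHARED_ALLOCS: AtomicUsize = AtomicUsize::new(0);

/// Sketch of a Box that draws from the OS-global heap (modeled here by the
/// process allocator). Ordinary ownership rules then guarantee the memory
/// is returned exactly once, whichever process ends up holding it.
pub struct SharedBox<T> {
    ptr: NonNull<T>,
}

impl<T> SharedBox<T> {
    pub fn new(value: T) -> Self {
        let layout = Layout::new::<T>();
        // NOTE: assumes T is not zero-sized; a real implementation would
        // special-case that, as std's Box does.
        let raw = unsafe { alloc(layout) as *mut T };
        let ptr = NonNull::new(raw).expect("shared heap exhausted");
        unsafe { ptr.as_ptr().write(value) };
        LIVE_SHARED_ALLOCS.fetch_add(1, Ordering::SeqCst);
        SharedBox { ptr }
    }
}

impl<T> Deref for SharedBox<T> {
    type Target = T;
    fn deref(&self) -> &T {
        unsafe { self.ptr.as_ref() }
    }
}

impl<T> Drop for SharedBox<T> {
    fn drop(&mut self) {
        unsafe {
            self.ptr.as_ptr().drop_in_place();
            dealloc(self.ptr.as_ptr() as *mut u8, Layout::new::<T>());
        }
        LIVE_SHARED_ALLOCS.fetch_sub(1, Ordering::SeqCst);
    }
}

pub fn live_shared_allocs() -> usize {
    LIVE_SHARED_ALLOCS.load(Ordering::SeqCst)
}

fn main() {
    {
        let b = SharedBox::new(42u64);
        assert_eq!(*b, 42);
        assert_eq!(live_shared_allocs(), 1);
    }
    // Dropping the owner returned the memory to the shared heap.
    assert_eq!(live_shared_allocs(), 0);
}
```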

Overall, this is a pretty unusual use-case for Rust (for good reason...), so it's been difficult to find literature on how to get the compiler and type system working in our favor. Hopefully the experts here have some thoughts. Thanks for any insights!

5 Likes

Interesting. Some quick notes: Preventing unsafe code but allowing it in some dependencies is going to be difficult. There are also some non-obvious dangerous things, e.g. you can cause UB by using #[no_mangle] to define the same function multiple times. There are probably others.

It would also limit you quite a lot. There are lots of crates that use unsafe code, for example the rand crate does.

2 Likes

Interesting idea.

When this kind of thought crossed my mind a while back, I figured the way to do it, and make it safe, was to use WASM. Basically the Rust OS kernel would load and run "user space" programs that have been compiled to WASM.
Like so: https://medium.com/wasmer/executing-webassembly-in-your-rust-application-d5cd32e8ce46

Of course there is a performance hit with this. But hey, for a hobby OS I would be happy with that. I suspect it's no worse than Android running its user-space code in a Java VM.

It offers advantages in safety and cross-platform, cross-language support. Any language that can be compiled to WASM can be run on said OS no matter what hardware it actually runs on.

Application delivery would be dead easy, just ship the WASM blob. As the creator of Docker said, "If WASM existed when we created Docker we would not have needed to create Docker"

And that is as far as my musings on the idea went. I have no idea whether it would be possible to run any old Rust code as WASM if it uses threads or async tasks, for example.

I do sometimes, semi-seriously, wonder about using WASM like this just to deliver application code to our remote embedded devices.

2 Likes

If that approach sounds appealing, you can also check this: http://adventures.michaelfbryan.com/posts/wasm-as-a-platform-for-abstraction/

1 Like

Thanks for the notes.

Preventing unsafe code but allowing it in some dependencies is going to be difficult.

Would you say it's easier to prevent unsafe code across the board instead?

It would also limit you quite a lot. There are lots of crates that use unsafe code, for example the rand crate does.

Yeah, it's definitely limiting with regards to packages. The rough sketch I had in my head was that one could offer specific packages with unsafe code as dependencies if their "unsafety" was localized in a way that was manually checkable. Of course this opens up a massive attack surface that we would have to keep on top of.

Thanks @ZiCog and @qaopm for the WASM pointers. It's definitely an interesting alternative for sandboxing. I've seen the Docker quote before, and it speaks volumes. So there's definitely some nice portability and sandboxing wins with this approach, but as you mention it may be hard to get some features working... I'll have to read into this more.

One immediate thought that comes to mind is that I don't think this would leverage the shared memory space. In which case the WASM abstraction might be more sensible to build on top of proper memory isolation for processes. Maybe I'm missing something on how WASM works though?

Hmmm...

As far as I can tell WASM programs cannot access any memory outside their WASM address space. On the other hand the host must be able to see in, else how would even their printf work?

Speaking from ignorance and optimism it seems to me that one could fire up hundreds of Rust threads each one of which runs a WASM engine that runs a user application.

In that scenario I can imagine those WASM user space programs communicating through the shared memory by calling back to our Rust host OS. Posting to mail boxes or comms channels. I suspect that would involve a lot of copying though.

You've got me curious now....

1 Like

Safe Rust cannot currently be used as a secure sandbox the way you propose.

1 Like

Yeah, I'm not surprised to see that there are soundness issues in Rust. But I expect this is something the developers are always working hard to resolve, so it will only get better over time. For a hobby OS, I'm not too concerned if the practical implementation is insecure; I'm more curious whether this is tractable at all in the abstract.

I did a bit more reading into WASM. It would definitely be interesting to host a pile of WASM engines that run individual apps and hand them just enough OS functionality to do useful things together. I think I've missed how this is actually safe at the end of the day, so I guess I need to read closer.

Yep, it seems like in wasmer at least there is some binding of host functions into the WASM space. Naively I don't really see how this could be secure yet. The print function implementation in the blog post @qaopm mentioned (http://adventures.michaelfbryan.com/posts/wasm-as-a-platform-for-abstraction/) isn't really checking the pointer it gets passed. So could the WASM code just pass it garbage memory in this case? That seems bad from a sandboxing perspective.

EDIT: I think I understand this example a bit better now. I guess WASM memory is accessible from the host via slices and so all is well. Definitely seems like the interface is very narrow though, making it hard to exploit the shared memory in the Rust host.

Coincidentally I just discovered Cloudflare's "cloud worker" service allows one to deploy Rust. In ten minutes I had a "hello world" web server up and running on Cloudflare: https://rust.conveqs.workers.dev/

Which all sounds a lot like the WASM-based OS we are talking about.

1 Like

Very cool! Indeed, it sounds like they've got some kind of WASM-on-a-thin-OS setup.

Also, quoting a quote in the article:

WASM programs operate in their own separate memory space, which means that it's necessary to copy data in and out of that space in order to operate on it. Code that mostly interacts with external objects without doing any serious "number crunching" likely does not benefit from WASM.

So this still suffers from the problem of sandboxed code requiring heavyweight communication...

I think this helps put some more concrete words to my original point: it feels like "sandboxing vs cheap comms" is a false tradeoff in much the same way that "memory safety vs speed" is a false tradeoff that Rust is trying to break down. What do you think?

Another concern is taking CPU speculative execution and caching into account. If the entire userspace is effectively one security context from the CPU's point of view, then allowing efficient code execution plus access to precise timers can be a security issue (Spectre/Meltdown-style).

P.S. Here is my old thread about sandboxed Rust.

1 Like

Are you aware of Singularity OS? MSR created that back in the day, solely relying on software isolation. They devised some new language, Sing#, specifically for this purpose, but heavily based on C#.

They also removed the kernel boundary entirely. And I don't really see what you gain by keeping it, to tell the truth -- but could be missing something. I also recall a section specifically about their heap(s) used for data exchange. So may be worth checking out the paper.

1 Like

I was not aware of Singularity, thanks for the pointer! It sounds like I've been reinventing the wheel a bit based on a quick read over their paper.

The exchange heap is an interesting idea for this inter-process memory management. I'll read through this paper and related ones and chew over the possibility of using this sort of architecture with Rust as the language enforcing the isolation, instead of their GC'd Sing# language. A lot of what they mention on memory management across processes feels like how Rust does things: single ownership of objects, moving ownership when passing between threads, etc. I haven't yet found a reason this architecture would need a GC'd language with a runtime, but I'll keep looking into it.

Thanks @vi0, based on your thread it seems like we were thinking about similar use cases for Rust.

Timing attacks like Spectre/Meltdown are an interesting point. I wonder if any of the Singularity literature addressed this. In any case, that feels like a more advanced security topic, and one that has also pervasively affected more traditional OS architectures, so I may shelve it for investigation down the road.

Beyond just Singularity, there was an earlier project like this: SPIN, which used the language Modula-3. In both SPIN and Singularity, processes had to be written in the safe language.

There is a Rust project which works like this: Tock, which leverages both a safe shared address space for code in the safe language and the traditional use of memory protection for processes written in arbitrarily unsafe languages.

Anyhow, certainly look at Tock if you want to do this with Rust!