RFC and paper: Experiences Building an OS in Rust

I'll be presenting a paper at PLOS (Programming Languages and Operating Systems) on Sunday about our experience building an embedded operating system in Rust over the last year. A pre-print is here: http://amitlevy.com/papers/tock-plos2015.pdf or a slightly tldr blog post here: http://mostlytyped.com/posts/experiences-building-an-os-in-ru

The gist is that we ran into some unexpected difficulty building the OS, and (re)realized that system designs often must reflect the features/constraints of the language.

Some challenges we faced are fundamental (ownership vs inherently circular dependencies) and require the design of the system to change. Others are issues that could (and may) be simply solved by small changes to the language or core libraries.

At the end of the paper we propose an extension to the type system that would reflect concurrency in the type system (hearkening back to external uniqueness for inspiration), and thus allowing some violation of ownership in certain cases where it could be done safely. While we already know that implementing our specific proposal in Rust would be problematic, we're still very interested to understand if reflecting concurrency in the type system might be useful.

7 Likes

I'm really excited about this!

Some of our recent work has been trying to stabilize more low-level stuff, and so general experience reports on things like OSdev are very useful.

I only read the abstract and the proposed extension, but haven't had enough time to digest it yet. Looking forward to this!

This was a good read. The particular problem you were having with wanting safe mutable aliasing is very interesting!

The execution context is an interesting solution, but as you pointed out in "Limitations," preventing mutable aliasing is not just about thread safety. You hinted at it with Vec, but it isn't just limited to Vec because the notion that "mutable alias => unique access" is ubiquitous. IIRC, this property is used to elide bounds checks in iterators in a way that is safe.

Have you explored using Rc<RefCell<...>> at all? That is the "standard" way to get safe mutable aliasing (where actual mutation is guaranteed to be unique through borrow_mut), but I agree it does not meet your criteria of checking this sort of thing at compile time.

TL;DR skip to section 4 for the interesting stuff

Discussing this on Twitter/IRC:

  • Wants const size_of closures (for statically allocating that many bytes). We just need RFC #1245 to get implemented (it has been accepted), and for someone to mark the size_of intrinsic as a const fn. However this might get into the weeds pretty bad since "size" is handled by trans (LLVM even).

  • Shared mutability of hardware registers should be able to be soundly handled by Cell.

This leaves the big issue:

The kernel has been architected as a single "main" thread, and a bunch of interrupt handling threads. All the interrupt-handling threads do is enqueue a message for the main thread to handle. The main thread then drives all the drivers off of this queue, so they all run on one thread. However they want to do some shared mutable stuff. In particular, I think they want closures that close over the same value mutably? RefCell isn't particularly acceptable because crashing is Super Bad, and they would like to avoid runtime checks anyway. They just found out about Cell, so it might work, but they seem to think
this wouldn't be right.

You can make Cells pretty fine-grained. You can also consider @SimonSapin's proposed extension to Cell which has a replace method for non-Copy types so you could replace (say) an arbitrary Cell<Option<T>> with None temporarily through a shared reference.

Alternatively, it might be possible to manually thread the shared mutable state since it's all being driven by some top-level event-loop (as far as I can tell). This was actually an old std pattern: std::collections::hashmap::HashMap - Rust (the A is "shared" closed state).

Interested to hear more about the finer details of the shared mutable state!

2 Likes

However, under certain constraints, for example,if all aliases are used from the same thread, mutable aliases mightbe perfectly safe.

Not really. Threads have very little to do with this rule.

(You do somewhat note this later, though)

As other people have mentioned, RefCell, Cell, (or Rc<SomeCell>) would be what you want here.

If the values are such that they don't contain enums or allow other forms of invalidable interior references (like vectors), then we can perhaps handle it like this. We could add a marker trait MutAlias, which allows for aliased mutable pointers, and can be derived like Copy (#[derive(MutAlias)]) on structs containing MutAlias types, except that it doesn't work for enums at all. This isn't a change I see happening to the Rust type system, but it's one that neatly patches up the memory safety problems with mutable aliases. It doesn't patch up the whole set of (memory-safe) footguns with mutable aliasing, though.

5 Likes

I've tried to reply to this on reddit, here follows a transcript:


They focus on size_of and while that is entirely doable (LLVM doesn't do anything interesting about computing sizes, it's a bunch of really simple "algorithms" and we already replicate most of it)...

There is no point. Really, what they actually want is function-scoped statics and mem::uninitialized() in them (or even None, but that wastes a few bytes), e.g.:

fn to_static<F: 'static>(closure: F) -> &'static F {
    static CLOSURE: UnsafeCell<F> = UnsafeCell::new(unsafe {
        mem::uninitialized()
    });
    unsafe {
        *CLOSURE.get() = closure;
        &*CLOSURE.get()
    }
}

It doesn't even have to use the same static location for each F (in cross-crate situations, for example), since the function returns the address in which the value was written, so something like this would not be hard to support.

The problem with that, however, as I hope many of you will notice, is that calling this function twice for the same F will end up invalidating the first value, which for closures means re-entrance leads to UB.
As such, the API they want is wildly unsafe.

There is a solution, if they can move the closure storage to the callback invocation site: instead of returning a reference which is easy to invalidate, store that reference into an Option<&Fn()> which can only be accessed by two functions:

One function is similar to the above to_static, except it converts the reference to &Fn(), wraps it in Some and stores it in the Option<&Fn()> callback holder.

Another function takes the &Fn() out of the Option<&Fn()> callback holder, replacing it with None, and calls that closure, discarding the &Fn() afterwards.

Except that still has the re-entrance issue: they could either guard against re-entrance by keeping the Some until the call has ended and refusing to register a new callback (but this goes against their desired operational model).

Or they could move the closure data on the stack and only then call it, but this is pretty tricky to do, the best I could come up with is:

trait VirtualCall {
    unsafe fn move_call(&self);
}

impl<F: Fn()> VirtualCall for F {
    unsafe fn move_call(&self) {
        ptr::read(self)();
    }
}

static CALLBACK: Cell<Option<&'static VirtualCall>> = Cell::new(None);

pub fn register<F: Fn() + 'static>(closure: F) {
    static CLOSURE: UnsafeCell<F> = UnsafeCell::new(unsafe {
        mem::uninitialized()
    });
    unsafe {
        *CLOSURE.get() = closure;
        CALLBACK.set(Some(&*CLOSURE.get()));
    }
}

pub fn invoke() {
    if let Some(cb) = CALLBACK.get() {
        CALLBACK.set(None);
        unsafe {
            cb.move_call();
        }
    }
}

The last 3 items (CALLBACK, register and invoke) have to be separate for everything with a callback (spi_write in this case), and preferably encapsulated.

They could maybe get away with a macro to define these items, and have the CALLBACK hidden inside a struct which only provides register and invoke as methods.

Also, single-threaded contexts are assumed - this scheme can be made safe via lifetime-based 0-cost proof tokens for "interrupts are disabled" (unless NMIs exist, ofc).

Where might I find other papers/RFC's that outline similar proposals to the execution contexts discussed here? I'd be very interested in reading/internalizing that information.

I've also run into some cases in Rust where I'd like to have shared mutable state internally in a cyclic graph that seem to be able to be solved with what @Manishearth is talking about in his post, and in general I'd really like to get and overview and keep an eye on various discussions/proposals for Rust regarding this topic. And yes, I am aware that RefCell et al would solve the issues I've run into already (albeit for performance they might cause problems; I'll be doing side-by-side tests soon), I'm still interested in other solutions/proposals :slight_smile:

One such resource that was mentioned by the tock paper was the "External Uniqueness is Unique Enough" paper, which seems to be relevant (again minus the cases outlined in the link @Manishearth mentioned).

I suppose https://github.com/rust-lang/rfcs is already a good start; hadn't actually seen that before now :stuck_out_tongue: