Using sync::Once, mutable and MaybeUninit

This is my first time using Once in Rust, and I wrote a demo like this:

struct Bar {}

struct Foo {
    bar: Bar,
}

static mut VAL: Foo = unsafe { std::mem::MaybeUninit::uninit().assume_init() };
static INIT: Once = Once::new();

fn get_cached_val(bar: Bar) -> &'static Foo {
    unsafe {
        INIT.call_once(|| {
            VAL = Foo { bar }
        });
        &VAL
    }
}

I really don't like making VAL mutable, so I changed the code to this:

struct Bar {}

struct Foo {
    bar: Bar,
}
// No mut
static VAL: Foo = unsafe { std::mem::MaybeUninit::uninit().assume_init() };
static INIT: Once = Once::new();

fn get_cached_val(bar: Bar) -> &'static Foo {
    unsafe {
        INIT.call_once(|| {
            *(&VAL as *const Foo as *mut Foo) = Foo { bar }
        });
        &VAL
    }
}

The second version seems no different from the first, since changing a mutable static variable is unsafe anyway. Which one would you prefer?

Also, am I using Once correctly? In particular, I'm calling std::mem::MaybeUninit::uninit().assume_init(), and I don't usually call assume_init right after uninit.

MaybeUninit::uninit().assume_init() is almost always instant undefined behavior. If you make Bar non-empty, it actually fails at compile time:

error[E0080]: could not evaluate static initializer
  --> src/lib.rs:14:32
   |
14 | static mut VAL: Foo = unsafe { std::mem::MaybeUninit::uninit().assume_init() };
   |                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ using uninitialized data, but this operation requires initialized memory

A general principle: unsafe doesn't mean "you get to break the rules", it means "you are now responsible for ensuring you aren't breaking the rules". In addition to the misuse of MaybeUninit, the second version also writes to a non-mutable place, which is itself UB.

static mut is best avoided, but to avoid it you need to use an interior-mutable type for the variable, not just use unsafe to bypass the lack of mut.
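
To illustrate what "interior-mutable" means for a static, here is a minimal sketch. The names CALLS, LAST_LABEL and record are made up for this example; only the std types are real. Both statics are declared without mut, and all mutation goes through the types' own safe APIs.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

// Hypothetical statics for illustration: neither is `static mut`.
// The atomic and the mutex provide the mutability internally.
static CALLS: AtomicUsize = AtomicUsize::new(0);
static LAST_LABEL: Mutex<Option<String>> = Mutex::new(None);

/// Stores the label and returns how many calls have been recorded so far.
fn record(label: &str) -> usize {
    *LAST_LABEL.lock().unwrap() = Some(label.to_owned());
    CALLS.fetch_add(1, Ordering::Relaxed) + 1
}

fn main() {
    record("first");
    record("second");
    let last = LAST_LABEL.lock().unwrap();
    println!("{} calls, last = {:?}", CALLS.load(Ordering::Relaxed), last.as_deref());
}
```

No unsafe is needed here, which is the point: the type of the static carries the synchronization, so the compiler can still check every access.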

A corrected version might look something like this:

use std::{cell::UnsafeCell, mem::MaybeUninit, sync::Once};

#[derive(Debug)]
struct Bar {
    value: usize,
}

#[derive(Debug)]
struct Foo {
    bar: Bar,
}

struct SyncWrapper(UnsafeCell<MaybeUninit<Foo>>, Once);

impl SyncWrapper {
    const fn new() -> Self {
        Self(UnsafeCell::new(MaybeUninit::uninit()), Once::new())
    }

    fn init(&self, bar: Bar) {
        // Only allow mutability inside the associated `Once`.
        unsafe {
            self.1
                .call_once(|| *self.0.get().cast::<Foo>() = Foo { bar });
        }
    }

    /// SAFETY: Must be called after `init` has completed.
    unsafe fn as_ref(&self) -> &Foo {
        // This is probably necessary for safety in general, to ensure that all threads see the initialized state via atomics.
        // If you only call init before any other threads are spawned you *might* be able to remove it.
        assert!(self.1.is_completed());
        unsafe { &*self.0.get().cast() }
    }
}

// (Possibly) safe as long as we only use mutability inside Once, and never create a long lived reference before calling init
unsafe impl Sync for SyncWrapper {}

static VAL: SyncWrapper = SyncWrapper::new();

fn get_cached_val(bar: Bar) -> &'static Foo {
    VAL.init(bar);
    unsafe { VAL.as_ref() }
}

fn main() {
    println!("{:?}", get_cached_val(Bar { value: 1 }));
    println!("{:?}", get_cached_val(Bar { value: 2 }));
}

It passes Miri, but I wouldn't swear to it being completely correct, even within the fairly strict constraints described in the comments.

You don't require the assert!(once.is_completed()) for synchronization; calling call_once guarantees that you can observe the effects of the initialization which ran. (In fact, the API might actually be sound with as_ref being safe if it assert!s that the once has run (and the once is only run to initialize the data).)

Rather than homebrewing it, though, use the standard OnceLock instead. It's still unstable for now; the crate version is once_cell. (Homebrewing for learning purposes is reasonable.)
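
For comparison, here is a sketch of the same get_cached_val built on OnceLock. It assumes a toolchain where std::sync::OnceLock is available on stable (Rust 1.70 or later); with the once_cell crate, once_cell::sync::OnceCell offers a get_or_init with the same shape.

```rust
use std::sync::OnceLock;

#[derive(Debug)]
struct Bar {
    value: usize,
}

#[derive(Debug)]
struct Foo {
    bar: Bar,
}

// No `mut`, no `unsafe`, no `MaybeUninit`: the cell tracks its own state.
static VAL: OnceLock<Foo> = OnceLock::new();

fn get_cached_val(bar: Bar) -> &'static Foo {
    // The closure runs at most once. On later calls it is dropped unused
    // (together with the `bar` it captured) and the stored value is returned.
    VAL.get_or_init(|| Foo { bar })
}

fn main() {
    println!("{:?}", get_cached_val(Bar { value: 1 }));
    println!("{:?}", get_cached_val(Bar { value: 2 }));
}
```

Both calls return a reference to the same Foo, the one built from whichever Bar arrived first.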

Are there any tutorials or articles about how to use MaybeUninit? I get the feeling that every time I use it, I use it wrong.

There's a simple implementation of OnceCell here:

A MaybeUninit<T> is like an Option<T> that doesn't remember whether it is None or Some. Instead, you have to keep track of that information in some other way, and it has various unsafe methods that say stuff like "I know this is Some, let me get the value".
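
To make that analogy concrete, here is a rough sketch of a hypothetical Slot type (the name and API are invented for illustration) that stores the "is it Some?" information in a separate bool and consults it before every unsafe call.

```rust
use std::mem::MaybeUninit;

/// Hypothetical example type: `MaybeUninit<T>` plus the flag it does not keep itself.
struct Slot<T> {
    value: MaybeUninit<T>,
    initialized: bool,
}

impl<T> Slot<T> {
    fn empty() -> Self {
        Slot { value: MaybeUninit::uninit(), initialized: false }
    }

    /// Stores `v`, dropping any previously stored value first.
    fn set(&mut self, v: T) {
        if self.initialized {
            // Clear the flag first so a panicking destructor cannot cause a double drop.
            self.initialized = false;
            // SAFETY: the flag said a value had been written and not yet dropped.
            unsafe { self.value.assume_init_drop() };
        }
        // `write` never reads or drops the old contents.
        self.value.write(v);
        self.initialized = true;
    }

    fn get(&self) -> Option<&T> {
        if self.initialized {
            // SAFETY: the flag says the value is initialized.
            Some(unsafe { self.value.assume_init_ref() })
        } else {
            None
        }
    }
}

impl<T> Drop for Slot<T> {
    fn drop(&mut self) {
        if self.initialized {
            // SAFETY: same invariant as in `get`. `MaybeUninit` never drops its contents itself.
            unsafe { self.value.assume_init_drop() };
        }
    }
}

fn main() {
    let mut slot: Slot<String> = Slot::empty();
    assert!(slot.get().is_none());
    slot.set(String::from("hello"));
    println!("{:?}", slot.get());
}
```

This is essentially a hand-written Option<T>; in real code you would just use Option. MaybeUninit earns its keep only when the "initialized" fact lives somewhere else, such as in a Once.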

Also, if you want lazy initialization, don't roll your own. once_cell has sync::Lazy. Its usage requires no unsafe. Do this instead:

use once_cell::sync::Lazy;
use std::sync::Mutex;

fn get_cached_value(bar: Bar) -> &'static Foo {
    static TEMP_BUF: Mutex<Option<Bar>> = Mutex::new(None);
    static CACHE: Lazy<Foo> = Lazy::new(|| {
        let bar = TEMP_BUF.lock().unwrap().take().unwrap();
        Foo { bar }
    });

    TEMP_BUF.lock().unwrap().get_or_insert(bar);
    &*CACHE
}

The trick is to temporarily move the run-time value bar into a global so that the closure referring to it can be used to const-initialize the Lazy.

Note that this is still wrong: *self.0.get().cast::<Foo>() = ... drops the previous, uninitialized value before storing the new one. Use ptr::write or MaybeUninit::write instead.
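
A sketch of the difference, using a Once-guarded wrapper in the style discussed in this thread. Here Foo holds a String (an assumption made so that it has drop glue and the distinction matters), and get_or_init is a hypothetical method name. I have not run this variant under Miri.

```rust
use std::{cell::UnsafeCell, mem::MaybeUninit, sync::Once};

#[derive(Debug)]
struct Foo {
    // A field with a destructor: assigning over uninitialized memory would try to drop garbage.
    name: String,
}

struct SyncWrapper(UnsafeCell<MaybeUninit<Foo>>, Once);

// SAFETY (assumed invariant): the cell is written only inside `call_once`
// and read only after `call_once` has returned.
unsafe impl Sync for SyncWrapper {}

impl SyncWrapper {
    const fn new() -> Self {
        Self(UnsafeCell::new(MaybeUninit::uninit()), Once::new())
    }

    fn get_or_init(&self, name: String) -> &Foo {
        self.1.call_once(|| {
            // SAFETY: `call_once` runs this closure at most once, with no concurrent readers.
            // `MaybeUninit::write` stores the value without reading or dropping the old bytes.
            unsafe { (&mut *self.0.get()).write(Foo { name }) };
        });
        // SAFETY: `call_once` has returned, so the value is initialized and visible here.
        unsafe { (&*self.0.get()).assume_init_ref() }
    }
}

static VAL: SyncWrapper = SyncWrapper::new();

fn main() {
    println!("{:?}", VAL.get_or_init(String::from("first")));
    println!("{:?}", VAL.get_or_init(String::from("second")));
}
```

Because reading and initializing happen in one method, there is no way to reach the reference before call_once has completed, so the method does not need to be marked unsafe.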

It looks to me like writing bug-free code using MaybeUninit::uninit() is really difficult...

It is difficult. There are properties that must be true of your program for it to be either meaningful to the compiler or wrong in a way the compiler can reliably detect and error on; we say that code that doesn't meet all of those properties exhibits "undefined behaviour", which basically means that the compiler has no clue what the code is supposed to mean, but instead has some random idea. This might or might not be the same as the programmer's idea of what the code means - but if it's not, the behaviour can be catastrophically wrong.

Without unsafe, Rust promises that there is no undefined behaviour; thus, if you remove unsafe from your code, the compiler either knows exactly what your code means, or generates an error to tell you that it couldn't make sense of your code.

The trouble here is that there's lots of useful code you can write that cannot be written using only mechanisms where the compiler always knows exactly what you mean. Without unsafe, you can't write this code - the compiler will error out because there are possibilities that aren't meaningful. Hence the existence of unsafe; this gives you access to extra features of Rust where the compiler only knows what your code means if you ensure that certain preconditions are met before using those features, such as dereferencing a raw pointer (where you have to ensure that the pointer points to a valid target by Rust's rules).
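
As a tiny illustration of such a precondition, here is a sketch with a made-up helper, read_via_raw. Creating the raw pointer is safe; dereferencing it is the step where you, not the compiler, vouch for validity.

```rust
/// Hypothetical helper: reads a `u32` through a raw pointer derived from a valid reference.
fn read_via_raw(value: &u32) -> u32 {
    // Turning a reference into a raw pointer needs no `unsafe`.
    let ptr: *const u32 = value;
    // SAFETY: `ptr` comes from a live reference, so it is non-null, aligned,
    // and points to an initialized `u32` for the duration of this read.
    unsafe { *ptr }
}

fn main() {
    let value = 42_u32;
    println!("{}", read_via_raw(&value));
}
```

If ptr had instead been dangling or misaligned, the same unsafe block would compile just as happily, and the program would have undefined behaviour.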

Unfortunately, we know from long and painful experience that people aren't very good at proving that preconditions are always met - we tend to make mistakes. One answer to this is to tell all programmers to "work better" - whenever you use unsafe code, you have to make sure that you're not making any mistakes at all anywhere in your program; but this gets us back to the problem of not knowing how to ensure that people don't make mistakes.

The other option is to say that we can use the module privacy barrier to "encapsulate" unsafe; instead of having to ensure that your entire program is mistake-free, we say that it's a bug in the module using unsafe if code in modules that don't use unsafe can make the program as a whole exhibit undefined behaviour. This makes the problem of not making mistakes easier to solve - you can make mistakes elsewhere, just not in modules that use unsafe. As a result, this is the route the Rust ecosystem takes - somebody spends time writing a crate like once_cell or a standard library feature like std::sync::OnceLock, and does the work to make sure it contains no mistakes, and you can reuse it without having to do the hard work to ensure that you're not making any mistakes when programming - instead, your mistakes become either compile errors or tractable bugs.
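
A small sketch of what that encapsulation looks like in practice. The non_empty module and its NonEmpty type are invented for this example; the private field is what lets the module guarantee the invariant its unsafe block relies on.

```rust
mod non_empty {
    /// Invariant: the inner vector always holds at least one element.
    /// The field is private, so only code in this module can build or change it.
    pub struct NonEmpty(Vec<i32>);

    impl NonEmpty {
        /// The only constructor; it refuses input that would break the invariant.
        pub fn new(items: Vec<i32>) -> Option<Self> {
            if items.is_empty() {
                None
            } else {
                Some(NonEmpty(items))
            }
        }

        /// Growing the vector cannot break the invariant.
        pub fn push(&mut self, item: i32) {
            self.0.push(item);
        }

        /// A safe function: callers outside the module cannot make it misbehave.
        pub fn first(&self) -> i32 {
            // SAFETY: every path that creates or mutates `NonEmpty` keeps `len() >= 1`.
            unsafe { *self.0.get_unchecked(0) }
        }
    }
}

use non_empty::NonEmpty;

fn main() {
    let mut list = NonEmpty::new(vec![3, 1, 2]).expect("input is non-empty");
    list.push(4);
    println!("first = {}", list.first());
    assert!(NonEmpty::new(Vec::new()).is_none());
}
```

To audit the unsafe block you only need to read the module: no code outside it can produce an empty NonEmpty, however buggy that code is.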
