Uninitialised static mut

I want to have a static mut of a complex type (a hashtable in this case) in my program. I will ensure manually that the small number of primitive operations available are encapsulated behind a safe interface on top of the inherently unsafe mutation of a global static variable.

I initialise that static mut in main, via a call to a setup_database function. Therefore it needs to start uninitialised. How should I go about doing this?

Here is a short MWE:

struct Symbol;

struct Database {
  map: std::collections::HashMap<String, Symbol>, 
}

static mut db: Database = unsafe { std::mem::MaybeUninit::uninit().assume_init() };

fn setup_database() {
  unsafe { db = Database { map: std::collections::HashMap::with_capacity(31) } }
}

fn main() {
    setup_database();
}

Also yes I do realise this is a rust sin :slight_smile: but it makes sense for my application and (as far as I can see) there is no alternative to do what I want which is ergonomic and zero-overhead.

You don't want uninit().assume_init(). It's UB, as Miri can tell you:

error: Undefined Behavior: type validation failed at .value.map.base.table.table.ctrl.pointer: encountered uninitialized raw pointer
   --> /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/mem/maybe_uninit.rs:600:38
    |
600 |             ManuallyDrop::into_inner(self.value)
    |                                      ^^^^^^^^^^ type validation failed at .value.map.base.table.table.ctrl.pointer: encountered uninitialized raw pointer
    |
    = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
    = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
            
    = note: inside `std::mem::MaybeUninit::<Database>::assume_init` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/mem/maybe_uninit.rs:600:38
note: inside `main` at src/main.rs:8:33
   --> src/main.rs:8:33
    |
8   |     let db: Database = unsafe { std::mem::MaybeUninit::uninit().assume_init() };
    |                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
    = note: inside `std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:125:18
    = note: inside closure at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:63:18
    = note: inside `std::ops::function::impls::<impl std::ops::FnOnce<()> for &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>::call_once` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:259:13
    = note: inside `std::panicking::r#try::do_call::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:401:40
    = note: inside `std::panicking::r#try::<i32, &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:365:19
    = note: inside `std::panic::catch_unwind::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:434:14
    = note: inside closure at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:45:48
    = note: inside `std::panicking::r#try::do_call::<[closure@std::rt::lang_start_internal::{closure#2}], isize>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:401:40
    = note: inside `std::panicking::r#try::<isize, [closure@std::rt::lang_start_internal::{closure#2}]>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:365:19
    = note: inside `std::panic::catch_unwind::<[closure@std::rt::lang_start_internal::{closure#2}], isize>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:434:14
    = note: inside `std::rt::lang_start_internal` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:45:20
    = note: inside `std::rt::lang_start::<()>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:62:5

You probably want to use https://docs.rs/once_cell/1.8.0/once_cell/#safe-initialization-of-global-data

(Or if you're on nightly, https://doc.rust-lang.org/nightly/std/lazy/struct.SyncOnceCell.html.)
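For reference, here is a minimal sketch of the safe route using only the standard library. It assumes a toolchain where `std::sync::OnceLock` is stable (as far as I know, that is the name under which the nightly `SyncOnceCell` linked above was stabilized). The `database()` accessor is a hypothetical name of mine, not something from the question.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

struct Symbol;

struct Database {
    map: HashMap<String, Symbol>,
}

// Starts out empty; no unsafe code and no uninitialised value is ever read.
static DB: OnceLock<Database> = OnceLock::new();

// Hypothetical accessor: runs the closure on first use only,
// then keeps returning the same reference.
fn database() -> &'static Database {
    DB.get_or_init(|| Database {
        map: HashMap::with_capacity(31),
    })
}
```

Calling `database()` from `main` before anything else gives roughly the "initialise in main" behaviour from the question, at the cost of one atomic check per access.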

The usual way to approach something like this is an OnceCell, but if you want the version that directly implements what you intended with the unsafe code, then here is the correct way to do it:

use std::cell::UnsafeCell;
use std::collections::HashMap;
use std::mem::MaybeUninit;

#[derive(Debug)]
struct Symbol;

#[derive(Debug)]
struct Database {
    map: HashMap<String, Symbol>,
}

struct DbCell(UnsafeCell<MaybeUninit<Database>>);

unsafe impl Sync for DbCell where Database: Sync {}

static DB: DbCell = DbCell(UnsafeCell::new(MaybeUninit::uninit()));

fn setup_database() {
    let db_ref = unsafe { &mut *DB.0.get() };
    *db_ref = MaybeUninit::new(Database {
        map: std::collections::HashMap::with_capacity(31),
    });
}

fn get_database() -> &'static Database {
    unsafe { &*(*DB.0.get()).as_ptr() }
}

fn main() {
    setup_database();

    println!("{:?}", get_database());
}

This should be a red flag, shouldn't it? You are explicitly telling the compiler that the data can be assumed initialized, whereas it was just created uninitialized. This can't possibly be right.

Note that this is sound only as long as the code is not multithreaded.

Indeed, the setup_database function should probably be marked as unsafe.

Thanks for the replies! Right now I'd like to fully understand what's going on, rather than using some magic like lazy_static which I don't exactly understand and which may be doing more than necessary. @alice's solution looks perfect in that regard, I will study it closely now to make sure I understand everything :slight_smile:

The OnceCell and lazy_static solutions essentially work by introducing some extra state next to the value to track whether it has been initialized.
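To make "extra state next to the value" concrete, here is a deliberately simplified sketch of my own. It is not how once_cell or lazy_static are actually implemented: `FlaggedCell` is a hypothetical type that only tracks "initialised or not", and it pushes the "set at most once, never concurrently" requirement onto the caller instead of enforcing it.

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical illustration type: a value slot plus a flag saying whether
// the slot has been written. The value is never dropped, which is fine for
// a static but would leak in a local.
struct FlaggedCell<T> {
    ready: AtomicBool,
    value: UnsafeCell<MaybeUninit<T>>,
}

// Readers only touch the value after observing `ready == true`.
unsafe impl<T: Send + Sync> Sync for FlaggedCell<T> {}

impl<T> FlaggedCell<T> {
    const fn new() -> Self {
        Self {
            ready: AtomicBool::new(false),
            value: UnsafeCell::new(MaybeUninit::uninit()),
        }
    }

    /// # Safety
    /// Must be called at most once, and never concurrently with another `set`.
    unsafe fn set(&self, v: T) {
        let slot = unsafe { &mut *self.value.get() };
        slot.write(v); // overwrites without dropping the old (uninit) contents
        self.ready.store(true, Ordering::Release);
    }

    // Safe: returns None instead of reading uninitialised memory.
    fn get(&self) -> Option<&T> {
        if self.ready.load(Ordering::Acquire) {
            Some(unsafe { (&*self.value.get()).assume_init_ref() })
        } else {
            None
        }
    }
}
```

The real types go further and make the setter safe as well, which needs more than a two-state flag.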

There’s also a middle ground of std::sync::Once, which will handle the run-only-once part independently of the underlying storage.

use std::sync::Once;

fn setup_database() {
    static INIT: Once = Once::new();
    INIT.call_once(|| {
        let db_ref = unsafe { &mut *DB.0.get() };
        *db_ref = MaybeUninit::new(Database {
            map: std::collections::HashMap::with_capacity(31),
        });
    });
}

fn get_database() -> &'static Database {
    setup_database();  // Will be no-op after first call
    unsafe { &*(*DB.0.get()).as_ptr() }
}

Indeed, setup_database() ought to be marked unsafe, since it must not be called concurrently. Alternatively, it could use its own static Mutex to actually guard against that; since the function is only supposed to be called once, the runtime impact of that extra safety check is nonexistent. get_database() should be marked unsafe as well, since it may only be called after a call to setup_database() has completed.
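As a sketch of what that could look like (my own adaptation of the UnsafeCell version posted earlier, with the safety contracts written out in doc comments; the exact wording of those contracts is my assumption about the intended usage):

```rust
use std::cell::UnsafeCell;
use std::collections::HashMap;
use std::mem::MaybeUninit;

struct Symbol;

struct Database {
    map: HashMap<String, Symbol>,
}

struct DbCell(UnsafeCell<MaybeUninit<Database>>);

// Only sound because every access goes through the two unsafe fns below.
unsafe impl Sync for DbCell {}

static DB: DbCell = DbCell(UnsafeCell::new(MaybeUninit::uninit()));

/// # Safety
/// Must be called exactly once, before any call to `get_database`,
/// and never concurrently with any other access to `DB`.
unsafe fn setup_database() {
    let slot = unsafe { &mut *DB.0.get() };
    // `write` does not drop the previous (uninitialised) contents.
    slot.write(Database {
        map: HashMap::with_capacity(31),
    });
}

/// # Safety
/// May only be called after `setup_database` has returned.
unsafe fn get_database() -> &'static Database {
    unsafe { (&*DB.0.get()).assume_init_ref() }
}
```

Every call site now has to write `unsafe { ... }`, which at least makes the ordering requirement visible where it can be violated.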

At that point, depending on the use case, having some flag that keeps track of the state to guard against such misuse, even in get_database(), seems a sensible thing to do, although it won't be zero-cost. The most natural version of that flag comes for free with the Once type of the standard library, which is basically a three-or-four-state atomic (thread-safe) flag that tracks whether we are in:

  1. the not yet called (and thus, for us, not yet initialized) state;
  2. the currently being concurrently called state (thus the initializing function should not be called; the caller should just wait for the function's completion);
  3. already called and successfully completed (and thus, for us, knowledge of the global having been initialized) state;
  4. and the one easy to forget, the already-called-but-the-call-panicked-and-thus-did-not-successfully-complete state; regarding which the only sensible default behavior for further calls is to panic as well (this one is optional, since it's just a special-case of 2. that panics instead of deadlocking).
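Some of these states can be observed directly from safe code. The following is a small sketch of mine (the helper names are made up) that shows states 1, 3 and 4; state 2 needs a second thread and is left out.

```rust
use std::panic;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

static INIT: Once = Once::new();
static RUNS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical initialiser: the closure body runs on the first call only.
fn init_once() {
    INIT.call_once(|| {
        RUNS.fetch_add(1, Ordering::SeqCst);
    });
}

// State 4: once the closure has panicked, later `call_once` calls panic too.
fn poisoned_once_panics_again() -> bool {
    static POISONED: Once = Once::new();
    let first = panic::catch_unwind(|| POISONED.call_once(|| panic!("init failed")));
    let second = panic::catch_unwind(|| POISONED.call_once(|| {}));
    first.is_err() && second.is_err()
}
```

`INIT.is_completed()` is false before the first `init_once()` and true afterwards, and `RUNS` stays at 1 no matter how often `init_once()` is called.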

Hence @2e71828's suggestion above to guard the initialization with std::sync::Once.

And, at that point, the whole API is pervasive enough for the whole abstraction to be packaged together into that "stateful Once", of sorts, or more precisely, the "lazily initialized value", which is what OnceCell<Database> does:

fn get_database () -> &'static Database {
    static DATABASE: OnceCell<Database> = OnceCell::new(); /* uninit */
    DATABASE.get_or_init(|| -> Database {
        /* code that creates a Database */
    })
}

This is a (potentially more optimized) equivalent of the code with a MaybeUninit::uninit() mutable static, plus a getter which first performs a Once-guarded call to a function that initializes it, and only then fetches a shared borrow to it.

  • Finally, since having to call a get() function is a bit cumbersome / verbose at the use-sites, there is a (perhaps too) clever trick to hide that call under the rug: the . (dot) operator is one where a hidden Deref::deref() call can happen, so one can fake having a &Database static by having a dummy empty value which Derefs to the Database by internally calling that get…() function.

    In the world of the once_cell crate, one can do that by replacing:

    use ::once_cell::sync::OnceCell;
    
    fn get_thing() -> &'static Thing
    {
        static THING: OnceCell<Thing> = OnceCell::new();
        THING.get_or_init(|| { body… })
    }
    

    with:

    use ::once_cell::sync::Lazy;
    
    // Doing `*THING` or `THING.field_or_method_of_Thing`
    // will call `<Lazy<Thing> as Deref>::deref()` which is like
    // `get_thing()` above: its body is:     vvvvvvvvv
    static THING: Lazy<Thing> = Lazy::new(|| { body… });
    

    And finally, lazy_static! further hides stuff under the rug by avoiding that repetition of Lazy, an "implementation detail":

    // Exactly the same semantics as the previous snippet.
    ::lazy_static::lazy_static! {
        static ref THING: Thing = { body… };
    }
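To show what the Deref trick amounts to without any crate, here is a hand-rolled sketch of mine. `DbHandle` is a hypothetical name, and I am using `std::sync::OnceLock` (assuming a toolchain where it is stable) in place of once_cell's `OnceCell`; it is not the actual implementation of `Lazy` or `lazy_static!`.

```rust
use std::collections::HashMap;
use std::ops::Deref;
use std::sync::OnceLock;

struct Symbol;

struct Database {
    map: HashMap<String, Symbol>,
}

fn get_database() -> &'static Database {
    static CELL: OnceLock<Database> = OnceLock::new();
    CELL.get_or_init(|| Database {
        map: HashMap::with_capacity(31),
    })
}

// Dummy zero-sized value whose only job is to forward `.` accesses
// to the lazily initialised Database.
struct DbHandle;

impl Deref for DbHandle {
    type Target = Database;

    fn deref(&self) -> &Database {
        get_database()
    }
}

static DB: DbHandle = DbHandle;
```

With that in place, `DB.map.len()` compiles and initialises the database on first use, even though `DB` itself holds no data.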
    

I hope that helps you :slightly_smiling_face:

Just for fun (it doesn't matter since much better solutions have been posted).
The code in the original question mostly works by sheer luck: setup_database calls the destructor of the old db value when a new one is assigned. When setup is called for the first time, this drops an uninitialized value, which is UB and could end badly. Of course, as usual, the regrettable case is when it doesn't crash even though we did something wrong. Errors that pass silently are always so much scarier :slight_smile:
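The drop-on-assignment behaviour is easy to see with a type that counts its own drops. This is a small safe sketch of mine (`Noisy` and the two helper functions are made-up names), contrasting plain assignment with `MaybeUninit::write`, which overwrites without running a destructor:

```rust
use std::cell::Cell;
use std::mem::MaybeUninit;

// Hypothetical type that bumps a counter every time it is dropped.
struct Noisy<'a>(&'a Cell<usize>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

// Plain assignment runs the destructor of the value being replaced.
#[allow(unused_assignments)]
fn drops_on_assignment() -> usize {
    let drops = Cell::new(0);
    let mut slot = Noisy(&drops);
    slot = Noisy(&drops); // the old value is dropped right here
    let seen = drops.get();
    drop(slot);
    seen
}

// Writing into a MaybeUninit slot never touches the old contents.
fn drops_on_write() -> usize {
    let drops = Cell::new(0);
    let mut slot = MaybeUninit::<Noisy<'_>>::uninit();
    slot.write(Noisy(&drops)); // no destructor runs for the uninit slot
    let seen = drops.get();
    unsafe { slot.assume_init_drop() }; // clean up the value we wrote
    seen
}
```

In the question's code the "old value" being dropped on the first assignment is the uninitialised HashMap, which is where the trouble comes from.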

As already pointed out here, this is not the only UB. The code calls MaybeUninit::uninit().assume_init(), which is almost always instant UB (the only type for which it's valid to call .assume_init() immediately, without actually performing initialization, is MaybeUninit<T> itself, which does not really help). The compiler could very easily recognize that and replace the whole program with a ud instruction (or with formatting your hard drive :stuck_out_tongue:). I suspect the only reason it does not do that is that this pattern is still somewhat used in the wild, especially given that the deprecated std::mem::uninitialized() (which I've seen in production) does exactly the same.
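For completeness, the one place where I know an immediate `.assume_init()` to be accepted is an array whose elements are themselves `MaybeUninit`, a pattern described in the standard library documentation for `MaybeUninit`. A minimal sketch, with `build_squares` being a made-up example function:

```rust
use std::mem::{self, MaybeUninit};

fn build_squares() -> [u32; 4] {
    // Fine: the array's elements are MaybeUninit, which need no initialisation.
    let mut slots: [MaybeUninit<u32>; 4] = unsafe { MaybeUninit::uninit().assume_init() };

    // Initialise every element before claiming the array is a [u32; 4].
    for (i, slot) in slots.iter_mut().enumerate() {
        slot.write((i * i) as u32);
    }

    // All four elements are initialised now, so the layouts match.
    unsafe { mem::transmute::<[MaybeUninit<u32>; 4], [u32; 4]>(slots) }
}
```

Doing the same with `Database` in place of `[MaybeUninit<u32>; 4]` is exactly the UB that Miri reported above.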

And that's the reason I didn't mention it: why go over the same thing twice? :slight_smile:
