I want to have a static mut of a complex type (a hashtable in this case) in my program. I will ensure manually that the small number of primitive operations available are encapsulated in a safe interface on top of the inherently unsafe mutation of a global static variable.
I initialise that static mut in main, via a call to a setup_database function. Therefore it needs to start uninitialised. How should I go about doing this?
Also, yes, I do realise this is a Rust sin, but it makes sense for my application and (as far as I can see) there is no alternative that does what I want while being ergonomic and zero-overhead.
The usual way to approach something like this is an OnceCell, but if you want the version that directly implements what you intended with the unsafe code, then here is the correct way to do it:
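For readers who don't have the snippet at hand, here is a minimal sketch of what a `MaybeUninit`-based version can look like. It is an illustration only, not necessarily the exact code from this reply: the `Database` alias and the sample entry are made up, and both functions are marked `unsafe` because nothing checks call order or concurrency.

```rust
use std::collections::HashMap;
use std::mem::MaybeUninit;
use std::ptr::{addr_of, addr_of_mut};

// Hypothetical database type, chosen only for illustration.
type Database = HashMap<String, u32>;

// Starts out uninitialized; `MaybeUninit` makes that state legal to hold.
static mut DATABASE: MaybeUninit<Database> = MaybeUninit::uninit();

/// # Safety
/// Call exactly once, before any `get_database()`, while no other thread
/// can touch `DATABASE`.
unsafe fn setup_database() {
    let mut db = Database::new();
    db.insert("answer".to_string(), 42); // made-up sample entry
    // `ptr::write` stores the value without reading or dropping whatever
    // was in the slot before.
    unsafe { addr_of_mut!(DATABASE).cast::<Database>().write(db) };
}

/// # Safety
/// Call only after `setup_database()` has returned.
unsafe fn get_database() -> &'static Database {
    // Going through a raw pointer avoids taking a reference to the
    // `static mut` itself.
    unsafe { &*addr_of!(DATABASE).cast::<Database>() }
}
```

Handing out mutable access would need the same care: the caller has to guarantee that no other reference to the table is alive at that point.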
This should be a red flag, shouldn't it? You are explicitly telling the compiler that the data can be assumed initialized, whereas it was just created uninitialized. This can't possibly be right.
Thanks for the replies! Right now I'd like to fully understand what's going on, rather than using some magic like lazy_static, which I don't exactly understand and which may be doing more than necessary. @alice's solution looks perfect in that regard; I will study it closely now to make sure I understand everything.
Indeed, setup_database() ought to be marked unsafe, since it can't be called concurrently. Alternatively, it could use its own static Mutex to actually guard against that (since the function is only supposed to be called once, the runtime impact of that extra safety check is nonexistent). get_database() should be marked unsafe as well, since it may only be called after a call to setup_database() has completed.
At that point, depending on the use case, having some flag that keeps track of the state to guard against such misuse even in get_database() seems a sensible thing to do, although it won't be zero-cost. The most natural version of that flag comes for free from the Once type of the standard library, which is basically a three-or-four-state atomic / thread-safe flag that tracks which of these states we are in:
1. the not-yet-called (and thus, for us, not-yet-initialized) state;
2. the currently-being-called-concurrently state (thus the initializing function should not be called again; the caller should just wait for the function's completion);
3. the already-called-and-successfully-completed state (and thus, for us, the knowledge that the global has been initialized);
4. and the one that is easy to forget, the already-called-but-the-call-panicked-and-thus-did-not-successfully-complete state, for which the only sensible default behavior for further calls is to panic as well (this one is optional, since it's just a special case of 2. that panics instead of deadlocking).
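To make those states a bit more concrete, here is a small sketch using `std::sync::Once` on its own. The `ensure_setup` function and the `SETUP_RUNS` counter are hypothetical names that exist only to make the behavior observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

static SETUP: Once = Once::new();
// Counts how often the initializer body actually ran (illustration only).
static SETUP_RUNS: AtomicUsize = AtomicUsize::new(0);

fn ensure_setup() {
    // Not yet called: the first caller runs the closure.
    // Being called: concurrent callers block until it finishes.
    // Completed: later callers return without running it again.
    // Poisoned: if the closure panicked, later `call_once` calls panic too.
    SETUP.call_once(|| {
        SETUP_RUNS.fetch_add(1, Ordering::SeqCst);
    });
}
```

`SETUP.is_completed()` reports whether the completed state has been reached, so it can serve as the cheap check a getter would do.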
And, at that point, the whole API is pervasive enough for the whole abstraction to be packaged together into a "stateful Once" of sorts or, more precisely, a "lazily initialized value", which is what OnceCell<Database> does:
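As a sketch of that pattern, the following uses `std::sync::OnceLock` (available since Rust 1.70), the standard library's counterpart to `once_cell::sync::OnceCell`, so that it needs no external crate; with the crate, the same `new` / `get_or_init` calls apply. The `Database` alias and its contents are made up for the example.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

// Hypothetical database type, chosen only for illustration.
type Database = HashMap<String, u32>;

static DATABASE: OnceLock<Database> = OnceLock::new();

fn get_database() -> &'static Database {
    // The closure runs at most once, on the first call; every later call
    // returns a shared borrow of the value stored back then.
    DATABASE.get_or_init(|| {
        let mut db = Database::new();
        db.insert("answer".to_string(), 42); // made-up sample entry
        db
    })
}
```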
This is equivalent to (and potentially more optimized than) code with a MaybeUninit::uninit() mutable static plus a getter which first performs a Once-guarded call to a function that initializes it, and only then fetches a shared borrow of it.
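Spelled out by hand, that equivalent could look roughly like the sketch below. The names `DATABASE`, `INIT` and the `Database` alias are assumptions for the example, and the safety comments describe the reasoning this version relies on.

```rust
use std::collections::HashMap;
use std::mem::MaybeUninit;
use std::ptr::{addr_of, addr_of_mut};
use std::sync::Once;

// Hypothetical database type, chosen only for illustration.
type Database = HashMap<String, u32>;

static mut DATABASE: MaybeUninit<Database> = MaybeUninit::uninit();
static INIT: Once = Once::new();

fn get_database() -> &'static Database {
    INIT.call_once(|| {
        let mut db = Database::new();
        db.insert("answer".to_string(), 42); // made-up sample entry
        // SAFETY: `call_once` runs this closure at most once and makes
        // concurrent callers wait, so nothing else touches DATABASE now.
        unsafe { addr_of_mut!(DATABASE).cast::<Database>().write(db) };
    });
    // SAFETY: `call_once` has returned, so the write above has completed,
    // and from here on only shared borrows are ever handed out.
    unsafe { &*addr_of!(DATABASE).cast::<Database>() }
}
```

Unlike the unguarded version, this getter can be a safe function, at the price of one atomic check per call.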
Finally, since having to call a get() function is a bit cumbersome / verbose at the use-sites, there is a (perhaps too) clever trick to hide that call under the rug: the . (dot) operator is one where a hidden Deref::deref() call can happen, so one can fake having a &Database static by having a dummy empty value which Derefs to the Database by internally calling that get…() function.
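Here is a hand-rolled sketch of that trick, to show there is no magic involved. `DatabaseHandle` is a hypothetical name for the dummy empty value, and `std::sync::OnceLock` stands in for whichever lazily initialized storage sits behind the getter.

```rust
use std::collections::HashMap;
use std::ops::Deref;
use std::sync::OnceLock;

// Hypothetical database type, chosen only for illustration.
type Database = HashMap<String, u32>;

fn get_database() -> &'static Database {
    static CELL: OnceLock<Database> = OnceLock::new();
    CELL.get_or_init(|| {
        let mut db = Database::new();
        db.insert("answer".to_string(), 42); // made-up sample entry
        db
    })
}

// Zero-sized stand-in: it stores nothing and only forwards to the getter.
struct DatabaseHandle;

impl Deref for DatabaseHandle {
    type Target = Database;

    fn deref(&self) -> &Database {
        get_database()
    }
}

// Use-sites can now write `DATABASE.get(..)` instead of
// `get_database().get(..)`.
static DATABASE: DatabaseHandle = DatabaseHandle;
```

A call such as `DATABASE.len()` resolves through `Deref`, so the first use anywhere in the program is what triggers the initialization.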
In the world of the once_cell crate, one can do that by replacing the OnceCell-plus-getter pattern with a Lazy:
```rust
use ::once_cell::sync::Lazy;

// Doing `*THING` or `THING.field_or_method_of_Thing`
// will call `<Lazy<Thing> as Deref>::deref()` which is like
// `get_thing()` above: its body is:    vvvvvvvvv
static THING: Lazy<Thing> = Lazy::new(|| { body… });
```
And finally, lazy_static! further hides stuff under the rug by avoiding that repetition of Lazy, an "implementation detail":
```rust
// Exactly the same semantics as the previous snippet.
::lazy_static::lazy_static! {
    static ref THING: Thing = { body… };
}
```
Just for fun (it doesn't matter since much better solutions have been posted).
This code mostly works by sheer luck: setup_database calls the destructor of the old db value when a new one is assigned. When setup is called for the first time, this drops an uninitialized value, which is UB and could end badly. Of course, as usual, the regrettable part is that it doesn't crash when we do something wrong. Errors passing silently are always so much scarier.
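The difference between assigning and writing can be observed safely with a type that counts its own drops. This is only a sketch: `Noisy`, `DROPS` and the two helper functions are made-up names, and the assignment here is done over an initialized value, precisely so that it stays well-defined.

```rust
use std::mem::MaybeUninit;
use std::sync::atomic::{AtomicUsize, Ordering};

// Global drop counter (illustration only).
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Noisy;

impl Drop for Noisy {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Number of destructors that run while assigning over an existing value.
fn drops_during_assignment() -> usize {
    let before = DROPS.load(Ordering::SeqCst);
    let mut slot = Noisy;
    slot = Noisy; // the old value's destructor runs here
    let during = DROPS.load(Ordering::SeqCst) - before;
    drop(slot);
    during
}

/// Number of destructors that run while writing into an uninitialized slot.
fn drops_during_write() -> usize {
    let before = DROPS.load(Ordering::SeqCst);
    let mut slot = MaybeUninit::<Noisy>::uninit();
    slot.write(Noisy); // nothing is dropped: the slot held no value
    let during = DROPS.load(Ordering::SeqCst) - before;
    // SAFETY: the slot was initialized by the `write` just above.
    unsafe { slot.assume_init_drop() };
    during
}
```

Had the first function started from uninitialized memory, that one destructor call would have run on garbage, which is the UB described above.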
As already pointed out here, this is not the only UB. The code calls MaybeUninit::uninit().assume_init(), which is almost always instant UB (the only type for which it's valid to call .assume_init() immediately, without actually performing initialization, is MaybeUninit<T> itself, which does not really help). The compiler could very easily recognize that and replace the whole program with a ud instruction (or with formatting your hard drive). I suspect the only reason it does not do that is because this pattern is still somewhat used in the wild, especially given that the deprecated std::mem::uninitialized() (which I've seen in production) does exactly the same.
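For contrast, here is a sketch of the two situations in which calling `.assume_init()` is fine: after the slot has really been written, and when the target type is itself made of `MaybeUninit`. The function names are hypothetical.

```rust
use std::mem::MaybeUninit;

/// Initialize first, then assume: this is the intended usage.
fn make_initialized() -> Vec<u8> {
    let mut slot = MaybeUninit::<Vec<u8>>::uninit();
    slot.write(vec![1, 2, 3]);
    // SAFETY: the slot was fully initialized by the `write` above.
    unsafe { slot.assume_init() }
}

/// The exception mentioned above: an array of `MaybeUninit` needs no
/// initialization, so assuming it right away is sound.
fn make_uninit_array() -> [MaybeUninit<u8>; 4] {
    // SAFETY: every element is itself allowed to be uninitialized.
    unsafe { MaybeUninit::<[MaybeUninit<u8>; 4]>::uninit().assume_init() }
}
```

Doing the same with `MaybeUninit::<Vec<u8>>::uninit().assume_init()` would be immediate UB, even if the value were never read afterwards.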