I've reduced the `unsafe` boundary, and thus the cognitive burden of reviewing its invariants, down to an `UnsafeOption<T>` which only features `.try_init_with()` (à la `.get_or_insert_with()`) and `.try_init()` setters, and an `.as_ref()` getter:
```rust
#![deny(unsafe_code)]

use safety_boundary::UnsafeOption;

mod safety_boundary {
    #![allow(unsafe_code)] // <- Safety boundary

    use ::core::cell::UnsafeCell;

    /// `UnsafeOption` is a less-optimized but safer version
    /// of `MaybeUninit`, because whether or not it is initialized
    /// can be inspected.
    pub
    struct UnsafeOption<T> (
        UnsafeCell<Option<T>>, // `UnsafeCell` ⇒ never `Sync`
    );

    /// The following could be added, although it's not necessary.
    unsafe
    impl<T : Send> Send for UnsafeOption<T>
    {}

    impl<T> UnsafeOption<T> {
        pub
        const
        fn none ()
          -> UnsafeOption<T>
        {
            UnsafeOption(UnsafeCell::new(None))
        }

        pub
        fn as_ref (self: &'_ UnsafeOption<T>)
          -> Option<&'_ T>
        {
            unsafe {
                (&*self.0.get() as &Option<T>)
                    .as_ref()
            }
        }

        pub
        fn try_init_with (
            self: &'_ UnsafeOption<T>,
            f: impl FnOnce() -> T,
        ) -> Result<&'_ T, &'_ T>
        {
            // SAFETY & invariants:
            //   - The moment `.as_ref()` yields a `Some`, it will never be mutated ever again.
            //   - Lack of `Sync` guarantees lack of parallelism, here.
            //   - We need to be mindful of re-entrancy, though. Here, the calls to `f()` or `drop(f)`.
            if let Some(already_present) = self.as_ref() {
                // We witnessed a `Some`: no longer mutated, all is fine.
                drop(f); // <- do not forget about this side-effect.
                return Err(already_present);
            }
            // We may be tempted to do `….write(f())`, but that
            // `f()` could re-entrantly init the value, and let
            // the obtained `&T` escape. It is thus better to just delegate
            // to a by-value `try_init`:
            self.try_init(f())
                .map_err(|(my_val, already_present)| {
                    // warn!("re-entrant init!");
                    drop(my_val); // <- do not forget about this side-effect.
                    already_present
                })
        }

        pub
        fn try_init (
            self: &'_ UnsafeOption<T>,
            value: T,
        ) -> Result<&'_ T, (T, &'_ T)>
        {
            if let Some(already_present) = self.as_ref() {
                // We witnessed a `Some`: no longer mutated, all is fine.
                return Err((value, already_present));
            }
            // Up until this point, no `Some` can have been witnessed,
            // so there is no `&T` yet: we can finally write the value.
            Ok(unsafe {
                let p = self.0.get();
                p.write(Some(value)); // <- The only write happens here.
                (&*p).as_ref().unwrap_unchecked()
            })
        }
    }
}
```
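To convince ourselves that this API does behave as the comments claim, here is a tiny self-contained exercise of it (the `UnsafeOption` definition below is my own condensation of the snippet above, keeping only `none()`, `as_ref()`, and `try_init()`):

```rust
#![deny(unsafe_code)]

mod safety_boundary {
    #![allow(unsafe_code)] // <- Safety boundary

    use ::core::cell::UnsafeCell;

    pub struct UnsafeOption<T>(UnsafeCell<Option<T>>);

    impl<T> UnsafeOption<T> {
        pub const fn none() -> Self {
            Self(UnsafeCell::new(None))
        }
        pub fn as_ref(&self) -> Option<&T> {
            unsafe { (&*self.0.get()).as_ref() }
        }
        pub fn try_init(&self, value: T) -> Result<&T, (T, &T)> {
            if let Some(already_present) = self.as_ref() {
                return Err((value, already_present));
            }
            Ok(unsafe {
                let p = self.0.get();
                p.write(Some(value));
                (&*p).as_ref().unwrap_unchecked()
            })
        }
    }
}

fn main() {
    use safety_boundary::UnsafeOption;

    let slot: UnsafeOption<i32> = UnsafeOption::none();
    assert!(slot.as_ref().is_none()); // Not initialized yet.

    // First init succeeds and hands back a shared reference:
    assert_eq!(slot.try_init(42).copied(), Ok(42));

    // A second init fails, handing back both the rejected value
    // and a reference to the already-present one:
    match slot.try_init(27) {
        Err((rejected, present)) => {
            assert_eq!(rejected, 27);
            assert_eq!(*present, 42);
        }
        Ok(_) => unreachable!(),
    }
    assert_eq!(slot.as_ref(), Some(&42));
}
```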
This is easier to review than something involving nested vecs and indexing and complex branches. And it is indeed sound: the "astute reader" will recognize here:

  - `.as_ref()` as their `.get()`;
  - `.try_init()` as their `.try_insert()`;
  - `.try_init_with()` as similar to their `.get_or_insert_with()`.
So, after the learning exercise of reimplementing `unsync::OnceCell` ourselves, it is important to trash that code and use the one from that very popular library: there is always a chance of having made a mistake in any kind of code, but when `unsafe` is involved, such mistakes can lead to UB/memory corruption, which in turn can much more easily expose your code to security vulnerabilities. Thus, it is critical to reduce the chances of an `allow(unsafe_code)` codebase featuring bugs, which can be achieved by having as many pairs of eyes as possible reviewing the same "canonical" implementation; here: `::once_cell::unsync::OnceCell<T>`.
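For comparison, here is a quick sketch of that same dance through the canonical API. I'm using `::core::cell::OnceCell` (stabilized in Rust 1.70 with the same API shape as `::once_cell::unsync::OnceCell`) so that the snippet compiles without external crates:

```rust
#![forbid(unsafe_code)] // <- the whole point: no `unsafe` on our side.

use ::core::cell::OnceCell;

fn main() {
    let cell: OnceCell<String> = OnceCell::new();

    // `.get()` plays the role of our `.as_ref()`:
    assert!(cell.get().is_none());

    // `.set()` plays the role of our `.try_init()`:
    assert_eq!(cell.set("hello".into()), Ok(()));
    assert_eq!(cell.set("world".into()), Err("world".into())); // already set

    // `.get_or_init()` plays the role of our `.try_init_with()`:
    let r: &String = cell.get_or_init(|| unreachable!("already initialized"));
    assert_eq!(r, "hello");
}
```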
In this instance, the `try_set` function then ends up rewritten as:

```rust
fn try_set (
    self: &'_ NonMovingGrowableVec<T>,
    index: u64,
    item: T,
) -> Result<(), &'_ T>
{
    let outer_index =
        (u64::BITS - index.leading_zeros())
        as usize
    ;
    let inner_index =
        (index - items_before_outer_index(outer_index))
        as usize
    ;
    let at_outer =
        self.data[outer_index]
            .get_or_init(|| {
                let size =
                    usize::try_from(
                        items_at_outer_index(outer_index)
                    )
                    .expect("index too big for architecture!")
                ;
                (0 .. size).map(|_| UnsafeOption::none()).collect()
            })
    ;
    match at_outer[inner_index].try_init(item) {
        | Ok(_) => Ok(()),
        | Err((_item, already_present)) => Err(already_present),
    }
}
```
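The index arithmetic deserves a sanity check. The following standalone sketch verifies it for small indices; note that `items_before_outer_index` and `items_at_outer_index` are my own hypothetical reconstructions, assuming the bucket scheme where bucket `0` holds 1 item and bucket `k ≥ 1` holds `2^(k-1)` items:

```rust
// ASSUMED helpers (not from the article): bucket sizes 1, 1, 2, 4, 8, …
fn items_at_outer_index(outer: usize) -> u64 {
    if outer == 0 { 1 } else { 1_u64 << (outer - 1) }
}

// Total number of items in all buckets before bucket `outer`:
// 0, 1, 2, 4, 8, … (i.e., 2^(outer-1) for outer >= 1).
fn items_before_outer_index(outer: usize) -> u64 {
    if outer == 0 { 0 } else { 1_u64 << (outer - 1) }
}

/// The same computation as in `try_set`, extracted for testing.
fn split(index: u64) -> (usize, u64) {
    let outer_index = (u64::BITS - index.leading_zeros()) as usize;
    let inner_index = index - items_before_outer_index(outer_index);
    (outer_index, inner_index)
}

fn main() {
    for index in 0 .. 16 {
        let (outer, inner) = split(index);
        // Every index must land inside the bounds of its bucket:
        assert!(inner < items_at_outer_index(outer));
        println!("index {index} -> bucket {outer}, slot {inner}");
    }
}
```

Under these assumptions, index `0` lands in bucket `0`, index `1` in bucket `1`, indices `2..=3` in bucket `2`, indices `4..=7` in bucket `3`, and so on: each bucket doubles in size, so growing never moves already-stored items.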
This is 100 % `unsafe`-free (but for the stdlib and `once_cell`, of course), so that we can proudly tweak the beginning of our src/{lib,main}.rs with:

```diff
+ #![forbid(unsafe_code)] // ✅
```