Is this unsafe code safe?

I have a struct which has a method .get(&self) which does some expensive computation to compute and return the output, but that output will not change on later re-invocations of this method. So ideally I want to cache the output and just read from the cache next time.

In order to cache the result in the struct I would need a mutable reference, i.e. fn get(&mut self) -> ..., but lets assume that is not an option for this use case.
An alternative is to use RwLock and Once inside the struct. That is doable but incurs all the RwLock overhead.

The other alternative is to use unsafe code below; I do not know if Once::call_once will make this code safe or not. Is this unsafe code safe? if not why not?

use std::sync::Once;

struct Foo {
    token: String,
    cache: String,
    once: Once,
}

impl Foo {
    fn new(token: String) -> Self {
        Self {
            token,
            cache: String::default(),
            once: Once::new(),
        }
    }

    fn get(&self) -> &String {
        self.once.call_once(|| {
            // Do the expensive computation.
            let token = self.token.clone() + &self.token;
            // Cache the result.
            unsafe {
                let cache = &self.cache as *const String;
                let cache = cache as *mut String;
                *cache = token;
            };
        });
        &self.cache
    }
}

It's not, because compiler is free to assume that Foo::get doesn't modify cache.

@Cerber-Ursi is there any alternative aside from using RwLock?
Also, seems like RwLock or Mutex implement this exact same functionality (i.e. immutable reference to mutable reference); How do they achieve this?

Here’s the answer:

std::cell::UnsafeCell

If you have a reference &T, then normally in Rust the compiler performs optimizations based on the knowledge that &T points to immutable data. Mutating that data, for example through an alias or by transmuting an &T into an &mut T, is considered undefined behavior. UnsafeCell<T> opts-out of the immutability guarantee for &T: a shared reference &UnsafeCell<T> may point to data that is being mutated. This is called “interior mutability”.

All other types that allow internal mutability, such as Cell<T> and RefCell<T>, internally use UnsafeCell to wrap their data.

1 Like

Arguably, for this use-case, you’re re-inventing the wheel by using unsafe code, whereas pre-existing safe solutions exist in the form of

1 Like

@steffahn UnsafeCell does not implement std::marker::Sync and so I get the error that it cannot be shared between threads safely.

Is there a sync equivalent?

UnsafeCell is unsafe to use anyway, the main reason why it isn’t Sync is so that when using it, you don’t accidentally mark a type as Sync when it shouldn’t be, similar to why raw pointers aren’t Sync either; if you ensure synchronization yourself (as e.g. in your example with the Once), you’re totally allowed to (unsafely) implement Sync for a type that’s using UnsafeCell internally; in fact that’s what things like Mutex do, too.

In fact, the existing (unstable) SyncOnceCell does exactly that: looking at its source you can see that it’s essentially just

pub struct SyncOnceCell<T> {
    once: Once,
    value: UnsafeCell<MaybeUninit<T>>,
}

The MaybeUninit is there in order to (unsafely) allow for an (uninitialized) initial default-value for value without any overhead.

1 Like

Here’s the equivalent of your code using once_cell::sync::OnceCell

use once_cell::sync::OnceCell;

struct Foo {
    token: String,
    cache: OnceCell<String>,
}

impl Foo {
    fn new(token: String) -> Self {
        Self {
            token,
            cache: OnceCell::new(),
        }
    }

    fn get(&self) -> &String {
        self.cache.get_or_init(|| {
            // Do the expensive computation.
            let token = self.token.clone() + &self.token;
            token
        })
    }
}

Rust Playground

2 Likes

And if you want to know how this all works, you might be interested in the discussion on my earlier question - "Write-once" static Option.

@steffahn once_cell::sync::OnceCell does the job but does it for sure have less overhead than a RwLock and std::sync::Once? The api looks like it has a lot more functionality which makes me worried about the overhead.

Also, no alternatives just using std which does not require a 3rd party crate?

once_cell is written by Rust team members and has been approved to be added to std. I appreciate concerns about depending on too many third-party crates, but I think this one should not be a problem.

If you want a nice read on the topic of caching values using only &self, last week's This Week in Rust featured this walkthrough that seems to address exactly your use case: Caches in Rust by @matklad

2 Likes

Practically, just use once_cell crate -- despite providing comprehensive API, it's rather lean at compile time and doesn't incur any extra runtime overhead. If you'd rather stick to std, the best thing is probably to work on bringing once_cell there!

If you want to do it yourself, the following minimal impl should work (absence of bugs is not guaranteed, it also a bit needlessly conservative wrt dropcheck):

use std::{cell::UnsafeCell, mem::MaybeUninit};

struct OnceCell<T> {
    once: std::sync::Once,
    cell: UnsafeCell<MaybeUninit<T>>,
}

impl<T> Default for OnceCell<T> {
    fn default() -> Self {
        Self {
            once: std::sync::Once::new(),
            cell: UnsafeCell::new(MaybeUninit::uninit()),
        }
    }
}

// NB: this does need `Send` bound.
unsafe impl<T: Sync + Send> Sync for OnceCell<T> {}
unsafe impl<T: Send> Send for OnceCell<T> {}

impl<T> OnceCell<T> {
    pub fn get_or_init(&self, init: impl FnOnce() -> T) -> &T {
        // Tricky: once is guaranteed to deadlock or panic on re-entrancy.
        self.once.call_once(|| {
            let val = init();
            unsafe {
                (*self.cell.get()).write(val);
            }
        });
        unsafe { (*self.cell.get()).assume_init_ref() }
    }
}

impl<T> Drop for OnceCell<T> {
    fn drop(&mut self) {
        if self.once.is_completed() {
            unsafe { (*self.cell.get()).assume_init_drop() }
        }
    }
}
1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.