Help understanding unsafe code with SyncUnsafeCell

Hello! I'm trying to understand what are the potential ways I can cause UB in the code below.

Assume I can guarantee this code runs on a single thread and I don't need to worry about interrupts and reentrancy. I tried getting two &mut HashMap<u32, String> in add_job and mutating them in different ways to cause UB, but no luck. I ran miri and also had no luck.

Would it be sound if I get an &mut HashMap<u32, String in the beginning of a function, mutate the HashMap at will and then, get a raw pointer from it and put it back inside the state? Is the last step unsafe { &mut *STATE.get() }.jobs = jobs even needed?

This is my first time trying to make use of unsafe. It's kinda scary and I think there's something I'm fundamentally misunderstanding and would appreciate some help. Thanks!

#![feature(sync_unsafe_cell)]

use std::boxed::Box;
use std::cell::SyncUnsafeCell;
use std::collections::HashMap;

static STATE: SyncUnsafeCell<State> = SyncUnsafeCell::new(State {
    jobs: std::ptr::null_mut(),
});

unsafe impl Sync for State {}

#[derive(Debug)]
struct State {
    jobs: *mut HashMap<u32, String>,
}

fn add_job() {
    // Get the jobs from the state
    let jobs = unsafe { &mut *(*STATE.get()).jobs };

    // Update jobs at will
    jobs.insert(1, "job1".to_string());

    // Update the state
    unsafe { &mut *STATE.get() }.jobs = jobs;
}

fn main() {
    {
        // Initialize the state
        let state = unsafe { &mut *STATE.get() };
        (*state).jobs = Box::into_raw(Box::new(HashMap::new()));
    }

    add_job();
}

(Playground)

See my additions to main,[1] where I hold onto a &mut _ across a call to something else that gets a &mut _. Then two unrelated &mut _ to the same resource exist at the same time (at different locations on the stack).

Like so?

Not if you don't want it to point elsewhere.


  1. you can run Miri under Tools, top-right; probably you knew that ↩ī¸Ž

4 Likes

Thanks, this was very helpful. I didn't know you could run miri in the playground and I was making a really really stupid mistake locally: running cargo run miri instead of cargo miri run :man_facepalming: and that was driving me crazy.

I was under the impression that in case the HashMap needs to grow beyond its capacity or something similar, it could be moved to a different address and the raw pointer in State.jobs would become invalid. I guess this would only concern pointers to the inner contents of the HashMap, not the HashMap itself, or if I unsafely dropped the referent, right?

Mostly right. The HashMap can change its inline values. For example, it could reallocate and change a pointer to the heap that it holds. So if you had a HashMap value in global memory, moved (unsafely copied) it to the stack, and called methods on it, you would have to write the HashMap value back to global memory afterwards.

But if you never move the HashMap value, you don't have to do that.

So one scenario is, your HashMap is in global memory, and you only turn a pointer to the global memory into a &mut HashMap. The HashMap may change the global memory storing its value when you call its methods.

Now, you've even added a layer of indirection compared to everything I just wrote: you have a pointer to a HashMap in global memory, and the HashMap value is on the heap. (That's what the Box dance did.) But you're still not moving the HashMap value around, you're just creating pointers/references to the heap location holding the HashMap.

Perhaps try diagramming everything out. It may be easier if you diagram a Vec (pointer, capacity, length) instead of a HashMap; and the general ideas will be the same. Here's a starting point.

Another way to think about it: if you call a function or method that takes a &HashMap or &mut HashMap, there still has to be a HashMap behind the reference when the function/method returns.

1 Like

Thanks a lot! This has been very helpful.

I thought about what you said and I think I was able to remove this added layer of indirection while doing the same thing. Would you agree this is better than the initial code I posted?

#![feature(sync_unsafe_cell)]

use std::cell::SyncUnsafeCell;
use std::collections::HashMap;
use std::mem::MaybeUninit;

static STATE: SyncUnsafeCell<MaybeUninit<State>> = SyncUnsafeCell::new(MaybeUninit::uninit());

struct State {
    jobs: HashMap<u32, String>,
}

fn add_job() {
    let state = unsafe { (*STATE.get()).assume_init_mut() };
    state.jobs.insert(1, "job1".to_string());
}

fn main() {
    {
        // Initialize the state
        let state = unsafe { &mut *STATE.get() };
        (*state).write(State {
            jobs: HashMap::new(),
        });
    }

    add_job();
}

(Playground)

I would agree.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.