Is writing memory behind immutable reference sound?

I design a struct SharedMem, which provides similar abstraction like linux shared memory. Multiple threads can write/read the memory simultaneously. The code looks like below.

use std::sync::Arc;

#[derive(Clone)]
pub struct SharedMem {
    inner: Arc<Inner>,
}

struct Inner {
    addr: usize,
}

impl SharedMem {
    pub fn new() -> Self {
        let bytes = Box::new([0u8; PAGE_SIZE]);
        let addr = Box::into_raw(bytes) as usize;
        Self {
            inner: Arc::new(Inner { addr }),
        }
    }

    pub fn read_bytes(&self, offset: usize, dst: &mut [u8]) {
        assert!(offset + dst.len() <= PAGE_SIZE);

        let src = {
            let ptr = self.inner.addr as *const u8;
            ptr.wrapping_offset(offset as isize)
        };
        unsafe {
            core::ptr::copy(src, dst.as_mut_ptr(), dst.len());
        }
    }

    pub fn write_bytes(&self, offset: usize, src: &[u8]) {
        assert!(offset + src.len() <= PAGE_SIZE);

        let dst = {
            let ptr = self.inner.addr as *mut u8;
            ptr.wrapping_offset(offset as isize)
        };

        unsafe {
            core::ptr::copy(src.as_ptr(), dst, src.len());
        }
    }
}

impl Drop for Inner {
    fn drop(&mut self) {
        unsafe {
            let _ = Box::from_raw(self.addr as *mut [u8; PAGE_SIZE]);
        }
    }
}

I believe the code does not actually lead to any undefined behavior. But I still wonder whether the code is sound in two aspects.

  1. Is it sound to mutate the memory behind a shared reference without using UnsafeCell? The function write_bytes modifies the contents of memory, but it only needs an immutable reference to SharedMem. In Rust, interior mutability is a common pattern used for this purpose, such as Mutex, Cell, etc. However, those data structures that include interior mutability internally use UnsafeCell, which is not used internally in SharedMem.
  2. Is it sound that SharedMem can be Send and Sync? Although in our code, the compiler automatically implements Send+Sync for SharedMem because we store an address inside Inner. However, if we were to store a raw pointer instead of an address inside Inner, the compiler won't automatically implement these two traits. In this case, can we still implement Send+Sync for SharedMem?

no, it's instant UB.

that's the definition of interior mutability, so you must use UnsafeCell, otherwise, it's UB.

it can be if you implement it correctly with proper synchronization. Send and Sync are unsafe traits, you can mark any of your typse Send and/or Sync, as long your implementation ensures the safety contract is kept, otherwise, your implementation is unsound, by definition.

3 Likes

What drives that belief? Note that UB is defined at the language level (and not, say, what a compiler happens to emit on any given invocation).

No, that's always UB.

2 Likes

Contrary to the previous two replies, especially with your source code as context, I would say yes it can be sound; raw pointers are another primitive that can allow for some shared mutability, without any direct or indirect usage of UnsafeCell and the raw pointer tbst Box::into_raw returns does allow shared mutable accrss. I don't know why you go the extra step of converting it to usize but with or without that, your code might actually be sound, upon first glance. Another good indicator would be to run it through miri (maybe better to do that in a version that would put a *mut [u8; PAGE_SIZE] directly into Inner instead of the usize, because that extra step might make miri more likely to accept code that could still be containing UB).

Ah, I see, is that why usize is used? No, your code looks very much not thread-safe, so it's unsound, even if the compiler did implement them automatically here - that would just be due to "tricking" the help that the compiler usually gives you in the form of making raw pointers non-Send/Sync by default (which in turn is the case for the very reason they offer shared mutability with our synchronization).

1 Like

I turn the addr into raw pointer and miri does not report any error.

But I'm not clear why I can't implement Send and Sync for SharedMem? SharedMem does allow multiple threads read/write to it simultaneously. Like core::ptr:copy, this method does not require that no other read/write operation at the same time.

Yet it does.

  • dst must be valid for writes of count * size_of::<T>() bytes, and must remain valid even when src is read for count * size_of::<T>() bytes.

Linking to…

The precise rules for validity are not determined yet. The guarantees that are provided at this point are very minimal:

  • All accesses performed by functions in this module are non-atomic in the sense of atomic operations used to synchronize between threads. This means it is undefined behavior to perform two concurrent accesses to the same location from different threads unless both accesses only read from memory. Notice that this explicitly includes read_volatile and write_volatile: Volatile accesses cannot be used for inter-thread synchronization.

(added emphasis)

2 Likes

In std::file::File, the file content can be written with only immutable reference.

So I thought it may be sometimes ok to mutate the memory behind a shared reference without using UnsafeCell,

Good, (assuming you added some main function that actually calls the read and write methods in a representative way) that might be a good indicator that your code could be sound for non-concurrent use. miri tends to even be pretty good at calling out data races in concurrent use, so I wouldn’t be surprised if you could write a test case that involves spawning a thread and writing on one thread while reading from the other that gets reported as UB.


Edit: Indeed!

const PAGE_SIZE: usize = 4096;

fn main() {
    let x = SharedMem::new();

    let x_ = x.clone();
    std::thread::spawn(move || x_.write_bytes(0, &[1]));
    let mut y = [0];
    x.read_bytes(0, &mut y);
    dbg!(y);
}
error: Undefined Behavior: Data race detected between (1) non-atomic read on thread `main` and (2) non-atomic write on thread `<unnamed>` at alloc916. (2) just happened here
  --> src/main.rs:42:13
   |
42 |             core::ptr::copy(src.as_ptr(), dst, src.len());
   |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Data race detected between (1) non-atomic read on thread `main` and (2) non-atomic write on thread `<unnamed>` at alloc916. (2) just happened here
   |
help: and (1) occurred earlier here
  --> src/main.rs:29:13
   |
29 |             core::ptr::copy(src, dst.as_mut_ptr(), dst.len());
   |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
   = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
   = note: BACKTRACE (of the first span):
   = note: inside `SharedMem::write_bytes` at src/main.rs:42:13: 42:58
note: inside closure
  --> src/main.rs:61:32
   |
61 |     std::thread::spawn(move || x_.write_bytes(0, &[1]));
   |                                ^^^^^^^^^^^^^^^^^^^^^^^

Rust Playground

1 Like

your wording doesn't express your actual intention. "mutate memory behind a shared reference" is UB by definition, there's no question about it

but your code doesn't mutate the memory behind a shared reference, the shared reference does not "reference" the memory being mutated. it's like saying "I have a shared reference to a socket, I can change the internal buffer through a shared reference without UnsafeCell", you are talking different things.

6 Likes

Note that mutating the data referenced by a shared reference (and without UnsafeCell inbetween) is UB, but mutating something else (i.e. data under another level of indirection) given a shared reference is not necessarily UB (it can be, but it depends on how you implement it). Writing to a file falls under the latter case.

1 Like

Interesting! That's really amazing results.

Very thanks for your help.

Yes, but the indirection level may not be trivial to recognize, in real cases.

The reason this works isn't because it is behind an immutable reference, it's because the OS is managing the synchronisation of writes to the file descriptor so there is no possibility of data races.

4 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.