If your OS guarantees that MAP_PRIVATE always faults in a private copy of each page you access, and never evicts those pages to be re-read later, there might be a way to do it. POSIX allows but does not guarantee this behavior, so your code will be tied to a specific OS version rather than to generic Linux/Unix.
I think you can use mmap safely in Rust, but the problem is that a safe solution will be quite unergonomic and potentially not quite zero-cost. Instead of byte slices you would have to use custom types that expose a read/write interface and use raw pointers under the hood. In other words, you would have to copy data out of the mmapped pages before working with it in safe code.
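To make the "custom types over raw pointers" idea concrete, here is a minimal sketch. The type `MappedBytes` and its methods are hypothetical names, not part of any crate, and the pointer is assumed to come from wherever the mapping was created. The point is only that safe code never sees a `&[u8]` into the mapping; it gets copies. Whether the raw copies themselves can race with an external writer is a separate question that this does not settle.

```rust
use std::ptr;

/// Hypothetical wrapper over a mapped region. It never hands out `&[u8]`
/// into the mapping; callers only ever receive copies of the bytes.
pub struct MappedBytes {
    ptr: *mut u8,
    len: usize,
}

impl MappedBytes {
    /// # Safety
    /// `ptr` must be valid for reads and writes of `len` bytes for as long
    /// as the returned value is alive (assumption: the caller owns the mapping).
    pub unsafe fn from_raw(ptr: *mut u8, len: usize) -> Self {
        Self { ptr, len }
    }

    /// True if `offset..offset + count` lies inside the region (overflow-safe).
    fn in_bounds(&self, offset: usize, count: usize) -> bool {
        matches!(offset.checked_add(count), Some(end) if end <= self.len)
    }

    /// Copies `dst.len()` bytes starting at `offset` out of the region.
    /// Returns `false` and leaves `dst` untouched if the range is out of bounds.
    pub fn read_into(&self, offset: usize, dst: &mut [u8]) -> bool {
        if !self.in_bounds(offset, dst.len()) {
            return false;
        }
        // SAFETY: the range was checked above and `dst` is a separate buffer.
        unsafe {
            ptr::copy_nonoverlapping(self.ptr.add(offset) as *const u8, dst.as_mut_ptr(), dst.len());
        }
        true
    }

    /// Copies `src` into the region starting at `offset`.
    /// Returns `false` and writes nothing if the range is out of bounds.
    pub fn write_from(&self, offset: usize, src: &[u8]) -> bool {
        if !self.in_bounds(offset, src.len()) {
            return false;
        }
        // SAFETY: the range was checked above and `src` is a separate buffer.
        unsafe {
            ptr::copy_nonoverlapping(src.as_ptr(), self.ptr.add(offset), src.len());
        }
        true
    }
}
```

The extra copy on every access is the "not quite zero-cost" part. Because the struct holds a raw pointer it is neither `Send` nor `Sync` by default, so this sketch says nothing about cross-thread use.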
As far as I can tell (I’m no expert in this area), the soundness requirement for &[u8] is that every read from a given address must produce the same result over time. If a page gets evicted and then re-read from disk, the memory contents might change to reflect changes to that block of the file.
Though clearly this isn't specified by POSIX, I'm guessing the Linux implementation maps the same pages from the block cache into the different processes that mmap the same file. A plain write() would result in some kernel code writing to that page as well. So potentially both sides would have to be doing atomic accesses (which could be in user or kernel code, like you said) for the behavior to be defined. This has been debated here, and I think I've seen it debated in some Rust GitHub issues as well.
If a page was evicted since last access, reread into the block cache, and remapped after a fault with different data, well, then I'm at a loss as to what that would mean for &[AtomicU8], especially as it relates to non-x86 architectures.
For a private (copy-on-write) mapping, whether or not file changes appear is left unspecified by POSIX: You need to consult the host OS's documentation. From mmap(2) - Linux manual page (man7.org):
Create a private copy-on-write mapping. Updates to the
mapping are not visible to other processes mapping the
same file, and are not carried through to the underlying
file. It is unspecified whether changes made to the file
after the mmap() call are visible in the mapped region.
The open question is what actually triggers a process-private copy of the block to be created: Is it a read by the MAP_PRIVATE process, a write by that process, a write by some other process, or something else? In the absence of official guidance from the OS, we have to assume any write to the file could appear in the mapping at any time.
My instinct is no, but I'm not confident. Basically, raw Cell isn't data-race-protected against the underlying stuff changing -- it allows it to change, but expects all the changes to go through the cell.
Certainly it's not OK to (non-atomically) write to an UnsafeCell in one thread while reading from it in another. But memmap is weird enough that I don't know what the model would say about it, so I can't tell whether an external change counts as a write that would be capable of racing.
Said otherwise, two gets in a row for Cell are going to read the same thing (assuming nothing between them) -- they can be safely moved around so long as they don't swap places in the ordering with other things affecting the same place. But that's not necessarily fine for something memmapped.
So my instinct is that the way to be sound is to use LLVM's Unordered atomic ordering, as that's the weakest possible thing that defuses the usual UB from data races. But Rust doesn't expose that one, thus Relaxed.
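As a rough sketch of the Relaxed approach: view the region as `&[AtomicU8]` and copy bytes out with `Ordering::Relaxed` loads. The helper names `atomic_view` and `copy_out_relaxed` are made up for illustration. The cast relies on `AtomicU8` being documented to have the same size and alignment as `u8`. Whether this is actually enough for a file-backed mapping is exactly the open question in this thread, so treat it as an illustration and not an answer.

```rust
use std::slice;
use std::sync::atomic::{AtomicU8, Ordering};

/// Reinterprets `len` bytes at `ptr` as a slice of atomics.
///
/// # Safety
/// `ptr` must be valid for reads and writes of `len` bytes for the whole
/// lifetime `'a`, and (assumption) every concurrent access to those bytes
/// must itself be atomic.
pub unsafe fn atomic_view<'a>(ptr: *mut u8, len: usize) -> &'a [AtomicU8] {
    // `AtomicU8` has the same size and alignment as `u8`.
    unsafe { slice::from_raw_parts(ptr as *const AtomicU8, len) }
}

/// Copies bytes out one at a time using `Relaxed` loads.
/// Returns the number of bytes copied (the shorter of the two lengths).
pub fn copy_out_relaxed(src: &[AtomicU8], dst: &mut [u8]) -> usize {
    let n = src.len().min(dst.len());
    for (d, s) in dst.iter_mut().zip(src.iter()) {
        *d = s.load(Ordering::Relaxed);
    }
    n
}
```

Each byte is loaded independently, so a multi-byte value read this way can be torn if something else is writing at the same time.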
I don't think such exotic types are worth the trouble compared to wrappers around raw pointers. For example, with &[AtomicU8] you will not be able to efficiently read or write a u32 or any other type larger than a byte in mapped memory, while with raw pointers it's as simple as:
ptr::write_unaligned(mmap_ptr.add(offset) as *mut u32, my_val);
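Fleshing that one-liner out a bit, a bounds-checked version might look like the sketch below. The function names and the `base`/`len` parameters are hypothetical. It uses plain (non-volatile) `read_unaligned`/`write_unaligned`; if volatile turns out to be required, the unaligned case would need different handling, which I have not attempted here.

```rust
use std::ptr;

/// Writes `val` at byte `offset` inside a region of `len` bytes.
/// Returns `false` and writes nothing if the 4 bytes do not fit.
///
/// # Safety
/// `base` must be valid for reads and writes of `len` bytes.
pub unsafe fn write_u32_at(base: *mut u8, len: usize, offset: usize, val: u32) -> bool {
    match offset.checked_add(4) {
        Some(end) if end <= len => {
            // SAFETY: range checked above; no alignment requirement.
            unsafe { ptr::write_unaligned(base.add(offset) as *mut u32, val) };
            true
        }
        _ => false,
    }
}

/// Reads a `u32` from byte `offset`, or `None` if it does not fit.
///
/// # Safety
/// `base` must be valid for reads of `len` bytes.
pub unsafe fn read_u32_at(base: *const u8, len: usize, offset: usize) -> Option<u32> {
    match offset.checked_add(4) {
        // SAFETY: range checked above; no alignment requirement.
        Some(end) if end <= len => Some(unsafe { ptr::read_unaligned(base.add(offset) as *const u32) }),
        _ => None,
    }
}
```

The value is stored in native byte order; a file format with a fixed endianness would want `to_le_bytes`/`from_le_bytes` (or the `be` variants) on top of this.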
The only thing I am not sure about is whether mmap reads/writes should be volatile or not.
Does it? Race conditions aside, I previously asked a question related to making some C FFI-wrapping code sound, and the conclusion there was that interior mutability makes the compiler assume that something else might have changed the underlying value between two reads. Otherwise, as it seems to me, it would be borderline impossible to soundly wrap C APIs that expose raw pointers. (Of course this still doesn't license anyone to mess with thread safety.)
Hmm, looks like I lost the word "conceptually" in my drafting.
I don't know how much -- if at all -- Cell ends up changing compared to UnsafeCell. Perhaps it technically doesn't end up changing anything, as certainly reads through an UnsafeCell might have been changed by something else. But perhaps practically it's infeasible to actually provide a safe-and-sound interface where it might change underneath a cell, since there's no coordination required for reading it.
My sketch would be something like this:
If the C code promises not to change it when it's not running, then you can use &T while the Rust code is running; no need for Cell.
If the C code is allowed to change it, then &Cell<T> seems insufficient, as it doesn't guard against data races against those changes.
So in neither case is Cell<T> the right mechanism.
(But I'm no Ralf, of course. UCG probably has to figure something out here, and it might be unanswerable until there's a memory model.)
Well, if we have a &Cell<T>, we can't be sure that the wrapped value won't be changed via another shared reference to the same Cell, right? That's the whole point of shared mutability, after all. How is the FFI code different?
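A small single-threaded sketch of that point (the helper `read_twice` is a made-up name): two reads of the same `&Cell<u32>` can observe different values if anything that runs in between holds another shared reference to it. This only shows the same-thread case; it says nothing about changes made by another thread, by the kernel, or by FFI code running concurrently.

```rust
use std::cell::Cell;

/// Reads the cell, runs `between`, then reads the cell again.
/// The compiler cannot assume both reads return the same value, because
/// `between` may hold another shared reference to the same `Cell`.
pub fn read_twice(cell: &Cell<u32>, between: impl FnOnce()) -> (u32, u32) {
    let first = cell.get();
    between();
    let second = cell.get();
    (first, second)
}
```

Called as `read_twice(&c, || alias.set(2))` with `alias = &c`, the two reads differ; with an empty closure they match.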