Is there no safe way to use mmap in Rust?

anon80458984 · January 14, 2022, 2:31pm

This is a followup to

I still am not sure how to safely use mmap in Rust. While researching this, I stumbled across this other thread:

How unsafe is mmap?

So now I am curious: is mmap fundamentally unsafe in Rust on Linux?

The argument would go something like this:

Rust expects the data behind a &[u8] not to change behind its back; if it does, that is UB.
On Linux, file locks are only advisory, not enforced. Therefore, another process can change an mmap-ed memory region behind your back.
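A minimal sketch of why the first point matters, assuming a made-up file format in which the first byte is an index into a 16-entry table (`lookup` and the format are hypothetical). The code is fine for ordinary memory; the comment marks the step whose safety argument quietly depends on `&[u8]` being immutable.

```rust
/// Hypothetical format: `data[0]` is an index into a 16-entry lookup table.
fn lookup(data: &[u8], table: &[u8; 16]) -> Option<u8> {
    // Read the index once and keep it in a local.
    let idx = *data.first()? as usize;
    if idx < table.len() {
        // SAFETY: `idx < table.len()` was checked just above. That argument
        // assumes `idx` is one stable value. Because `data` is `&[u8]`, the
        // compiler may treat `data[0]` as immutable and re-load it here rather
        // than keep `idx` in a register; if `data` pointed into an mmap that
        // another process was writing, the re-loaded byte might not be the
        // one that passed the check.
        Some(unsafe { *table.get_unchecked(idx) })
    } else {
        None
    }
}
```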

alice · January 14, 2022, 2:53pm

I think the main question here is whether a race with another process in an mmapped region is UB or not. If it is, I don't think you can safely use an mmap at all.

2e71828 · January 14, 2022, 2:54pm

If your OS will guarantee that MAP_PRIVATE will always fault in a private copy of each page you access, and never evict these pages to be re-read later, there might be a way to do it. POSIX allows but does not guarantee this behavior, so your code will be tied to a specific OS version, rather than generic Linux/Unix.

anon80458984 · January 14, 2022, 3:01pm

Why is this requirement important? In my mental model, if a page gets evicted, the following happens:

read triggers a page fault
OS swaps in the page
read continues
it might be a different physical page, but user-space code can't tell the difference, since it only sees virtual addresses

Why is this a problem?

newpavlov · January 14, 2022, 3:30pm

I think you can use mmap safely in Rust, but a safe solution will be quite unergonomic and possibly not quite zero-cost. Instead of byte slices, you would have to use custom types that expose a read/write interface and use raw pointers under the hood. In other words, you would have to copy data out of the mmapped pages before working with it in safe code.
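A rough sketch of the "custom type plus copy-out" idea, assuming a hypothetical `MappedRegion` type (not from any existing crate) that never hands out a `&[u8]` into the mapping and only copies bytes into caller-owned buffers. Note that the copy itself is still a plain, non-atomic read, so this does not by itself settle whether a concurrent write from another process is a data race.

```rust
use std::ptr;

/// Hypothetical read-only view of a mapped region. Callers get copies,
/// never references into the mapping.
pub struct MappedRegion {
    ptr: *const u8,
    len: usize,
}

impl MappedRegion {
    /// # Safety
    /// `ptr` must be valid for reads of `len` bytes for as long as the
    /// returned value is used (e.g. until the mapping is unmapped).
    pub unsafe fn new(ptr: *const u8, len: usize) -> Self {
        MappedRegion { ptr, len }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Copies `buf.len()` bytes starting at `offset` into `buf`.
    /// Returns `None` if the requested range is out of bounds.
    pub fn read_at(&self, offset: usize, buf: &mut [u8]) -> Option<()> {
        let end = offset.checked_add(buf.len())?;
        if end > self.len {
            return None;
        }
        // SAFETY: the range was bounds-checked above, and `new`'s contract
        // guarantees the pointer is valid for reads of `len` bytes.
        unsafe { ptr::copy_nonoverlapping(self.ptr.add(offset), buf.as_mut_ptr(), buf.len()) };
        Some(())
    }
}
```

The cost mentioned above shows up here: every access is a copy into a Rust-owned buffer rather than a borrow of the mapped bytes.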

scottmcm · January 14, 2022, 3:31pm

It might be that memmapping it as &[u8] is UB. But maybe it could be done as &[AtomicU8] -- where you'd do Relaxed reads to avoid the UB with minimal perf cost.
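A sketch of the &[AtomicU8] suggestion, assuming the mapping's base pointer and length are already in hand (how they were obtained is out of scope here). `as_atomic_bytes` and `snapshot` are hypothetical names. `AtomicU8` is documented to have the same size and alignment as `u8`, which is what makes the pointer cast plausible; whether Relaxed loads are enough when the writer is another process (or the kernel) is exactly the open question in this thread.

```rust
use std::slice;
use std::sync::atomic::{AtomicU8, Ordering};

/// Reinterprets `len` bytes at `ptr` as a slice of `AtomicU8`.
///
/// # Safety
/// `ptr` must be valid for reads of `len` bytes for `'a`, and for the whole
/// of `'a` no Rust code may access those bytes non-atomically (for example
/// through a `&[u8]` or `&mut [u8]`).
unsafe fn as_atomic_bytes<'a>(ptr: *mut u8, len: usize) -> &'a [AtomicU8] {
    // `AtomicU8` has the same size and alignment as `u8`, so the cast is
    // layout-compatible.
    unsafe { slice::from_raw_parts(ptr as *const AtomicU8, len) }
}

/// Copies the region out one byte at a time using `Relaxed` loads.
fn snapshot(bytes: &[AtomicU8]) -> Vec<u8> {
    bytes.iter().map(|b| b.load(Ordering::Relaxed)).collect()
}
```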

chrisd · January 14, 2022, 3:35pm

Would that work? Doesn't the OS also need to do Relaxed stores? On some platforms this won't matter but on others it might, no?

2e71828 · January 14, 2022, 3:51pm

As far as I can tell (I’m no expert in this area), the soundness requirement for &[u8] is that every read from a given address must produce the same result over time. If a page gets evicted and then re-read from disk, the memory contents might change to reflect changes to that block of the file.

alice · January 14, 2022, 5:33pm

The page getting evicted is not necessary for the value to change under your feet. I believe any change to the file by another process would be immediately visible in the mmap.

jessa0 · January 14, 2022, 5:54pm

This is really making me think.

Though clearly this isn't specified by POSIX, I'm guessing the Linux implementation maps the same block-cache pages into every process that mmaps the same file. A plain write() would result in some kernel code writing to that page as well. So potentially both threads would have to be doing atomic accesses (which could be in user or kernel code, like you said) for the behavior to be defined -- this has been debated here, and I think I've seen it debated in some Rust GitHub issues as well.

If a page was evicted since last access, reread into the block cache, and remapped after a fault with different data, well, then I'm at a loss as to what that would mean for &[AtomicU8], especially as it relates to non-x86 architectures.

H2CO3 · January 14, 2022, 5:55pm

Or since u8 is Copy, could &[Cell<u8>] work?
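For memory Rust already owns, the &[Cell<u8>] type can be reached without any unsafe code via `Cell::from_mut` and `Cell::as_slice_of_cells`. The sketch below (`as_cells` is a hypothetical helper) only shows that the type is expressible; for an actual mapping you would still need an unsafe cast from a raw pointer, and it does not answer whether writes from another process underneath the Cell are allowed.

```rust
use std::cell::Cell;

/// Views a Rust-owned byte buffer as a slice of cells, using only safe std APIs.
fn as_cells(buf: &mut [u8]) -> &[Cell<u8>] {
    Cell::from_mut(buf).as_slice_of_cells()
}
```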

2e71828 · January 14, 2022, 5:55pm

For a private (copy-on-write) mapping, whether or not file changes appear is left unspecified by POSIX: You need to consult the host OS's documentation. From mmap(2) - Linux manual page (man7.org):

       MAP_PRIVATE
              Create a private copy-on-write mapping.  Updates to the
              mapping are not visible to other processes mapping the
              same file, and are not carried through to the underlying
              file.  It is unspecified whether changes made to the file
              after the mmap() call are visible in the mapped region.

The open question is what actually triggers a process-private copy of the block to be created: Is it a read by the MAP_PRIVATE process, a write by that process, a write by some other process, or something else? In the absence of official guidance from the OS, we have to assume any write to the file could appear in the mapping at any time.

scottmcm · January 14, 2022, 6:22pm

My instinct is no, but I'm not confident. Basically, raw Cell isn't data-race-protected against the underlying stuff changing -- it allows it to change, but expects all the changes to go through the cell.

Certainly it's not ok to (non-atomically) write to an UnsafeCell in one thread while reading from it in another. But memmap is weird enough that I don't know exactly what the model would say about it to know whether it counts as a write that would be capable of racing.

Said otherwise, two gets in a row for Cell are going to read the same thing (assuming nothing between them) -- they can be safely moved around so long as they don't swap places in the ordering with other things affecting the same place. But that's not necessarily fine for something memmapped.

So my instinct is that the way to be sound is to use LLVM's Unordered atomic ordering, as that's the weakest possible thing that defuses the usual UB from data races. But Rust doesn't expose that one, thus Relaxed.

newpavlov · January 14, 2022, 6:49pm

I don't think such exotic types are worth the trouble compared to wrappers around raw pointers. For example, with &[AtomicU8] you won't be able to efficiently read or write u32 or other types larger than a byte to or from mapped memory, while with raw pointers it's as simple as:

ptr::write_unaligned(mmap_ptr.add(offset) as *mut u32, my_val);

The only thing I am not sure about is whether mmap reads/writes should be volatile or not.
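Expanding that one-liner into a small bounds-checked wrapper gives a sketch of what a raw-pointer-based read/write interface might look like. `MappedRegionMut`, `read_u32` and `write_u32` are hypothetical names, and the region is assumed to be a writable mapping. Like the one-liner, these are ordinary non-atomic accesses, so they do not settle the data-race question. `ptr::read_volatile`/`write_volatile` would be the volatile alternative, but they require a properly aligned pointer, unlike the unaligned accesses used here.

```rust
use std::ptr;

/// Hypothetical mutable view of a writable mapped region.
pub struct MappedRegionMut {
    ptr: *mut u8,
    len: usize,
}

impl MappedRegionMut {
    /// # Safety
    /// `ptr` must be valid for reads and writes of `len` bytes while `self` is used.
    pub unsafe fn new(ptr: *mut u8, len: usize) -> Self {
        MappedRegionMut { ptr, len }
    }

    /// Returns `Some(())` if `offset..offset + size` lies inside the region.
    fn check(&self, offset: usize, size: usize) -> Option<()> {
        let end = offset.checked_add(size)?;
        if end <= self.len { Some(()) } else { None }
    }

    /// Writes `val` in native byte order at `offset`, which need not be aligned.
    pub fn write_u32(&mut self, offset: usize, val: u32) -> Option<()> {
        self.check(offset, 4)?;
        // SAFETY: bounds checked above; `write_unaligned` has no alignment requirement.
        unsafe { ptr::write_unaligned(self.ptr.add(offset) as *mut u32, val) };
        Some(())
    }

    /// Reads a native-endian `u32` at `offset`, which need not be aligned.
    pub fn read_u32(&self, offset: usize) -> Option<u32> {
        self.check(offset, 4)?;
        // SAFETY: bounds checked above; `read_unaligned` has no alignment requirement.
        Some(unsafe { ptr::read_unaligned(self.ptr.add(offset) as *const u32) })
    }
}
```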

scottmcm · January 14, 2022, 7:40pm

write_unaligned is a memcpy under the hood, but AFAIK that doesn't defuse data race UB either, so it might be UB too.

I have no idea if volatile would either. In C++, at least, it's widely said that it's not about multithreading. (Unlike Java, where it does have something related to that, IIRC.)

H2CO3 · January 14, 2022, 7:43pm

Does it? Race conditions aside, I previously asked a question related to making some C FFI-wrapping code sound, and the conclusion there was that interior mutability makes the compiler assume that something else might have changed the underlying value between two reads. Otherwise, as it seems to me, it would be borderline impossible to soundly wrap C APIs that expose raw pointers. (Of course this still doesn't license anyone to mess with thread safety.)

scottmcm · January 14, 2022, 7:53pm

Hmm, looks like I lost the word "conceptually" in my drafting.

I don't know how much -- if at all -- Cell ends up changing compared to UnsafeCell. Perhaps it technically doesn't end up changing anything, as certainly reads through an UnsafeCell might have been changed by something else. But perhaps practically it's infeasible to actually provide a safe-and-sound interface where it might change underneath a cell, since there's no coordination required for reading it.

My sketch would be something like this:

if the C code promises not to change it when it's not running, then you can use &T while the Rust code is running, no need for Cell.
if the C code is allowed to change it, then &Cell<T> seems insufficient as it doesn't guard against data races against those changes.
so in neither case is Cell<T> the right mechanism.

(But I'm no Ralf, of course. UCG probably has to figure something out here, and it might be unanswerable until there's a memory model.)

Cerber-Ursi · January 14, 2022, 8:24pm

Well, if we have a &Cell<T>, we can't be sure that the wrapped value won't be changed via another shared reference to the same Cell, right? That's the whole point of shared mutability, after all. How is the FFI code different?

quinedot · January 14, 2022, 9:52pm

Cell guards against data races by not being Sync -- i.e. a &Cell<T> can't be shared between threads, so all access to it happens on a single thread.
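To make the thread-safety distinction concrete: Cell<u8> is Send but not Sync, so a &[Cell<u8>] can't be handed to another thread, whereas AtomicU8 is Sync. The sketch below, with made-up helper names (`assert_send`, `assert_sync`, `parallel_sum`), checks this at compile time and shares a &[AtomicU8] between two scoped threads (`std::thread::scope`, stable since Rust 1.63). It says nothing about writes coming from another process, which Rust's type system cannot see.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicU8, Ordering};
use std::thread;

// Compile-time checks: each call only compiles if `T` has the named trait.
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

fn checks() {
    assert_send::<Cell<u8>>(); // a Cell<u8> may be moved to another thread...
    // assert_sync::<Cell<u8>>(); // ...but this would not compile: Cell<u8> is not Sync
    assert_sync::<AtomicU8>(); // AtomicU8 may be shared by reference
}

fn sum_relaxed(part: &[AtomicU8]) -> u32 {
    part.iter().map(|b| b.load(Ordering::Relaxed) as u32).sum()
}

/// Sums the buffer from two threads at once. With `&[Cell<u8>]` in place of
/// `&[AtomicU8]`, the closures passed to `spawn` would not be `Send`, and the
/// code would be rejected.
fn parallel_sum(bytes: &[AtomicU8]) -> u32 {
    let (left, right) = bytes.split_at(bytes.len() / 2);
    thread::scope(|s| {
        let a = s.spawn(move || sum_relaxed(left));
        let b = s.spawn(move || sum_relaxed(right));
        a.join().unwrap() + b.join().unwrap()
    })
}
```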

anon80458984 · January 15, 2022, 2:24am

To the best of my understanding, the current state of the discussion is:

memory in an mmap can change behind our backs
is there some T for which &[T] is fine with the memory changing behind our backs? Candidates considered so far: T = u8, Cell<u8>, AtomicU8; i.e. we want the Rust equivalent of &[volatile u8]
if such a T were found, would we keep any of the benefits of using mmap in the first place?
