Mmap file as &[volatile u8]?

Correct me if my terminology is wrong: assuming x86_64 u8 writes are atomic, there is no data race.

  1. we either read the old value or the new value

  2. both of these are "consistent", in that we can achieve them via "A finishes before B starts" or "A starts after B finishes"

  3. to have a data race, we need some junk value that can only be achieved by interleaving the two threads (but also not achieved by having one thread finish before the other starts)

Data races are something the language defines abstractly, and how they translate to the hardware is generally irrelevant for whether something is a data race or not. The language does not really define what it means to have a cross-process data race in the first place, so discussing whether one happens is not meaningful unless you are discussing how to change the language guarantees to describe it.

In general, Rust's volatile is defined to be used in situations where a memory location is treated specially by the CPU, so that writing to or reading from it is interpreted as an arbitrary IO operation (e.g. maybe it turns on an LED). Interacting with memory that changes due to effects from outside the current process seems like it fits that description quite well.
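As a concrete illustration, here is a minimal sketch of that MMIO-style use of volatile; the address and function are made up purely for illustration and are not valid on a normal machine:

/// Sketch only: pretend this address is a memory-mapped LED register.
unsafe fn turn_on_led() {
    let led_register = 0x4000_0000usize as *mut u8;
    // write_volatile tells the compiler it may not elide or merge this store,
    // nor reorder it relative to other volatile accesses.
    led_register.write_volatile(1);
}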

Sorry, I don't understand which position you are supporting.

Are you stating that the approach described above is (1) definitely UB, (2) definitely not UB, or (3) probably not something we can derive from the Rust spec?

I didn't read every post in the thread, but I am arguing that using volatile operations is a better bet than using atomic operations.

So the crux of the issue boils down to:

  1. Rust mmaps a file, gets an x: *mut u8
  2. Rust casts this to a y: *mut AtomicU8
  3. Rust does y[idx].load(std::sync::Ordering::Relaxed)
  4. At the same time, some C program, with the same file mmaped, writes a u8 to the same location.

Question: on x86_64, do we now have UB? Why or why not?
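For reference, a minimal sketch of steps 2 and 3, assuming x is the base pointer returned by mmap (the helper name here is made up for illustration):

use std::sync::atomic::{AtomicU8, Ordering};

// Hypothetical helper: `x` is the pointer mmap handed back for the shared file.
unsafe fn read_shared_byte(x: *mut u8, idx: usize) -> u8 {
    // Step 2: reinterpret the mapped byte as an AtomicU8 (same size and alignment as u8).
    let y = x.add(idx) as *const AtomicU8;
    // Step 3: relaxed atomic load of that byte.
    (*y).load(Ordering::Relaxed)
}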

Strictly speaking, Rust doesn't have any opinion on whether that is UB because it has no opinion on cross-process mmap'ed memory at all, but if that were two threads in the same process it would be UB because the write is not atomic.

The problem with mmap is not just that another process may concurrently write to it, but also that another process may concurrently truncate the file which unmaps the respective part of the file from your address space. If this happens any memory access in the now unmapped range is UB.

Edit: here is a full C reproducer within a single process. It should work the same when the second ftruncate is moved into a different process.

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(void) {
    int file = open("/tmp/foo.txt", O_CREAT | O_TRUNC | O_RDWR, 00770);
    if (file < 0) {
        perror("file create failed");
    }
    if (ftruncate(file, 16) < 0) {
        perror("grow failed");
    }
    char *mapped = mmap(0, 16, PROT_READ, MAP_PRIVATE, file, 0);
    if (mapped == MAP_FAILED) { // mmap reports failure with MAP_FAILED, not NULL
        perror("mmap failed");
    }
    printf("%p\n", mapped);
    printf("%d\n", mapped[2]); // Fine
    if (ftruncate(file, 0) < 0) {
        perror("truncate failed");
    }
    printf("%d\n", mapped[2]); // Crash
}
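For comparison, a rough Rust analogue of the same failure, assuming the memmap2 crate; on Linux the second read is expected to die with a bus error:

use memmap2::MmapOptions;
use std::fs::OpenOptions;

fn main() -> std::io::Result<()> {
    let file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(true)
        .open("/tmp/foo.txt")?;
    file.set_len(16)?; // grow, like the first ftruncate
    let map = unsafe { MmapOptions::new().len(16).map(&file)? };
    println!("{}", map[2]); // fine
    file.set_len(0)?; // shrink, like the second ftruncate
    println!("{}", map[2]); // expected to crash (SIGBUS)
    Ok(())
}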

Uh-oh. :frowning_face:

How do you make the argument that x86_64 u8 writes are not atomic?

This is a rather useless approach. There are always other processes on a real modern system. If your position is taken to its logical end, literally everything is UB because other processes access memory and Rust has no concept of processes.

At some point we are forced to fill the missing parts of the specification with the real-world knowledge, and the compiler writers have no choice but to accept it and work around it. Any operation may have arbitrary side effects, unless the spec explicitly says otherwise.

No, I don't think it's useless at all. What I am arguing is that volatile is the tool that fills out those missing parts best, as opposed to atomics.

This only applies to Rust code. If you're exclusively calling into 3rd-party code, then it depends on that code's own rules. (Your risk is the compiler/linker making mistakes.)

[They can be] converted into SIMD instructions, which are not atomic.

(This is just elaboration for anyone not already aware, rather than something I expect the other posters don't know.)
UB is part of the specification. As a conscientious developer who wants future-proof code, not just code that appears to work now, don't write UB code; that's easier said than done when you have to use unsafe.

The basic primitive you're looking for is memmap2::MmapRaw. With that, you can implement your own wrapper:

use memmap2::MmapRaw;

pub struct VolatileMmapSlice(MmapRaw);

impl VolatileMmapSlice {
    pub fn new(map: MmapRaw) -> VolatileMmapSlice {
        VolatileMmapSlice(map)
    }

    pub fn into_inner(self) -> MmapRaw {
        self.0
    }

    // SAFETY: index must be less than both isize::MAX and the length of the underlying map
    unsafe fn get_ptr(&self, index: usize) -> *const u8 {
        self.0.as_ptr().add(index)
    }

    // SAFETY: index must be less than both isize::MAX and the length of the underlying map
    unsafe fn get_mut_ptr(&self, index: usize) -> *mut u8 {
        self.0.as_mut_ptr().add(index)
    }

    // SAFETY: the file must not be resized while this function is called
    pub unsafe fn get(&self, n: usize) -> u8 {
        assert!(n < isize::MAX as usize && n < self.0.len());
        self.get_ptr(n).read_volatile()
    }

    // SAFETY: the file must not be resized while this function is called
    pub unsafe fn set(&self, n: usize, v: u8) {
        assert!(n < isize::MAX as usize && n < self.0.len());
        self.get_mut_ptr(n).write_volatile(v)
    }

    // SAFETY: the file must not be resized while this function is called
    pub unsafe fn copy_to(&self, start: usize, len: usize, dst: &mut [u8]) {
        assert!(dst.len() == len);
        let end = start.checked_add(len).unwrap();
        assert!(end <= isize::MAX as usize && end <= self.0.len());
        for (i, b) in dst.iter_mut().enumerate() {
            *b = self.get_ptr(start + i).read_volatile();
        }
    }

    // SAFETY: the file must not be resized while this function is called
    pub unsafe fn copy_from(&self, start: usize, len: usize, src: &[u8]) {
        assert!(src.len() == len);
        let end = start.checked_add(len).unwrap();
        assert!(end <= isize::MAX as usize && end <= self.0.len());
        for (i, b) in src.iter().enumerate() {
            self.get_mut_ptr(start + i).write_volatile(*b);
        }
    }
}

Note the loops in copy_to and copy_from. LLVM may unroll them so that each iteration handles several bytes, but the volatile accesses themselves are still performed one byte at a time. Currently, in stable Rust, there's no way to do a volatile read or write of an unsized value. In unstable Rust, we can use the intrinsic directly:

#![feature(core_intrinsics)]

use std::intrinsics;

impl VolatileMmapSlice {
    // SAFETY: the file must not be resized while this function is called
    pub unsafe fn copy_to(&self, start: usize, len: usize, dst: &mut [u8]) {
        assert!(dst.len() == len);
        let end = start.checked_add(len).unwrap();
        assert!(end <= isize::MAX as usize && end <= self.0.len());
        let src = self.get_ptr(start);
        intrinsics::volatile_copy_nonoverlapping_memory(dst.as_mut_ptr(), src, len);
    }

    // SAFETY: the file must not be resized while this function is called
    pub unsafe fn copy_from(&self, start: usize, len: usize, src: &[u8]) {
        assert!(src.len() == len);
        let end = start.checked_add(len).unwrap();
        assert!(end <= isize::MAX as usize && end <= self.0.len());
        let dst = self.get_mut_ptr(start);
        intrinsics::volatile_copy_nonoverlapping_memory(dst, src.as_ptr(), len);
    }
}

Some optimization is also likely possible in stable Rust by manually batching the accesses into 64-bit or 128-bit chunks, but at that point I'd need some real profiling.
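As a hedged usage sketch (assuming MmapRaw::map_raw as the constructor, and a pre-existing file of at least four bytes that no other process resizes), the wrapper above might be used like this:

use memmap2::MmapRaw;
use std::fs::OpenOptions;

fn main() -> std::io::Result<()> {
    // Assumption: /tmp/shared.bin already exists and is at least 4 bytes long.
    let file = OpenOptions::new().read(true).write(true).open("/tmp/shared.bin")?;
    let map = MmapRaw::map_raw(&file)?;
    let slice = VolatileMmapSlice::new(map);

    // SAFETY (assumption): no other process resizes the file while we access it.
    unsafe {
        slice.set(0, 42);
        let mut buf = [0u8; 4];
        slice.copy_to(0, 4, &mut buf);
        println!("{:?}", buf);
    }
    Ok(())
}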


Strictly speaking, UB is the result of any operation which is not governed by the specification: Any conforming implementation may then choose to treat that operation however it wants to. Particular implementations can then document these choices, creating a dialect that still supports all programs written against the primary specification.

In particular, the specification does not have to explicitly state that something is 'undefined' for it to be UB; not mentioning the topic at all also implies UB.


I don't think anyone has linked this yet, where several of us had a very long discussion about this very topic: How unsafe is mmap?


As far as I understand, the only effect of volatile is that at the compiler level the operation is assumed to have arbitrary side effects. So the compiler won't remove the reads/writes, but what else can happen? I guess they can be arbitrarily reordered with non-volatile reads/writes by the compiler. Ok, we'll only use volatile accesses in the source. But the processor can also reorder accesses, and it has no concept of volatile accesses AFAIK. What kind of extra protections are required in the presence of multiple threads or processes?

Unfortunately, there isn't really a formalized memory model for mmap, so I can't say much more except that you shouldn't do weird things.

If there is no risk of a data race - your process is the only one that has access to the file, or all processes only read it - then you can use a normal slice reference.

If other processes can do anything to the file, then using mmap is not safe: another process can truncate() the file, and then you get a SIGBUS on reading past the new file size, which cannot be prevented (it's not possible to check the file size before accessing the memory, as that wouldn't prevent the issue due to TOCTOU).
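For the no-other-writers case, here is a minimal sketch of the plain-slice approach, assuming the memmap2 crate and assuming you really do control every process that touches the file:

use memmap2::Mmap;
use std::fs::File;

fn read_whole_file(path: &str) -> std::io::Result<Vec<u8>> {
    let file = File::open(path)?;
    // SAFETY (assumption): no other process modifies or truncates the file
    // while this mapping is alive.
    let map = unsafe { Mmap::map(&file)? };
    // Mmap derefs to &[u8], so it can be used like an ordinary slice.
    Ok(map.to_vec())
}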

As long as you do not make a plain &[u8] out of it, you're fine and the UB gang won't come after you. Just use raw pointers and expose the API as you described.

If you want to avoid the remaining issue of someone deleting or truncating your file, you should put a lock on it; on Windows, for example, simply opening the file locks it (I'm not sure how to do this the POSIX way, but there is probably one).

It's easy to think about multiple threads as a stream of instructions that can be interleaved, but that's only valid because data races are UB. When you're talking about data races themselves, you need to understand that instructions can be executed at the same time: a processor can have multiple cores.

UB is undefined behaviour; it isn't "something will definitely go wrong". UB can be anything, and that includes working with no problem. Your program may work now, but it may not work in the future: compilers change, and might do new optimizations you weren't expecting.
