Correct way to use memory mapped circular buffers

Hello, I want to use mmap to create a circular buffer, but I don't know how to make it sound. I compiled a couple of snippets in Rust and in C, and both got miscompiled...

As you can see, the second assert got compiled out by both rustc and gcc, even though that store might have been mapped by the hardware to the same place as x[0]. So I am assuming that passing a circular buffer to this function is undefined behavior. Is there some way to signal to the compiler that it can't do such an optimization? And why is it compiled like this even in the C case? I thought this was a common thing in C...

pub fn foo(x: &mut [u8; 1 << 16]) {
    assert!(x[0] == 8);
    x[1 << 15] = 4;
    assert!(x[0] == 8);
}

// Making sure no references are created
pub fn bar(x: *mut [u8; 1 << 16]) {
    let x = x as *mut u8;
    unsafe {
        assert!(*x.wrapping_add(0) == 8);
        *x.wrapping_add(1 << 15) = 4;
        assert!(*x.wrapping_add(0) == 8);
    }
}
example::foo::he976409b4476718b:
        cmp     byte ptr [rdi], 8
        jne     .LBB0_2
        mov     byte ptr [rdi + 32768], 4
        ret
.LBB0_2:
        push    rax
        lea     rdi, [rip + .L__unnamed_1]
        lea     rdx, [rip + .L__unnamed_2]
        mov     esi, 27
        call    qword ptr [rip + core::panicking::panic::h5cc1e6ff3d457269@GOTPCREL]

example::bar::hc238df3a985c0821:
        cmp     byte ptr [rdi], 8
        jne     .LBB1_2
        mov     byte ptr [rdi + 32768], 4
        ret
.LBB1_2:
        push    rax
        lea     rdi, [rip + .L__unnamed_3]
        lea     rdx, [rip + .L__unnamed_4]
        mov     esi, 41
        call    qword ptr [rip + core::panicking::panic::h5cc1e6ff3d457269@GOTPCREL]

.L__unnamed_1:
        .ascii  "assertion failed: x[0] == 8"

.L__unnamed_5:
        .ascii  "/app/example.rs"

.L__unnamed_2:
        .quad   .L__unnamed_5
        .asciz  "\017\000\000\000\000\000\000\000\002\000\000\000\005\000\000"

.L__unnamed_3:
        .ascii  "assertion failed: *x.wrapping_add(0) == 8"

.L__unnamed_4:
        .quad   .L__unnamed_5
        .asciz  "\017\000\000\000\000\000\000\000\n\000\000\000\t\000\000"
#include <assert.h>
#include <stdint.h>

void square(uint8_t *array) {
    assert(array[0] == 8);
    array[1 << 15] = 4;
    assert(array[0] == 8);
}

.LC0:
        .string "/app/example.c"
.LC1:
        .string "array[0] == 8"
square:
        cmp     BYTE PTR [rdi], 8
        jne     .L7
        mov     BYTE PTR [rdi+32768], 4
        ret
.L7:
        push    rax
        mov     ecx, OFFSET FLAT:__PRETTY_FUNCTION__.0
        mov     edx, 5
        mov     esi, OFFSET FLAT:.LC0
        mov     edi, OFFSET FLAT:.LC1
        call    __assert_fail
__PRETTY_FUNCTION__.0:
        .string "square"

Mapping the same physical memory into separate virtual ranges goes against fundamental assumptions about memory used by optimizing compilers. Your best option is probably to use volatile reads/writes while accessing such memory.
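To make the suggestion concrete, here is a sketch of the asker's `bar` rewritten with volatile accesses (the function name `bar_volatile` is mine). Volatile reads and writes tell the compiler that every access is an observable side effect, so it cannot fold the second load into the first:

```rust
use std::ptr;

// Hypothetical accessor for a buffer whose two halves may alias each
// other through a mirrored mmap mapping. Volatile accesses prevent the
// compiler from caching or eliding any of these loads and stores.
pub fn bar_volatile(x: *mut u8) {
    unsafe {
        assert!(ptr::read_volatile(x) == 8);
        ptr::write_volatile(x.wrapping_add(1 << 15), 4);
        // This read is emitted again instead of being folded to the
        // first one: with a mirrored mapping it may now observe 4.
        let second = ptr::read_volatile(x);
        let _ = second;
    }
}

fn main() {
    // Demo on an ordinary (non-mirrored) buffer, where the second read
    // still sees 8 and the store lands at offset 1 << 15.
    let mut buf = vec![0u8; 1 << 16];
    buf[0] = 8;
    bar_volatile(buf.as_mut_ptr());
    assert_eq!(buf[1 << 15], 4);
}
```

Note that volatile only constrains the compiler; whether two virtual addresses actually alias is still up to the mapping you set up with mmap.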


You need volatile.

For other readers, there is a relevant internals thread: Pre-RFC: core::ptr::simulate_realloc - libs - Rust Internals

You can get the intended behaviour by compiler-fencing the creation of mutable subviews: Compiler Explorer

Although, to my understanding, this is not guaranteed to work, since the documentation says:

Note that just like fence, synchronization still requires atomic operations to be used in both threads – it is not possible to perform synchronization entirely with fences and non-atomic operations.

This does not set up a fitting model for the compiler (because there is none); it merely inhibits the relevant optimizations for specific compiler versions.
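Since the Compiler Explorer link is not reproduced here, the following is my own sketch of the fencing idea, with `compiler_fence` placed between the store and the re-read (the function name `bar_fenced` and the exact fence placement are assumptions, not the linked code). As the quoted documentation warns, this inhibits the load-folding in practice on current compilers but is not a guaranteed synchronization mechanism:

```rust
use std::sync::atomic::{compiler_fence, Ordering};

// Hypothetical variant: a compiler fence discourages the compiler from
// forwarding the first load's value across the store, so the second
// read is re-issued. This relies on current compiler behaviour, not on
// any documented guarantee.
pub fn bar_fenced(x: *mut u8) {
    unsafe {
        assert!(*x == 8);
        *x.wrapping_add(1 << 15) = 4;
        compiler_fence(Ordering::SeqCst);
        let second = *x;
        let _ = second;
    }
}

fn main() {
    // Demo on an ordinary buffer; with a mirrored mmap mapping the
    // second read inside bar_fenced could observe the store instead.
    let mut buf = vec![0u8; 1 << 16];
    buf[0] = 8;
    bar_fenced(buf.as_mut_ptr());
    assert_eq!(buf[1 << 15], 4);
}
```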
