Are uninitialized allocations from FFI also "uninitialized" in Rust?

I know that this_is_unsound is unsound because it creates a reference to uninitialized memory. But does this also hold for the but_is_this_unsound_too function?

In my (naïve) opinion, Rust cannot know that the values returned by malloc are uninitialized (as opposed to the case where the Rust allocator is used). Thus, from the perspective of Rust, the memory is indistinguishable from initialized memory, or am I wrong here?

pub fn this_is_unsound(size: usize) -> &'static mut [u8] {
    let mut v = Vec::with_capacity(size);
    // Unsound!
    // The following will allow access to uninitialized memory:
    unsafe { v.set_len(size); }
    v.leak()
}

pub fn but_is_this_unsound_too(size: usize) -> &'static mut [u8] {
    assert!(size <= isize::MAX as usize);
    // SAFETY: `libc::malloc` can always be called with any `size`
    let m = unsafe { libc::malloc(size) };
    assert!(!m.is_null());
    // SAFETY:
    //  *  data behind pointer is valid for reads and writes
    //     for `'static` as it will never be deallocated
    //  *  memory is set to arbitrary but defined values which
    //     are valid; the abstract Rust machine does not know
    //     that this memory is "uninitialized" and thus can't
    //     perform any illegal optimizations
    //  *  no other threads are expected to change the value
    //     of the bytes because a previous `free` will
    //     synchronize-with the `malloc`
    //  *  the pointer isn't used elsewhere, thus nobody else
    //     accesses the memory during the lifetime `'static`
    //  *  `size * size_of::<u8>() <= isize::MAX`
    unsafe { std::slice::from_raw_parts_mut(m.cast(), size) }
}


What the compiler knows is irrelevant for the question of soundness. There, the only thing that matters is what your code does. If your code does allowed stuff, then it's sound, and if it triggers UB, then it's unsound. End of story.

Now, the question of "does it get miscompiled?" is, of course, a different story. If you call into unknown C FFI code and then use the memory as if it was initialized, then we get two cases:

  1. The FFI code initialized it. All is good, and the compiler does what you want.
  2. The FFI code didn't initialize it. Your code triggers UB, and the compiler is allowed to do anything it wants. Since the compiler doesn't know about this, it chooses to compile it into what you wanted.

So you might argue that you get what you wanted in either case. However, the second case is still unsound! After all, UB allows everything, and doing what you wanted is included in everything.


All that said, the compiler does know what malloc does, so it's fully aware that your memory is uninitialized. Even if you wrapped it in a custom C FFI function, the compiler could learn of this during link-time optimization. Only dynamic libraries are truly unknown to the compiler.

However, you can argue that this is sound in an entirely different way. See this thread for more on that.
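
One way to stay out of case 2 altogether is to keep the "possibly uninitialized" state visible in the type, so that safe code cannot read the bytes before they are written. The following is only a minimal sketch under some assumptions: it uses `std::alloc::alloc` in place of `libc::malloc` so that it needs no external crate, and `leak_uninit` and `init_with` are hypothetical names that do not appear in the question.

```rust
use std::alloc::{alloc, handle_alloc_error, Layout};
use std::mem::MaybeUninit;

/// Hypothetical helper (not from the question): leaks `size` bytes and
/// returns them as possibly-uninitialized memory.
/// Assumption: `std::alloc::alloc` stands in for `libc::malloc`.
pub fn leak_uninit(size: usize) -> &'static mut [MaybeUninit<u8>] {
    if size == 0 {
        // `alloc` must not be called with a zero-sized layout.
        return &mut [];
    }
    // Fails if `size` exceeds `isize::MAX`.
    let layout = Layout::array::<u8>(size).expect("size exceeds isize::MAX");
    // SAFETY: `layout` has a non-zero size.
    let ptr = unsafe { alloc(layout) };
    if ptr.is_null() {
        handle_alloc_error(layout);
    }
    // SAFETY: `ptr` is non-null, valid for `size` bytes, never freed and not
    // aliased. `MaybeUninit<u8>` is allowed to hold uninitialized bytes.
    unsafe { std::slice::from_raw_parts_mut(ptr.cast::<MaybeUninit<u8>>(), size) }
}

/// Hypothetical helper: writes every element, then exposes plain bytes.
pub fn init_with(buf: &mut [MaybeUninit<u8>], mut f: impl FnMut(usize) -> u8) -> &mut [u8] {
    for (i, slot) in buf.iter_mut().enumerate() {
        slot.write(f(i));
    }
    // SAFETY: every element was written above, and `MaybeUninit<u8>` has the
    // same layout as `u8`.
    unsafe { &mut *(buf as *mut [MaybeUninit<u8>] as *mut [u8]) }
}

fn main() {
    let bytes = init_with(leak_uninit(4), |i| i as u8);
    println!("{:?}", bytes); // prints [0, 1, 2, 3]
}
```

With this shape, the only way to get a `&mut [u8]` is to go through code that has written every byte first, so the safe signature no longer promises more than the allocation delivers.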


Okay, I agree on that.

Okay, so I should have argued differently:

My assumption was that malloc does not return "uninitialized" memory with regard to the Rust definition of "uninitialized". But it looks like the "uninitialized" state is already defined at a much lower level, e.g. in LLVM, which would explain why malloc can return uninitialized memory from the perspective of Rust as well.

Okay, understood!

I think that's formally not correct. For that to be true (or even possibly true), I would have needed to declare the but_is_this_unsound_too function as an unsafe fn.

Proof:

The following program compiles but causes UB because it reads from uninitialized memory. The only unsafe blocks are in but_is_this_unsound_too, so that function must be unsound.

pub fn but_is_this_unsound_too(size: usize) -> &'static mut [u8] {
    assert!(size <= isize::MAX as usize);
    let m = unsafe { libc::malloc(size) };
    assert!(!m.is_null());
    unsafe { std::slice::from_raw_parts_mut(m.cast(), size) }
}

fn main() {
    println!("{}", but_is_this_unsound_too(1)[0]);
}


As opposed to:

-pub fn but_is_this_unsound_too(size: usize) -> &'static mut [u8] {
+pub unsafe fn but_is_this_unsound_too(size: usize) -> &'static mut [u8] {

fn main() {
-    println!("{}", but_is_this_unsound_too(1)[0]);
+    println!("{}", unsafe { but_is_this_unsound_too(1)[0] });


where the main function can be blamed.

Right, that's true. The function is not sound because there exist safe ways to use it that trigger UB.
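
For contrast, a safe signature becomes defensible once every byte is initialized before the `&mut [u8]` is created. This is just a sketch under assumptions: `std::alloc::alloc_zeroed` stands in for a zeroing FFI allocation such as `calloc`, so that no external crate is needed, and `leak_zeroed` is a hypothetical name.

```rust
use std::alloc::{alloc_zeroed, handle_alloc_error, Layout};

/// Hypothetical counterpart to `but_is_this_unsound_too`: all bytes are
/// zeroed before a reference to them exists.
/// Assumption: `alloc_zeroed` stands in for an FFI call such as `calloc`.
pub fn leak_zeroed(size: usize) -> &'static mut [u8] {
    if size == 0 {
        // `alloc_zeroed` must not be called with a zero-sized layout.
        return &mut [];
    }
    // Fails if `size` exceeds `isize::MAX`.
    let layout = Layout::array::<u8>(size).expect("size exceeds isize::MAX");
    // SAFETY: `layout` has a non-zero size.
    let ptr = unsafe { alloc_zeroed(layout) };
    if ptr.is_null() {
        handle_alloc_error(layout);
    }
    // SAFETY: `ptr` is non-null, valid for `size` initialized (zeroed) bytes,
    // never freed, and not aliased.
    unsafe { std::slice::from_raw_parts_mut(ptr, size) }
}

fn main() {
    // Safe code can now read the slice without triggering UB.
    println!("{}", leak_zeroed(1)[0]); // prints 0
}
```

If the allocation does not have to come from a particular allocator, `vec![0u8; size].leak()` gives the same result without any unsafe code.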


Also be aware that cross-language LTO allows the compiler to know about uninitialized allocations even across the FFI boundary if both sides participate in the LTO.


What is LTO?

Link-time optimization. It allows optimizing across multiple separately compiled libraries if they are linked together into a single executable or dynamic library. This allows better optimizations at the cost of longer compile times.


Yes, I see. Well, my mistake was that I thought the state of memory being "uninitialized" (as opposed to not having been set to useful values) was a property/concept specific to Rust. But as @alice commented and I also concluded later, I was wrong with that assumption.


The property of "uninitialized" actually goes all the way down to hardware - it is possible to have a memory cell in an in-between state where the value of the bit read back varies each time you read it, and it remains in this state until you write a value to the memory cell.

Every DRAM cell has a capacitor whose state of charge is read to determine whether the cell holds a 1 or a 0. Reading the state of charge drains the capacitor, so after each read, the DRAM has to recharge the capacitor if appropriate. Better DRAM designs determine whether to recharge the capacitor based on the digital output signal, and thus "fix" the cell value to a deterministic 1 or 0 on first read; but you can also recharge the capacitor to a level determined by the analogue output of the sense amplifier, in which case a non-deterministic state can persist until the cell receives a write.

For most software, this isn't a problem - if you're a userspace application on a mainstream OS, your RAM has been overwritten with zeroes before you get access to it, so the cells aren't ever in this in-between state. But if you are an OS, or running without an OS, you may see this state, depending on your hardware.
