Is this unsafe?

Hi all,

I have a question for you today. I know that when using the unsafe keyword I am opting out of the marvellous Rusty safety. However, I understand that sometimes we are smarter than the compiler... although it would seem that I am not.

So, the specific case is the following:

impl HeapList {

    pub fn new()->Self{        
        Self{
            first_free: 0,
            n_elements: 0,

            // We need this because the elements are do not 
            // satisfy the Copy trait requirement... although this
            // should not be considered unsafe because all the elements are
            // initialized as 'None' anyway.
            elements: unsafe {
                let mut arr: [Option<Element>; u8::MAX as usize] = std::mem::MaybeUninit::uninit().assume_init();//std::mem::uninitialized();
                for item in &mut arr[..] {
                    std::ptr::write(item, None);
                }
                arr
            },
        }
    }
} 

Now, I thought that I was solving the Undefined and unsafe behaviour right after initializing this structure, hence that it was not really unsafe. What am I missing?

Thanks in advance!

I'd recommend carefully reading through the documentation of MaybeUninit.

2 Likes

From assume_init:

Safety
It is up to the caller to guarantee that the MaybeUninit<T> really is in an initialized state. Calling this when the content is not yet fully initialized causes immediate undefined behavior.

At no point between calling uninit() and assume_init() you guarantee that the T is initialized.
The proposed solution in that linked issue would be ok, but really this is simpler:

const NONE: Option<Element> = None;
[NONE; u8::MAX as usize]

Update: for context why that works.

8 Likes

See also the main docs for:

No, this is invalid on two counts:

  • MaybeUninit::uninit().assume_init() does not give you uninitialized memory. It gives you Undefined Behavior and a license to kill to the optimizer. You can't tell the compiler to assume memory is initialized before initializing it. You can only call assume_init() after writing to the memory.

    There's only one special case when MaybeUninit::uninit().assume_init() is acceptable — that's when you have an array of MaybeUninits, like: MaybeUninit<[MaybeUninit<Element>; N]>. But in your case it's MaybeUninit<[Element; N]>, so it's an error, and Undefined Behavior.

  • &mut must not point to uninitialized memory at any time, ever, even if when it's never used. That's the reason why mem::uninitialized has been deprecated — it couldn't be used safely. Mere existence of &mut is a signal to the optimizer that the memory has already been initialized, by definition. So your &mut arr[..] is incorrect. That's why MaybeUninit operates on *mut T instead, so that it never implies to the optimizer that it's been initialized.

13 Likes

Thanks! This is the kind of explanation I was looking for. I code for solving problems, but I am not a computer scientist myself at all. I thought Undefined Behaviour happened mainly (always?) due to Uninitialized memory. Now I can go and check the difference between them.

One more thing: is this the compiler being cautious (e.g., calling assume_init() requires it being initialized)? or this code would actually crash at some point?

Thanks!

This solution is gold. This has been on my wish list for a long time. Thanks!

When you have undefined behavior, the compiler is allowed to compile your program to literally whatever it wants. In many cases, the compiler ends up emitting something reasonable, but with UB, there are no guarantees. The compiler could easily get a new optimization in the future that completely breaks your code.

For example, one thing that might happen is that the compiler might reason that since your code has undefined behavior, that piece of code must be unreachable. It could then optimize out any code-paths that call HeapList::new.

2 Likes

Oh, Ok. Those are some serious consequences. Awesome response, thanks for that

To answer a short question with a long answer...

In systems languages like C and Rust, the programmer and the compiler are trying to work together to create a functional, correct program. To do this they have agreed upon a bunch of rules (syntax, types, borrow checker, etc.) so each side knows what "correct" means. Often there are operations which just don't make sense and the programmer promises they'll never happen in a "correct" program.

We as programmers promise we'll never write code which could possibly break these rules (e.g. by always checking a pointer before dereferencing it), which lets the compiler make certain assumptions that may help create a "better" executable (e.g. it can see one branch of an if-statement would dereference a null pointer so we can skip the entire branch and check because it'll never happen).

In your case, setting aside a chunk of uninitialized memory (MaybeUninit::uninit()) and telling the compiler "hey, I promise I just initialized this" (assume_init()) is breaking the rules that you and the Rust compiler agreed to abide by (you can never observe uninitialized memory), so it is unsound and nobody knows what should happen.

Kinda like when you divide by 0 in math, the behaviour is just not defined and doesn't make sense.

2 Likes

Thanks very much.

Before this discussion, my understanding was that the program would run as I wrote it (with some optimizations, obviously) and that undefined behaviour was caused by things like reading uninitialized memory or segmentation faults. That is why I thought telling the compiler to "assume init" was not harmful as long as I knew that, when the program got to that piece of memory, it would have been initialized.

But, now I know!

2 Likes

If you have spare time and interest, what you're looking at here is the difference between "denotational semantics" and "operational semantics". There's a lot of academic literature about the two approaches that you could read.

You were thinking in terms of operational semantics; given this program, what would happen if I ran each bit of Rust code in the program on a "Rust machine" that directly executes Rust code?

The compiler, on the other hand, is working with denotational semantics; given this program, what meaning can be ascribed to each bit of Rust code?

Compiler authors tend to prefer denotational semantics, because it's easier to check that an optimization is correct in a denotational world, but operational semantics are also a valid way to describe a program.

4 Likes

Any advice on how to do this for THIS case?

There is a Generic involved and I get the can't use generic parameters from outer function error.

You can move the const NONE definition outside the function, so that it is written inside the impl (this makes it so it is no longer the constant NONE, with no generics around, but the constant <Stack<T>>::NONE, which does have access to T).
You then must use [Self::NONE; u8::MAX as usize] to construct the array.

2 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.