Why does putting an array in a Box cause stack overflow?

Acknowledged. Reading the source of alice's example would not suggest otherwise.

No. It does not.

The compiler ignores it. As demonstrated by alice's playground example above.

I don't know if the distinction is American vs. British English, but when I googled the word I got two definitions (among others):

  1. suppose to be the case, without proof
  2. begin to have (a specified quality, appearance, or extent)

It seems you are thinking of the second meaning, while the documentation is using the first meaning.

4 Likes

It tells the Rust frontend of rustc to assume that the memory is initialized. That's why rustc does not reject the program. However, the memory actually is not initialized, which the LLVM backend detects. That's why LLVM ignores it.

1 Like

No. I understand it to mean exactly:

  1. suppose to be the case, without proof

Which as you demonstrate in your playground is not what it does.

In that case all I can say is that we do indeed consider the word to have different meanings. Recall my quote

Is this an incorrect usage of the word assume?

Generally I think of the word as "proceed as if this was true", not "make this true".

2 Likes

I guess it's time to remind everyone of the first Rust koan. That is exactly the situation here.

OK. Thanks for the input everybody.

I guess we are done with this debate. It is what it is and I guess it ain't changing now.

I just have to remember that any time I see assume_init() it does not mean 'assume initialized'; rather it means 'extract the value; depending on whether it actually was initialized, you get the expected behavior or UB'.

Really, that is how one would have to explain it in a book on Rust. As has been done so well here.

Thanks again.

It does not mean "Compiler front-end and back-end, behave as if this had been initialized." It does mean "Compiler front-end, trust me that it has been initialized even though you can't see it on your own." For me as a native speaker of American English, even if I were completely new to Rust, the latter meaning would be closer to my uninformed interpretation of assume_init() than the former. For the former meaning, perhaps the imperative form declare_init() would be appropriate.

1 Like

TomP,

  1. "Compiler...behave as if this had been initialized."

  2. "Compiler ... trust me that it has been initialized even though you can't see it on your own."

I do not see any difference between those two statements.

In both cases I would expect code following "assume_init" to proceed as if the compiler thought it was initialized. It does not.

Admittedly I snipped out your distinction between front end, back end, LLVM, etc. As I said, I don't care. I'm looking at the semantics of the source code, no matter how it is compiled or what it runs on.

I find it odd that we can agree on the meaning and disagree on the outcome at the same time! :slight_smile:

Oh, neat!! I did not think of the unwinding path; now I am curious about panic = "abort" allowing Box::new to be inlined :smile:

Something like that. If we ignore layout and just focus on semantics, it would be legal to implement MaybeUninit<T> as an Option<T>:

pub
struct MaybeUninit<T> /* = */ (
    Option<T>,
);

impl<T> MaybeUninit<T> {
    pub
    fn uninitialized () -> Self
    {
        Self(None)
    }

    pub
    fn write (self: &'_ mut MaybeUninit<T>, value: T) -> &'_ mut T
    {
        self.0 = Some(value);
        match &mut self.0 {
            | &mut Some(ref mut value) => value,
            | &mut None => unsafe {
                // # Safety
                //
                //   - This is truly unreachable, so it is sound
                //     to trigger UB in this branch.
                ::std::hint::unreachable_unchecked()
            },
        }
    }

    pub
    unsafe // Safety: UB to call this on uninitialized / None
    fn assume_init (self: MaybeUninit<T>) -> T
    {
        match self.0 {
            | Some(value) => value,
            | None => {
                // Usually we should handle this branch,
                // but the semantics of this function being called
                // mean that it has been asserted that this branch
                // is truly unreachable, so let's invoke UB.
                ::std::hint::unreachable_unchecked()
            },
        }
    }
}

So, regardless of how one can interpret the meaning of "assume init" (human language is inherently ambiguous), we can see that if we forge an uninitialized None, we are not allowed to just .assume_init() it.

It's back to understanding Rust's interpretation of the unsafe keyword. assume_init() is only meaningful within an unsafe context, wherein it tells the part of the compiler that understands and enforces Rust semantics that Rust's requirements have been met even though that part of the compiler can't prove it on its own. That's the only reason why rustc does not flag the code as an error.

Do remember that the compiler front-end is attempting to ensure that the input requirements for the compiler's backend, LLVM, have been met. The Rust language eliminates whole classes of runtime errors by making such mistakes detectable at compile-time. Use of the unsafe keyword bypasses some of this detection process, making such mistakes detectable only at run-time. In this case the back-end input requirements have not been met, leaving LLVM free to "compile" the offending code in any quasi-random manner that its algorithms demand. No error is reported; it's only at run-time that the program does not meet the programmer's expectations.

Note that it would be incorrect to say that the program "misbehaves". In fact any behaviour is "correct", in the sense of garbage-in garbage-out. In safe code Rust protects the programmer from most such experiences (whereas C and C++ do not). That's why that first Rust koan is so relevant: the two great guards are analogous to the rustc compiler's front-end, whereas the threat lurking within the temple is the massive LLVM backend shared by so many different languages' compilers.

Edit: corrected plural possessive in second last word.

1 Like

I totally appreciate all the checking Rust does. And the Rust Koan. That is in large part why I am here spending significant time on Rust, betting my livelihood on it this year.

I don't really care what the front end vs the back end vs LLVM does. That is an implementation detail. I like to think that one day there will be a rust-gcc and a rust-msvc and whatever else. I try to talk about what the language semantics mean, however they are implemented.

This particular debate seems to come down to my unhappiness at the naming of one method: the name implies it does something it does not do. Well, so be it. I have to learn to live with it because I'm sure it is not changing now.

Thanks all. Great discussion.

3 Likes

Yes.

But that is an interesting statement in the light of alice's example code:

fn main() {
    let a: u32 = unsafe { std::mem::MaybeUninit::uninit().assume_init() };
    
    if a == 0 {
        println!("Hello 1");
    }
    if a > 0 {
        println!("Hello 2");
    }
}

In that code 'a' is a u32.

'a' is uninitialized, but the request is to assume_init, which implies that back in 'safe' land the compiler has to treat it as initialized, no matter what value it may hold.

Ergo there is no possible value of 'a' for which neither of the following conditions is true.

Except there is. The compiler knows a third state of a u32, 'uninitialized', and does nothing. It defies its own u32 semantics.

But that's not quite what it actually means.
It's more: The compiler may assume that you have (explicitly) initialized the value and is free to optimize the program with that assumption in "mind".

The fact that it optimizes out both println is just an implementation detail and a different compiler may behave differently.

Again, the term "compiler" does not distinguish between the two guards – the rustc front-end – and the threat within the temple – the highly-optimizing multi-language LLVM back-end. assume_init() is not a request to the complete compiler chain; it is a declaration to the rustc front-end that the language's semantic requirements have been met, even though the front-end code can't prove that on its own.

Since the LLVM back-end is multi-language, fed by a single-assignment intermediate representation, it is unaware of your intent; it is only aware of your code as projected by a language-specific front-end. In this case the actual code references uninitialized memory, a condition which the compiler LLVM is designed to assume is unreachable and thus should be optimized into non-existence.

Edit: Corrected reference to "the compiler" in the last sentence.

2 Likes

Phlopsi

I appreciate what you are saying there.

I'm no language/compiler expert, but it seems to me that what you are asking for is impossible. At least not without significant run-time overhead from a lot of checking, which would kill performance. As in Java, C#, etc. That is not acceptable for a systems programming language.

I assume it's not possible because greater minds have been working on this correctness problem for 50 years and more. So far the best we have come up with is Ada and now Rust.

I'm more inclined to praise this very amazing advancement in the art of compiled language design and construction rather than knock it for its shortcomings.

1 Like

IMHO, I don't think so. In order for LLVM to optimize the code away, it has to be aware of it. The time cost of finding the error MUST be smaller than that of compiling the release build (all optimizations enabled). In comparison, the whole point of Rust's ownership model is to catch code that would normally be UB at compile time instead of at runtime. I'm using Rust because it covers a huge range of protection from UB, and the most dangerous kind at that. However, there are still a ton of UB errors out there that can either never be found or are infinitely difficult to prove to be UB. This one seems to be provable, and quickly enough, since LLVM already contains the code that can prove it, so why not make full use of that information?

You're making a pretty classic mistake about the nature of undefined behavior. UB is not about the compiler twirling its moustache and saying, "Ha ha, you have made an error! As punishment, I shall delete all your code!" The compiler, during optimization, doesn't know that you have actually triggered undefined behavior; it only knows that you might have. And in order to give you the best possible result, it gives you the benefit of the doubt — assuming that you didn't. This blog post is a better explanation. LLVM cannot detect undefined behavior because undefined behavior is exactly what it assumes never happens.

7 Likes

It may be important to realize that assume_init() is not a compiler intrinsic at all, just a library construct. MaybeUninit holds its value in a ManuallyDrop with extra care about how/when to allow access to that value. When you call assume_init(), that calls ManuallyDrop::into_inner(), which just moves the value out. There's no special indication to the compiler about this.

I think the semantics you were hoping for would require the freeze instruction, new in LLVM 10.

3 Likes

Nope, that wouldn't work, because what if the array creation is effectful (say, in the case of [f(), g(), h(), ...])? Then it would likely be illegal for LLVM to do the following transformation:

let array = [f(), g(), h(), ...];
let ptr = alloc();
if ptr.is_null() { abort() }
ptr.write(array);
// to
let ptr = alloc();
if ptr.is_null() { abort() }
let array = [f(), g(), h(), ...];
ptr.write(array);

because there would be effects that weren't visible before abort (for example, let's say closing a file).

However, if you create the allocation before initialization, then you are already in the second case! So there is almost nothing more for LLVM to optimize.

(also note that allocation failure is an abort right now, for all std types)

1 Like