Why does putting an array in a Box cause stack overflow?

As I understand it, this is undefined behavior because you are calling assume_init before initializing it.

Totally.

Having programmed in C usually gives people the feeling that unsafe Rust is C with Rust syntax, but Rust relies on tyoe-level invariants, called validity invariants, that cause insta-UB when broken.


Given the issue at hand, I recommend trying to use the ::copyless crate, which manages to avoid using the stack when using a Box constructor:

use ::copyless::BoxHelper;

Box::alloc().init([[0; WIDTH]; HEIGHT])
1 Like

How does this not put the argument on the stack to pass it to init?

You aren't creating a T. You are creating a Vec that coerces into a &T. If that bugs you, make another typedef with the Vec, but to be honest I would find that obscure. I wouldn't even use the typedef for T. If you want to ensure an internal invariant, just use a newtype that maintains eg. the size of the outermost dimension and impls Deref. Don't make it complicated.

I have difficulty understanding the issue here. How is "undefined behavior" defined in Rust?

In this particular case I don't see why it matters. The way I'm reading that code is:

  1. I create a Box containing a T. The 'Box' says create it on the heap. The 'T' says how big it is. The 'uninit' makes it clear the content is garbage.

  2. I use 'assume_init' to make this a thing we can use as a T. This is a dangerous situation, we now have an uninitialized thing. Inviting randomness into out code.

  3. I initialize it. Now all is good. I exit the unsafe block.

Effectively I'm saying to the compiler "See that memory you have there? See how it is full of unknown junk? Just assume I put them there and that is how I want it initialized .Thanks". I then initialize the thing and leaf the 'unsafe' block.

My understanding of UB is that the compiler, on seeing that, can do what it likes. Especially re: optimization. Having decided my code is garbage anyway so it does not matter.

I would prefer if compilers actually did what I write :slight_smile:

It's undefined behavior because Rust assumes that all values and references are correctly initialized at all times, and breaking this invariant is instant-UB. Initialize the memory first, then cast it.

Ha! That is what I spent hours trying to do! :slight_smile:

At first sight that sounds more complex. More lines of code at least.

Any chance you could show how you would do it?

An example of a newtype would be this:

pub struct Array2D<T> {
    buffer: Box<[T]>,
    width: usize,
    height: usize,
}
impl<T> Array2D<T> {
    pub fn new(init: T, width: usize, height: usize) -> Self where T: Clone {
        Array2D {
            buffer: vec![init; width*height].into(),
            width,
            height,
        }
    }
    pub fn width(&self) -> usize {
        self.width
    }
    pub fn height(&self) -> usize {
        self.height
    }
}
impl<T> Index<usize> for Array2D<T> {
    type Output = [T];
    fn index(&self, idx: usize) -> &[T] {
        let off = self.height * idx;
        &self.buffer[off .. off + self.height]
    }
}
impl<T> IndexMut<usize> for Array2D<T> {
    fn index_mut(&mut self, idx: usize) -> &mut [T] {
        let off = self.height * idx;
        &mut self.buffer[off .. off + self.height]
    }
}

playground

2 Likes

Wow, thanks. That looks like it would do nicely. It's even straightforward and simple enough I can understand it :slight_smile: This is another post for printing and hanging on the wall.

This little experiment got totally out of hand. It was intended to be a one hour effort at recreating 11 lines of C code in Rust. Then checking out for performance. It grew into whole day of research, discussion and experiment on all kind of Rust highways and byways!

Having to define ones own type in a page of code to do what I originally had in mind seems rather over the top.

I know there is a multi-dimensional array crate out there. I like to roll my sleeves up and find out how to do things myself though.

1 Like

playground

Thanks H2C03, that is short an sweet.

Unfortunately I have no idea how it works. I mean, I don't see my indices being used to index anything in Deref. Looks like have more homework to do.

Undefined behavior is a strictly defined terminology. Safe Rust is UB-free and this is why you should stay in it. Unsafe Rust is far more unsafe than C as it has way more UB conditions, to allow compilers more aggressively optimize the code.

3 Likes

I would just do this:

How about using Vec::with_capacity(10_000_000), then manually clearing via .push(0) ? That's what I would do. Pros:

  • only 1 allocation
  • no explicit unsafe
  • should be reasonably fast

In this particular case it is not UB, because primitive integer types have no drop glue. If he would be storing more complex types, it'd totally be UB, though, because the destructor would be run on something invalid when overwriting the old values. Nevertheless, the way it was written is a logic error and should be fixed. The correct way is to initialize first and then assume it is initialized, not the other way around. That said, the type should be:
MaybeUninit<[[MaybeUninit<i64>; WIDTH]; HEIGHT]>
In that case, he can safely call assume_init on the outer value, because it contains a multidimensional array of MaybeUninits, which is fine. The next step is to initialize the array and at the end, the array can be transmuted to:
[[i64; WIDTH]; HEIGHT]>

This is not how I understand the rules about UB to be. I have a snippet from the docs of MaybeUninit below:

Moreover, uninitialized memory is special in that the compiler knows that it does not have a fixed value. This makes it undefined behavior to have uninitialized data in a variable even if that variable has an integer type, which otherwise can hold any fixed bit pattern

use std::mem::{self, MaybeUninit};

let x: i32 = unsafe { mem::uninitialized() }; // undefined behavior!
// The equivalent code with `MaybeUninit<i32>`:
let x: i32 = unsafe { MaybeUninit::uninit().assume_init() }; // undefined behavior!

In particular, this is not the reason for the undefined behaviour:

a[i][j] = 0; // oops, normally this would drop the previous value

but this is:

let a = Box::<T>::new_uninit();
let mut a = a.assume_init(); // oops, T has the invariant that it is not uninitialized

Of course, a[i][j] is also UB as this creates a &mut i32 which is undefined behaviour as it points to uninitialized memory, but the damage has already been done at this point.

1 Like

I am aware of the wikipedia definition. What I was fishing for was "Undefined behavior" defined in the Rust specification as we find in the C standards for example.

Of course we can find notes on safety and the term "Undefined behavior" in the Rust docs quite often. But still it's not exactly precise. For example, as we are talking about Box we find this in the Box documentation:

... fn assume_init... " Converts to Box<[T]> ." With a note on 'safety': "it is up to the caller to guarantee that the values really are in an initialized state. Calling this when the content is not yet fully initialized causes immediate undefined behavior."

Perhaps it's just my C experience but the implications of that note are not immediately obvious. I created an allocation on memory using Box new. No notes on safety there so I assume that gets done as ordered. At that point I'm happy my array is created and initialized. All be it full of random junk. Ergo I assume I can call 'assume_init()' without UB. Given that I then write zeros all over the array and that I never read it before doing so, in my mind there is no UB there.

Sounds like I have to be careful to interpret those safety notes in a different light.

Well you used Box::new_uninit, not Box::new, and the doc says "Constructs a new box with uninitialized contents.", so it is not surprising that the contents are uninitialized.

In the eyes of the LLVM optimizer, it is not random junk. Every value has the special LLVM value known as undef. Rust has chosen that the type i32 has the invariant that it is not undef, thus it is undefined behaviour to have an i32 that is undef. This invariant doesn't care about whether you read it or not.

If the equivalent is not UB in C, that's because they have decided not to have the invariant that integers cannot be undefined.

1 Like

Perhaps I'm just a bit slow.

When the C standard says things like "dereferencing an unititialized pointer is UB" it's usually immediately obvious why that is so.

When the Rust doc says "Calling this when the content is not yet fully initialized causes immediate undefined behavior." I don't find that as clear or informative.

As I said above I will have to be careful to interpret those safety notes in a different light.

This allows Rust to insert spurious reads to the pointer (for example, pre fetching). This is because references and Box are marked dereferencable in LLVM.

A great example of how uninitialized data is not just random junk. This does not print anything on release when I wrote it, but any fixed value would print something:

fn main() {
    let a: u32 = unsafe { std::mem::MaybeUninit::uninit().assume_init() };
    
    if a == 0 {
        println!("Hello 1");
    }
    if a > 0 {
        println!("Hello 2");
    }
}

playground

(of course, the behaviour may change at any time — it's undefined behaviour)

Interestingly, it starts printing one of them if you try to print a.

8 Likes