A question about how memory is stored (and VLAs)

Consider this playground code: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3ccef0f04115b949576ad21dd37a604a

extern crate bytes; // 0.4.12
use bytes::{BytesMut, BufMut};

fn main() {
    let mut bm = BytesMut::with_capacity(1);
    let size0 = std::mem::size_of_val(&bm);
    
    bm.put_u8(10);
    
    bm.reserve(1); // re-allocate
    
    bm.put_u8(11);
    
    let size1 = std::mem::size_of_val(&bm);
    
    assert_eq!(size0, size1);
}

Does the size of BytesMut stay constant because the inner pointee data independently exists on the heap, whereas the BytesMut exists on the stack in an independent array of memory (unless BytesMut is boxed)? Are there any cases when the size of the structure does not stay persistent when data is added? (The only example I can think of are DSTs)

It seems that this is a definition of the DST, in fact: a type which do not enforce that its size won't change. Any non-DST will have the same size, calculated at compile-time, through the whole program.

1 Like

Yes. The allocation of a BytesMut is stores somewhere on the heap separate from the BytesMut itself. Whether or not the BytesMut is stored on the stack or heap depends on what owns it, e.g. if it is a field in another struct, then it's stored at the same place as that struct and if you put them in a Vec or Box, it'll definitely be on the heap too.

1 Like

This is basically a vec< u8 >. On the stack you have at least a pointer that points to the inner data on the heap ( that allow to resize the data ) and the size of the pointer ( sometimes others attributes but with a size know at compile time). There is no way to have dynamic size type on the stack directly, if you really need you can just reserve the maximum number of element on the stack but when you reach this maximum you can't grow anymore.
This is the same for all resizable containers (vec, list, map, set, hashmap, string, etc... )

AFAIK there are no DSTs in Rust, currently, that change size after initial construction—the size for slices and trait objects stays constant at run time, it's just not known at compile time
(if std::mem::size_of_val would return a different value every time, it would be really awkward for something like a Box to provide storage space for them)

Soon, there will be variable length stack arrays with notation similar to this"

#![feature(unsized_locals)]
let len = rand::<usize>();
let my_stack_array = &mut [usize; dyn len];

Interestingly this is already possible in C.

1 Like

Is that a question or an assertion?

That is still not dynamic-at-runtime, it has a fixed size after construction.

1 Like

I needed to double check my understanding

Alice - do you know the name of what that feature is in C so i could look it up?

In C it's called a variable-length array or a VLA for short.

(BTW I was just curious whether you are asking if this will be possible in Rust, or if you were suggesting that it is being worked on and will be added soon, because I don't know of such a proposal if there exists one. One reason I'd like to know is that I wouldn't be too keen about it, to be honest.)

1 Like

There is a RFC but it's far from finished.
About VLA, I tried to understand a bit more about how they are implemented currently in this thread, if you're curious.

1 Like

Am I the only one here who thinks of "VBA" when I see "VLA"?

Well how do you manage the resize if a push of an i32 for example is done on the stack after the push on the stack of the array but before the resize of the stack array?
A stack is just a pointer that walk through a contiguous memory like a linear allocator. So even with the VLA if you "reserve" space on the stack, you can "reserve" more after.
It make me think about C alloca but no "realloca" is possible.

1 Like

Ah, seems like I mixed up the concepts. I still need to finish my morning coffee. But yeah, initial alloc !=> DST

Let's get this straight.

Variable Length Arrays (VLA) in C live on the stack.

They are variable length in that they can be a different size every time they are declared. Their size is specified by a variable.

They are not dynamic arrays as in dynamically resizeable after they have been created.

Last I heard the Linux guys were busy removing use of VLA's from Linux for memory safety/security reasons and because it turns out they have poor performance.

VLA's are a misfeature introduced into C in C99 and subsequently discouraged in C11, no longer being a requirement for a compiler to implement them.

3 Likes

Exactly. And to add to that, they are easily and often abused.

In de facto standard bioinformatics software PhyloBayes, I had the "pleasure" of debugging a hard-to-track-down bug that resulted in the program crashing only after hours (!) of running on a large data set. Thus I had to wait hours between repros (oh yes, the joy of finding out that I forgot to run it from gdb…).

Turns out, the bug was that the author of PhyloBayes used several VLAs for reserving O(input data length) memory, which simply overflowed the stack for a nontrivially-sized input. I corrected this by moving the arrays to the heap, and lo and behold, the segfaults were gone.

Unfortunately, my PR remains unmerged and seemingly completely ignored after a year and a half, which, if anything, further demonstrates the danger of easy-to-abuse features.

4 Likes

According to nologik above Rust is about to get variable length arrays on the stack.

Why for God's sake?

Given their track record as being poor performing and dangerous, even C# requires an "unsafe" annotation to use them, why are they even considered for Rust?

Yeah, I can imagine VLA's are really bad in reality. And like @H2CO3 pointed out, it can very naively be improperly used and hard to debug if one is a beginner.

The place where I see it being really harmful is in terms of performance... If you need to store both a stack pointer and a (mutable!) usize to handle a VLA, then the moment that usize changes, it'l force a shift in the stack, am I correct? I don't know a whole lot about how memory maps works, and how they handle fragmentation of addresses, but it does seem like a nuisance to handle from the surface

1 Like

AFAIK, the main performance problem of VLAs is that they make stack layout unpredictable, which throws a big wrench into some kinds of compiler optimizations such as use of immediate stack pointer offsets in output assembly.

3 Likes