Is set_len the right way to make an uninitialized Vec to use as a buffer?

Hi, I need a buffer whose length is not known at compile time.

I have been using vec![0u8; n], but that obviously initializes every byte, which seems pointless since I'm going to fill it anyway.

Is the following code considered a viable way to write this? Or will the optimizer remove the wasted initialization anyway?

```rust
let mut buf = Vec::with_capacity(n);
unsafe { buf.set_len(n) }
reader.read_exact(&mut buf)?;
```

This sounds a lot like the use case behind the unstable Read::initializer(). You may be able to find a stable workaround by reading through the related issues; I think the code shown in this comment is headed in the right direction.

That said, I feel like your use of set_len() here is unsound. When calling read_exact() you're constructing a &mut [u8] to uninitialized data, which is a bit sketchy: Read is a safe trait, so an implementation is free to read from the buffer it is given before writing to it. The blog post "What The Hardware Does" is not What Your Program Does: Uninitialized Memory might be able to explain this better than I can.
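To make that concern concrete, here is a small sketch. PeekingReader is a hypothetical type invented for illustration: it is entirely safe code, yet it reads every byte of the buffer before overwriting it. With a zeroed buffer that is harmless; handed the set_len buffer from the question, it would be reading uninitialized memory.

```rust
use std::io::{self, Read};

/// Hypothetical reader used only for illustration: entirely safe code,
/// yet it reads each byte of `buf` before overwriting it.
struct PeekingReader {
    byte: u8,
}

impl Read for PeekingReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        for b in buf.iter_mut() {
            // Reads the old value of `*b`: harmless for an initialized buffer,
            // undefined behaviour if `buf` pointed at uninitialized memory.
            *b = b.wrapping_add(self.byte);
        }
        Ok(buf.len())
    }
}

fn main() -> io::Result<()> {
    let n = 4; // stands in for a length only known at runtime
    let mut buf = vec![0u8; n]; // initialized, so any `Read` impl is fine
    let mut reader = PeekingReader { byte: 7 };
    reader.read_exact(&mut buf)?;
    println!("{:?}", buf); // prints [7, 7, 7, 7]
    Ok(())
}
```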


You must initialize the data before calling set_len.

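vec![0; n] is specialized to request zeroed memory from the allocator (calloc with the default system allocator), which often eliminates the expense of initialization, since large allocations can be served from pages the OS has already zeroed. I recommend using this by default, and switching to unsafe code only if profiling justifies it.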
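A minimal sketch of that default, assuming reader is any std::io::Read and n is only known at runtime; read_exact_vec and read_up_to are hypothetical names. The second function is an alternative not mentioned in the thread: it passes a Vec to Read::read_to_end through Read::take, so the standard library grows and fills the Vec and the caller needs no unsafe code. The trade-off is that a short input is treated as success rather than as an error.

```rust
use std::io::{self, Read};

/// Safe default: `vec![0u8; n]` asks the allocator for zeroed memory
/// (typically `calloc` with the default allocator), then `read_exact` fills it.
fn read_exact_vec<R: Read>(reader: &mut R, n: usize) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; n];
    reader.read_exact(&mut buf)?; // fails with UnexpectedEof if input is short
    Ok(buf)
}

/// Alternative: let the standard library grow and fill the Vec.
/// A short input is *not* an error here; the caller must check `len()`.
fn read_up_to<R: Read>(reader: &mut R, n: usize) -> io::Result<Vec<u8>> {
    let mut buf = Vec::with_capacity(n);
    reader.take(n as u64).read_to_end(&mut buf)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    let data: &[u8] = b"hello world";

    let mut r = data; // `&[u8]` implements `Read`
    println!("{:?}", read_exact_vec(&mut r, 5)?);

    let mut r = data;
    println!("{}", read_up_to(&mut r, 64)?.len()); // prints 11
    Ok(())
}
```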


I agree with @mbrubeck, but to answer your question:

The best way to handle this is not to call set_len until the data is initialized, and to use as_mut_ptr to access the potentially-uninitialized parts.

A good example of this is in the standard library implementation of mergesort.
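A much smaller sketch of the same pattern, assuming the elements are plain u64 values computed from their index (squares is a hypothetical name). Every write goes through the raw pointer returned by as_mut_ptr, and set_len is called only once all n elements have been written.

```rust
/// Builds `[0, 1, 4, 9, ...]` without zero-filling the buffer first.
fn squares(n: usize) -> Vec<u64> {
    let mut v: Vec<u64> = Vec::with_capacity(n);
    let ptr = v.as_mut_ptr(); // raw pointer to the whole allocation
    for i in 0..n {
        let x = i as u64;
        // SAFETY: `i < n <= v.capacity()`, so the slot is inside the
        // allocation; `write` neither reads nor drops the old contents.
        unsafe { ptr.add(i).write(x * x) };
    }
    // SAFETY: every element in `0..n` was initialized by the loop above.
    unsafe { v.set_len(n) };
    v
}

fn main() {
    println!("{:?}", squares(5)); // prints [0, 1, 4, 9, 16]
}
```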

The advantages of doing it this way are that (1) it's well established that taking a raw pointer to uninitialized memory is fine, whereas taking a &mut T to an uninitialized T might not be; and (2) if something panics, you don't accidentally drop uninitialized memory.
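Point (2) can be illustrated with a hypothetical fill_with helper whose element-producing closure may panic. Bumping the length right after each write means that if f panics, the Vec only claims elements that really exist, so unwinding drops exactly those and never touches the uninitialized slots.

```rust
use std::panic;

/// Hypothetical helper: builds a Vec from `f(0), f(1), ..., f(n - 1)`.
/// `set_len` is bumped right after each write, so if `f` panics the Vec
/// only ever claims elements that are actually initialized.
fn fill_with<T, F: FnMut(usize) -> T>(n: usize, mut f: F) -> Vec<T> {
    let mut v: Vec<T> = Vec::with_capacity(n);
    for i in 0..n {
        let value = f(i); // may panic; `v` then drops only elements 0..i
        // SAFETY: `v.len() == i < n <= v.capacity()`, and slot `i` is
        // uninitialized, so writing there overwrites nothing.
        unsafe {
            v.as_mut_ptr().add(i).write(value);
            v.set_len(i + 1); // element `i` is now initialized
        }
    }
    v
}

fn main() {
    let words = fill_with(3, |i| format!("item{}", i));
    println!("{:?}", words);

    // A panic half-way through unwinds cleanly: the two Strings already
    // written are dropped, and the uninitialized slots are never dropped.
    let result = panic::catch_unwind(|| {
        fill_with(4, |i| {
            if i == 2 {
                panic!("producer failed");
            }
            i.to_string()
        })
    });
    assert!(result.is_err());
}
```

In this particular case a plain v.push(f(i)) is equally panic-safe and needs no unsafe at all; the sketch only makes the write-then-set_len ordering explicit.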

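Calling .set_len() makes Rust assume that every element in the Vec is initialized, so you break that contract as soon as you call it, before the read has filled anything in. For element types with a destructor, overwriting an element with a plain assignment (or dropping the Vec, e.g. while a panic unwinds) runs drop on an uninitialized value, which is undefined behaviour. A u8 has no destructor, so nothing is actually dropped there, but I'm 90% sure that the Vec claiming to hold initialized u8 values that are really uninitialized is still undefined behaviour.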


An inconvenient follow-up question: how does as_mut_ptr() ensure that the resulting pointer can be used for accessing every element?

The context is: there has been some recent discussion about pointer provenance tracking in the compiler/Miri, and how it would require a correct implementation of methods like as(_mut)_ptr() to first slice the collection (to its full range), then cast the resulting fat reference-to-slice to a raw pointer. However, that would be illegal if the data were uninitialized, meaning that there would be no way to write to an uninitialized buffer without invoking UB if provenance tracking were implemented in the compiler?


Presumably the vector internally holds the raw pointer it originally got from the allocator, so if as_mut_ptr() just returns that pointer, it should have sufficient provenance to access the whole buffer. As I understand it, the issue arises when using &vec[0] as *const u8 instead of as_mut_ptr(), because going through a reference to a single element loses the provenance of the entire slice.
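As a sketch of the as_mut_ptr() side of this, here is a hypothetical append_bytes that copies a slice into a Vec's spare capacity with a single ptr::copy_nonoverlapping. The destination pointer is taken from as_mut_ptr() and offset with add, rather than derived from a reference to one element. It is roughly what the safe Vec::extend_from_slice already does for Copy types, so in real code that method is the better choice.

```rust
use std::ptr;

/// Hypothetical helper: appends `src` to `v` with a single memcpy into the
/// spare capacity. The destination comes from `as_mut_ptr()`, i.e. from the
/// Vec's own pointer to its allocation, not from a reference to an element
/// (with `v` empty, `&mut v[0]` would simply panic).
fn append_bytes(v: &mut Vec<u8>, src: &[u8]) {
    v.reserve(src.len());
    let len = v.len();
    unsafe {
        // SAFETY: after `reserve`, `capacity >= len + src.len()`, so the
        // destination range lies inside the allocation. `src` is a shared
        // borrow while `v` is borrowed mutably, so the ranges cannot overlap.
        ptr::copy_nonoverlapping(src.as_ptr(), v.as_mut_ptr().add(len), src.len());
        // SAFETY: bytes `len..len + src.len()` were just initialized.
        v.set_len(len + src.len());
    }
}

fn main() {
    let mut v = Vec::new();
    append_bytes(&mut v, b"abc");
    append_bytes(&mut v, b"de");
    println!("{:?}", String::from_utf8(v).unwrap()); // prints "abcde"
}
```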


One possible approach would be something like this:

```rust
use std::mem;

pub fn as_buffer<T>(v: &mut Vec<T>) -> &mut [mem::MaybeUninit<T>]
{
    // SAFETY: The Vec guarantees that the memory is allocated, and we're
    // providing MaybeUninit so we don't need to care whether it's initialized.
    // Caveat: if v.len() > 0, safe code could overwrite existing elements
    // with MaybeUninit::uninit(), so only call this on an empty Vec.
    unsafe {
        std::slice::from_raw_parts_mut(v.as_mut_ptr() as _, v.capacity())
    }
}
```

So then you fill that in, and only call set_len once you've written to a prefix of it (setting the length to the size of that initialized prefix).
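Since Rust 1.60 the standard library provides this directly as Vec::spare_capacity_mut, which returns &mut [MaybeUninit<T>] covering only len..capacity. Because the already-initialized elements are excluded, it avoids the caveat of handing out the whole capacity. A sketch of the fill-then-set_len pattern with it, where fill_spare_from is a hypothetical helper that copies as many bytes from src as fit:

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: copies as many bytes of `src` as fit into the spare
/// capacity of `v`, then extends `v.len()` over exactly those bytes.
fn fill_spare_from(v: &mut Vec<u8>, src: &[u8]) -> usize {
    // `len..capacity`, typed as MaybeUninit so no initialization is assumed.
    let spare: &mut [MaybeUninit<u8>] = v.spare_capacity_mut();
    let n = spare.len().min(src.len());
    for (slot, &byte) in spare[..n].iter_mut().zip(src) {
        slot.write(byte);
    }
    // SAFETY: the first `n` spare slots were initialized in the loop above.
    unsafe { v.set_len(v.len() + n) };
    n
}

fn main() {
    let mut v: Vec<u8> = Vec::with_capacity(8);
    v.push(1);
    let written = fill_spare_from(&mut v, &[2, 3, 4]);
    // prints: wrote 3 bytes: [1, 2, 3, 4]
    println!("wrote {} bytes: {:?}", written, v);
}
```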

