Why do we need to initialize BorrowedBuf before reading data?

When I read the default_read_to_end source code:

pub(crate) fn default_read_to_end<R: Read + ?Sized>(r: &mut R, buf: &mut Vec<u8>) -> Result<usize> {
    let start_len = buf.len();
    let start_cap = buf.capacity();

    let mut initialized = 0; // Extra initialized bytes from previous loop iteration
    loop {
        if buf.len() == buf.capacity() {
            buf.reserve(32); // buf is full, need more space
        }

        let mut read_buf: BorrowedBuf<'_> = buf.spare_capacity_mut().into();

        // SAFETY: These bytes were initialized but not filled in the previous loop
        unsafe {
            read_buf.set_init(initialized);
        }

        let mut cursor = read_buf.unfilled();
        match r.read_buf(cursor.reborrow()) {
            Ok(_) => (),
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }

        if cursor.written() == 0 {
            return Ok(buf.len() - start_len);
        }

        // store how much was initialized but not filled
        initialized = cursor.init_ref().len();

        // SAFETY: BorrowedBuf's invariants mean this much memory is initialized.
        unsafe {
            let new_len = read_buf.filled().len() + buf.len();
            buf.set_len(new_len);
        }

        if buf.len() == buf.capacity() && buf.capacity() == start_cap {
            // The buffer might be an exact fit. Let's read into a probe buffer
            // and see if it returns `Ok(0)`. If so, we've avoided an
            // unnecessary doubling of the capacity. But if not, append the
            // probe buffer to the primary buffer and let its capacity grow.
            let mut probe = [0u8; 32];

            loop {
                match r.read(&mut probe) {
                    Ok(0) => return Ok(buf.len() - start_len),
                    Ok(n) => {
                        buf.extend_from_slice(&probe[..n]);
                        break;
                    }
                    Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
                    Err(e) => return Err(e),
                }
            }
        }
    }
}

I didn't understand why these lines are needed:

let mut initialized = 0;
unsafe {
    read_buf.set_init(initialized);
}
initialized = cursor.init_ref().len();

And what does it mean to initialize the buffer?
I also noticed that after reading, the slice returned by init_ref() is reset.

BorrowedBuf::set_init is like Vec::set_len: it's unsafely updating a bit of metadata about the buffer, not changing anything about it.
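To make the Vec::set_len comparison concrete, here is a minimal sketch on stable Rust. The helper name append_via_spare_capacity is made up for illustration; it is not part of std. The bytes are first written into the spare capacity, and only then is the length (pure metadata) updated.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: appends `bytes` by writing into the spare capacity
/// and then bumping the length.
fn append_via_spare_capacity(buf: &mut Vec<u8>, bytes: &[u8]) {
    buf.reserve(bytes.len());
    let old_len = buf.len();
    let spare: &mut [MaybeUninit<u8>] = buf.spare_capacity_mut();

    // Step 1: actually initialize the memory.
    for (slot, &b) in spare.iter_mut().zip(bytes) {
        slot.write(b);
    }

    // Step 2: update the metadata only. `set_len` writes no bytes at all;
    // it just records that the bytes written above now count as contents.
    // SAFETY: `reserve` guaranteed room for `bytes.len()` more elements,
    // and exactly that many spare slots were initialized in step 1.
    unsafe { buf.set_len(old_len + bytes.len()) };
}

fn main() {
    let mut buf = Vec::with_capacity(8);
    append_via_spare_capacity(&mut buf, b"abc");
    assert_eq!(buf, b"abc");
}
```

Calling set_len without step 1 would compile just the same, which is exactly why it is unsafe: the caller is the one vouching that the memory is initialized.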

BorrowedBuf is a view into the actual buf: &mut Vec<u8>, and that view gets recreated on each iteration of the loop, so we keep track of the extra initialized bytes that sit after the actual length of the vector. This allows skipping repeated reinitialization of the slack space in the vector. The bytes need to be initialized (automatically, as needed, by the BorrowedBuf) because giving out a &mut [u8] that points to uninitialized memory is unsound at best (and potentially immediate Undefined Behavior).
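BorrowedBuf itself is nightly-only, so the following is a rough model of the same bookkeeping on stable Rust, not the std implementation. The function read_to_end_tracking and its zeroed counter are invented for this sketch: it carries an initialized count across iterations and zeroes only the spare bytes that are not already known to be initialized.

```rust
use std::io::{self, Read};
use std::mem::MaybeUninit;
use std::slice;

/// Hypothetical helper: reads `r` to the end and returns
/// (bytes appended to `buf`, bytes this function had to zero).
fn read_to_end_tracking<R: Read>(r: &mut R, buf: &mut Vec<u8>) -> io::Result<(usize, usize)> {
    let start_len = buf.len();
    let mut initialized = 0; // spare bytes past buf.len() that are already initialized
    let mut zeroed = 0;
    loop {
        if buf.len() == buf.capacity() {
            // The spare capacity was fully consumed, so `initialized` is 0 here.
            buf.reserve(32);
        }
        let spare: &mut [MaybeUninit<u8>] = buf.spare_capacity_mut();

        // Zero only the part that is not yet known to be initialized.
        for slot in &mut spare[initialized..] {
            slot.write(0);
        }
        zeroed += spare.len() - initialized;
        initialized = spare.len();

        // SAFETY: every byte of `spare` was written, either just now or in
        // an earlier iteration (tracked by `initialized`).
        let dst: &mut [u8] =
            unsafe { slice::from_raw_parts_mut(spare.as_mut_ptr().cast::<u8>(), spare.len()) };

        let n = match r.read(dst) {
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        assert!(n <= initialized, "reader reported more bytes than the buffer holds");
        if n == 0 {
            return Ok((buf.len() - start_len, zeroed));
        }

        // The filled bytes become part of the Vec; the rest stay initialized.
        initialized -= n;
        let new_len = buf.len() + n;
        // SAFETY: the first `n` spare bytes are initialized and within capacity.
        unsafe { buf.set_len(new_len) };
    }
}

fn main() -> io::Result<()> {
    let data = vec![7u8; 100];
    let mut src: &[u8] = &data;
    let mut out = Vec::with_capacity(16);
    let (n, zeroed) = read_to_end_tracking(&mut src, &mut out)?;
    println!("read {n} bytes, zeroed {zeroed} bytes");
    Ok(())
}
```

Without the initialized counter, the loop would have to zero the whole spare capacity on every iteration, even when a reader only delivers a few bytes at a time.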

You can read more about how the type works in its docs:

Reading a value that is not initialized is UB. Rust tries to do something sensible; for example, it may crash your program if you do that:

use std::mem::MaybeUninit;

fn to_be_or_not_to_be() -> bool {
    let answer: MaybeUninit<i32> = MaybeUninit::uninit();
    // UB: `answer` was never written, so this reads uninitialized memory.
    let be: i32 = unsafe { answer.assume_init() };
    be == 0 || be != 0
}

pub fn main() {
    println!("To be or not to be:");
    println!("{}", to_be_or_not_to_be()); // <-- crash here
    println!("Got it?");
}

I guess it's better than what C++ usually does:

#include <cstdio>

bool to_be_or_not_to_be() {
    int be; // never initialized
    return be == 0 || be != 0;
}

int main() {
    printf("To be or not to be:\n");
    printf("%d\n", to_be_or_not_to_be()); // to be or not to be is… false?
    printf("Got it?\n");
}

But that's not guaranteed! Rust can do what C++ did, too; it's just harder to arrange.

And the initialized variable keeps track of how many bytes in that buffer are already initialized.
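For contrast with the uninitialized read earlier in this reply, here is a sketch of the sound way to use MaybeUninit: write a value first, then call assume_init. The value 42 is arbitrary.

```rust
use std::mem::MaybeUninit;

fn to_be_or_not_to_be() -> bool {
    let mut answer: MaybeUninit<i32> = MaybeUninit::uninit();
    // Initialize the slot before reading it.
    answer.write(42);
    // SAFETY: `answer` was written on the line above.
    let be: i32 = unsafe { answer.assume_init() };
    be == 0 || be != 0
}

fn main() {
    println!("{}", to_be_or_not_to_be());
}
```

Once the value is initialized, the expression is an ordinary tautology and the compiler has no licence to do anything surprising with it.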

And why should we track the initialized data? Is there any actual benefit from it?

Yes, the section at https://github.com/rust-lang/rfcs/blob/master/text/2930-read-buf.md#why-not-just-initialize describes some: better benchmarks, and

The ReadBuf type manages a progressively initialized buffer of bytes. It is primarily used to avoid buffer initialization overhead when working with types implementing the Read trait. It wraps a buffer of possibly-uninitialized bytes and tracks how much of the buffer has been initialized and how much of the buffer has been filled. Tracking the set of initialized bytes allows initialization costs to only be paid once, even if the buffer is used repeatedly in a loop.

Thank you! Does this mean that it only affects optimization?

No. Safety also matters. A catastrophic example of just assuming the buffer is fully initialised instead of using the strategy of tracking is shown in that RFC too.

Uninitialized memory is a dangerous (and often misunderstood) beast. Uninitialized memory does not have an arbitrary value; it actually has an undefined value. Undefined values can very quickly turn into undefined behavior.
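As a small illustration of the safety side, consider a reader that misbehaves. LyingReader and read_with_zeroed_buffer below are hypothetical names for this sketch: the reader claims to have filled the buffer without writing a single byte. Because the caller handed it a zero-initialized buffer, the worst outcome is a run of zeros; had the buffer been uninitialized and simply assumed to be filled, the caller would go on to read uninitialized memory.

```rust
use std::io::{self, Read};

/// Hypothetical reader that claims to fill the buffer but never writes to it.
struct LyingReader;

impl Read for LyingReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Reports success for the whole buffer without touching it.
        Ok(buf.len())
    }
}

/// Reads once into a buffer that is initialized up front.
fn read_with_zeroed_buffer<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; 8]; // every byte is initialized before the call
    let n = r.read(&mut buf)?;
    buf.truncate(n); // a no-op if `n` is not smaller than the length
    Ok(buf)
}

fn main() -> io::Result<()> {
    let bytes = read_with_zeroed_buffer(&mut LyingReader)?;
    // The data is wrong, but it is well-defined: all zeros.
    assert_eq!(bytes, vec![0u8; 8]);
    Ok(())
}
```

Tracking which bytes are initialized lets the caller keep this guarantee without paying for the zeroing on every call.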

I read the article, but I don't fully understand it. This is how I try to use it:

let mut data = Vec::with_capacity(4000);
let mut read_buf: BorrowedBuf<'_> = data.spare_capacity_mut().into();
unsafe {
    read_buf.set_init(3);
}
let mut read_buf = read_buf.unfilled();
dbg!(read_buf.capacity(), read_buf.init_ref());
pt.read_buf(read_buf.reborrow())?;
unsafe {
    dbg!(read_buf.init_ref());
    data.set_len(4);
}

My code works the same way with set_init and without it. What exactly does set_init affect in my code?

TL;DR: unsafe is not called unsafe for nothing. It's a weird and dangerous world in there, under that moniker!

Your code uses unsafe. If you use unsafe and insist on the “but it works!” argument, then you will be, rightfully and correctly, kicked out of the community. Please stop.

It ensures that your code will continue to work.

In safe Rust, the compiler tries very hard to ensure that if your code works today, it will still work tomorrow.

In the unsafe world there are no such guarantees. The code I showed above worked for seven years until Rust 1.65 finally broke it.

But it was always incorrect. You just cannot use the “but it works!” argument with unsafe Rust.

The best litmus test today is Miri. Try running your code under Miri and you will most likely see the difference. It detects access to uninitialized memory decently well.

But even Miri is no guarantee that your code is free of problems!

Thank you for the detailed explanation!
