BufReader bug with Vec::with_capacity!

Hello,

I've had a strange experience: when I try to read part of a buffer using these two methods, one works and the other doesn't!

I have a main buffer which is a Cursor :
buf_reader

use tokio::io::{AsyncReadExt, AsyncWriteExt, BufReader, BufWriter, ReadHalf, WriteHalf};
use std::io::{Cursor, Error};

and this is the two ways to read from this buffer and put it in another temporary buffer :

Method 1 :

   ...
    println!("> payload_len = {}", payload_len);
   // ----------------
    let mut pl_buf : Vec<u8> = vec![0; payload_len as usize];
   // ----------------

    println!("> buffer capacity = {}",pl_buf.capacity());
    
    buf_reader.read_exact(&mut pl_buf).await?;

    println!("> buffer len = {}", pl_buf.len());
...
> payload_len = 8073
> buffer capacity = 8073
> buffer len = 8073

buffer len = 8073 : GOOD

Method 2 :

   ...
    println!("> payload_len = {}", payload_len);
   // ----------------
    let mut pl_buf: Vec<u8> = Vec::with_capacity(payload_len as usize); 
   // ----------------

    println!("> buffer capacity = {}",pl_buf.capacity());
   
    buf_reader.read_exact(&mut pl_buf).await?;

    println!("> buffer len = {}", pl_buf.len());
...
> payload_len = 8073
> buffer capacity = 8073
> buffer len = 0

buffer len = 0 !!!! NOT GOOD

But if I set the payload_len to a value < 8064, it works :thinking:

what's strange is that when I use this method in another project, it works correctly.

let mut pl_buf : Vec<u8> = Vec::with_capacity(payload_len as usize)

do you have an explanation for this behavior ?

why it works with :

    let mut pl_buf : Vec<u8> = vec![0; payload_len as usize];

and not with :

    let mut pl_buf : Vec<u8> = Vec::with_capacity(payload_len as usize); 

thank you in advance for your advice.

I think I've found the bug :

in tokio::io::util::async_read_ext::AsyncReadExt :
read_exact uses len() to determine the size to be read
read_buf use capacity() to determine the size to be read

so if my theory is correct :

read_exact does not work with a buffer created with

Vec::with_capacity

since len() = 0 because the vec has not been filled.

Yes. The read_exact method takes a &mut [u8], and when you create a slice from a vector, you are using its length.

3 Likes

It would be good to add a note to the documentation, as this is a very difficult error to resolve.

@alice Thank you for confirming my conclusion. :pray:

This doesn't make any sense. The Method 2 code can't change the length of the Vec.

The correct way to use with_capacity is with read_to_end.

buf_reader.take(payload_len).read_to_end(&mut pl_buf).await?;

No, this is literally obvious, because it can't work in any other way. You can't get a slice from a Vec that is longer than the Vec. The unused, uninitialized capacity can't be converted to reference-to-slice-of-elements, because it would be instant undefined behavior.

As this isn't specific to Read, this kind of information doesn't belong in Read's documentation.

It can't be obvious because this gets confused all, the, time. It's definitely less obvious than most things in Rust.

This is a fundamental misunderstanding with the language in general, then. (And I don't think there's anything we can do about it, this one is on people to understand what Vec is and how the capacity is different from the length.)

If we were to add this unrelated thing to Read's documentation, we would literally need to start adding complete tutorials to all types' and traits' documentation, because you can always find someone who discovers some misunderstanding while using one particular type in one particular use case.

It simply does not work like that.

for this limit ( 8064 ) I think it's because I created a buffer on a cursor.
let mut buf_writer = BufWriter::new(Cursor::new(&mut buffer));
by removing the cursor I no longer have this limit, I'll investigate further to find out the internal reason.

buf_reader.take(payload_len).read_to_end(&mut pl_buf).await?;

thank you for your suggestion, is this way better than :

buf_reader.read_buf(...) ?

what you say makes sense, thank you

but Vec::with_capacity normally allocates memory space, so logic dictates that read fills this space based on the space allocated and not the len() declared. :thinking:

ex :

Vec: | x | x | _ | _ |
       0   1   2   3

capacity = 4
len = 2

so read must be based more on capacity than len ( that's just my opinion )

No, logic doesn't "dictate" that.

Again, read() takes a slice. A slice can't be used to expand the underlying vector's length. Slices have a fixed size, even mutable slices, and they know nothing about the underlying collection that was used to create them. A slice that you get from a vector is no different from a slice that you get from a statically-sized array.

3 Likes

I see, I'll take a look at the source code of these methods to better understand how they work (especially read_buf).
Thanks again

Note that you don't, strictly speaking, need to look at the implementation of these methods. All of the issues explained above can be inferred purely from the signature (i.e., the "header"/declaration) of the function.

Although you are right that it definitely is a good idea to read the source code of the standard library, because it can be educational — both in terms of understanding mechanisms of certain library features, and because it can teach you idiomatic Rust.

1 Like

This only does one read, while read_exact and read_to_end will do as many reads as necessary to reach the end.

1 Like

I'm happy to accept a documentation bug or PR to Tokio to add a note about this to the documentation.

3 Likes

Quick question : take in AsyncReadExt , shouldn't it have "usize" as a parameter? to be compatible with 32- and 64-bit architectures?

 fn take(self, limit: u64) -> Take<Self>
        where
            Self: Sized,
        {
            take(self, limit)
        }

Many thanks for all your hard work in maintaining the "tokio" crate :pray:

Other way round - it should have u64 as a parameter to be compatible with both 32- and 64-bit architectures.

Even 32 bit machines can handle files greater than 4 GiB, and thus need take's argument to be 64-bit. You only want to switch to 32 bit integers if you're looking at memory - and even then, only because it's simpler to have a 32 bit usize than to have a u64 whose top 32 bits must be zero.

1 Like

We learn every day with the Rust community, thank you @farnz for these clarifications. :+1:

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.