Load to string with known length without reallocating in safe code


#1

I ran into a case where, in safe code only, I could not load a string from a Read into an already-allocated String without allocating an intermediate buffer. I want to make sure I haven’t missed something, because I see no reason for this functionality not to exist.

Specifically, I am curious whether something like read_to_string2 below can currently be implemented. It needs something like String#into_vec(), which would repurpose the string’s buffer as a Vec. Does such functionality exist somewhere that eludes me?

fn read_to_string1(reader: &mut io::Read, s: &mut String, len: usize) -> Result<(), Error> {
    let mut bytes = vec![0; len];  // 1 guaranteed allocation
    try!(reader.read_exact(&mut bytes[..]));
    *s = try!(String::from_utf8(bytes));
    Ok(())
}

fn read_to_string2(reader: &mut io::Read, s: &mut String, len: usize) -> Result<(), Error> {
    let mut bytes = unsafe{s.as_mut_vec()}.clone(); // just to get it to compile
    //let mut bytes = s.into_vec();  // QUESTION: does this functionality exist anywhere?
    bytes.resize(len, 0);
    try!(reader.read_exact(&mut bytes[..]));
    *s = try!(String::from_utf8(bytes));
    Ok(())
}

(In this context, avoiding a “moving out of borrowed context” error, with s: &mut String, would either require String#into_vec() to not consume the string or the method above to swap the string out with a local temporary.)

This is not performance that I strictly need right now. Just a curiosity.


#2

This works:

#![feature(read_exact)] // only needed before Rust 1.6, which stabilized read_exact

use std::string;
use std::io;

#[derive(Debug)]
struct Error;

impl From<string::FromUtf8Error> for Error {
    fn from(_e: string::FromUtf8Error) -> Error { Error}
}

impl From<io::Error> for Error {
    fn from(_e: io::Error) -> Error { Error }
}

fn read_to_string2(reader: &mut io::Read, s: &mut String, len: usize) -> Result<(), Error> {
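    // Swap the string out for an empty one (String::new() does not allocate)
    // and take over its buffer.
    let mut bytes = std::mem::replace(s, String::new()).into_bytes();
    // Always resize: this also truncates a longer buffer, so that read_exact
    // reads exactly len bytes.
    bytes.resize(len, 0);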
    try!(reader.read_exact(&mut bytes[..]));
    *s = try!(String::from_utf8(bytes));
    Ok(())
}

fn main() {
    let mut buf = io::Cursor::new(vec![104u8, 101, 108, 108, 111]); // "hello"
    let mut s = String::new();
    println!("2: {:?}", read_to_string2(&mut buf, &mut s, 5));
    println!("String: {}", s);
}

Also, String#into_vec() isn’t Rust notation, and there is no into_vec on String; the method you want is String::into_bytes, which the code above uses.
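
To make the buffer reuse concrete, here is a minimal sketch for current stable Rust. The helper name round_trip is hypothetical and exists only for illustration; it records the buffer address before and after the String to Vec<u8> conversion.

```rust
// Hypothetical helper, not part of std: round-trips a String through Vec<u8>
// and reports the buffer address before and after the conversion.
fn round_trip(s: String) -> (String, *const u8, *const u8) {
    let before = s.as_ptr();
    // into_bytes takes ownership of the string's buffer; nothing is copied.
    let bytes: Vec<u8> = s.into_bytes();
    let after = bytes.as_ptr();
    // from_utf8 validates the bytes and, on success, wraps the same Vec.
    let s = String::from_utf8(bytes).expect("bytes came from a valid String");
    (s, before, after)
}
```

Calling round_trip on a String built with String::with_capacity(16) should hand back a string with the same contents, capacity, and buffer address, which is the property read_to_string2 relies on.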


#3

Thanks! That was the function. And I just noticed Into<Vec<u8>> as well – I could have sworn I looked for it, but I must have been looking somewhere else.


#4

The regular read_to_string should work if you do something like this:

(&mut file).take(n).read_to_string(&mut s)
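
A minimal sketch of why borrowing first matters, written for current stable Rust (so ? instead of try!). The helper split_read is hypothetical: it reads at most n bytes through take, then keeps reading from the same reader.

```rust
use std::io::{self, Read};

// Hypothetical helper: reads at most `n` bytes of UTF-8 into `head`,
// then everything that is left into `tail`.
fn split_read<R: Read>(mut reader: R, n: u64) -> io::Result<(String, String)> {
    let mut head = String::new();
    // `take` consumes only the `&mut R`, so `reader` itself stays usable.
    (&mut reader).take(n).read_to_string(&mut head)?;
    let mut tail = String::new();
    reader.read_to_string(&mut tail)?;
    Ok((head, tail))
}
```

With an io::Cursor over b"hello world" and n = 5, head should be "hello" and tail " world". If the n-byte boundary falls inside a multi-byte character, read_to_string returns an InvalidData error instead.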


#5

I had actually looked into that but discarded the option, because take consumes its argument and I needed to be able to read more data afterwards. But I see you consumed &'a mut R instead, which also has an impl for Read. Thanks, this is awesome and much cleaner!

That’s one of the key things I really need to get used to in the Rust docs: I often overlook the impls on references.

A couple of things to note about using take, though:

  • take takes a u64 instead of a usize, which is a little strange.
  • read_to_string appends to the string, so the string has to be cleared first.

So, within read_to_string2, it needs:

s.clear();
s.reserve_exact(len);
try!(reader.take(len as u64).read_to_string(s));

(reader and s are already mutable references).
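
For reference, a self-contained sketch of the whole function on current stable Rust. The name read_string_exact is hypothetical, and it returns io::Error rather than the custom Error type from earlier in the thread. It also adds one check the snippet above does not have: unlike read_exact, take followed by read_to_string does not fail when the reader runs out early, so the byte count is compared against len.

```rust
use std::io::{self, Read};

// Hypothetical name; mirrors read_to_string2 from the thread but uses
// io::Error only and `?` instead of try!.
fn read_string_exact<R: Read>(reader: &mut R, s: &mut String, len: usize) -> io::Result<()> {
    s.clear(); // read_to_string appends, so start from an empty string
    s.reserve_exact(len); // make sure the existing allocation can hold `len` bytes
    let n = reader.by_ref().take(len as u64).read_to_string(s)?;
    if n < len {
        // Hitting EOF early is not an error for take + read_to_string,
        // so it is reported explicitly here.
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "fewer bytes than requested",
        ));
    }
    Ok(())
}
```

Whether a short read should be an error is a design choice; dropping the check gives the "read up to len bytes" behaviour instead.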


#6

I/O uses u64 for sizes so that it’s not limited to the platform’s usize. That way you can handle large files on 32-bit platforms, for example. (Of course not by slurping them into a string in one go, but you can read and seek parts.)

Don’t forget to reserve space in the String, so that it doesn’t need to reallocate during growth.
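
A small sketch of the "read and seek parts" idea, assuming a reader that also implements Seek. The helper read_chunk is hypothetical; the offset is a u64 because it addresses the file, while len stays a usize because it sizes an in-memory buffer.

```rust
use std::io::{self, Read, Seek, SeekFrom};

// Hypothetical helper: reads up to `len` bytes of UTF-8 starting at byte `offset`.
fn read_chunk<R: Read + Seek>(reader: &mut R, offset: u64, len: usize) -> io::Result<String> {
    // File positions are u64, independent of the platform's pointer width.
    reader.seek(SeekFrom::Start(offset))?;
    // Reserve up front so the string does not have to grow while reading.
    let mut s = String::with_capacity(len);
    reader.by_ref().take(len as u64).read_to_string(&mut s)?;
    Ok(s)
}
```

With an io::Cursor over b"hello world", read_chunk(&mut cur, 6, 5) should yield "world", and a chunk that runs past the end simply comes back shorter.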


#7

Thanks, updated the code above to reserve space.


#8

The first solution may be better if you must not reallocate or overallocate. read_to_string will only use the string’s own allocation, but it could possibly grow it more than what’s needed.

Edit: Ok, I looked up the implementation again — it will not reallocate / grow the string until the capacity left is 0, so it’s no worry.
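
One way to check this on your own toolchain is to compare the capacity before and after the read. This is only a sketch with a hypothetical helper name, capacity_around_read; whether the two numbers are equal depends on the read_to_string implementation in the std version you build against, so the code reports them rather than assuming a result.

```rust
use std::io::{self, Read};

// Hypothetical helper: returns the string's capacity before and after
// reading up to `len` bytes into it.
fn capacity_around_read<R: Read>(
    reader: &mut R,
    s: &mut String,
    len: usize,
) -> io::Result<(usize, usize)> {
    s.clear();
    s.reserve_exact(len);
    let before = s.capacity();
    reader.by_ref().take(len as u64).read_to_string(s)?;
    Ok((before, s.capacity()))
}
```

If the second number is larger than the first, the string was grown during the read and the into_bytes approach from earlier in the thread is the safer choice.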