Why do functions like read_line need to use a mutable output variable?


#1

Why do functions like read_line() need to use a mutable output variable? Why can’t it just return an immutable string? In other words why isn’t the signature something like…

fn read_line(&self) -> Result<String>

Can’t the user query the returned string for its length?

IMO this can be used to improve the ergonomics somewhat…

Here is the guessing game example from The Book.

fn main() {
    println!("Guess the number!");
    println!("Please input your guess.");

    //Pseudo-rust. I am very new to this language.
    let guess = io::stdin().read_line();
    
    match guess {
        Ok(str) => println!("You guessed: {}", str),
        Err(e) => fail(e)
    }
}

#2

It allows one to reuse a String allocation. If the return type was Result<String>, then read_line would be forced to allocate a fresh String with the contents of the line on every call. If it is given an existing String, then it will actually allocate very rarely (assuming a reasonable distribution of line lengths).
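To make the reuse concrete, here is a minimal sketch, assuming a hypothetical helper named total_line_bytes (not part of std). It reads every line through one String and clears it between calls, which keeps the capacity already allocated:

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: sums the byte length of every line in `reader`,
/// reusing a single `String` buffer for all reads.
fn total_line_bytes<R: BufRead>(mut reader: R) -> io::Result<usize> {
    // Allocated lazily on the first read, then grown only when a line
    // is longer than anything seen so far.
    let mut line = String::new();
    let mut total = 0;
    loop {
        // `clear` resets the length but keeps the capacity.
        line.clear();
        let n = reader.read_line(&mut line)?;
        if n == 0 {
            break; // end of input
        }
        total += n;
    }
    Ok(total)
}
```

Calling it with io::stdin().lock() or any other BufRead works the same way. With a Result<String> return type, each iteration would have had to produce a fresh String instead.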


#3

I understand but what if the user wants to preserve the contents of the previously read strings? Returning a string from read_line() makes sense in this case.

Anyway I think this is just an optimization problem. Given a “sufficiently smart compiler” you can still get away with a single allocation when the user does not need to preserve the previous contents.

Look at C++'s Return Value Optimisation for an example of how the string can be copied into guess at zero cost and look at persistent data structures (PDS) for maintaining multiple immutable string allocations efficiently. In an ideal world using PDS you will only have as many strings allocated as you really need.


#4

Then the caller must clone it. The point is that the caller gets to decide the allocation strategy, not the callee.
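As a rough sketch of what "the caller decides" can look like, assume a hypothetical helper keep_matching (the name and the needle parameter are made up for illustration). It reads through one reused buffer and only allocates a new String for the lines it actually wants to keep:

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: returns the lines of `reader` that contain `needle`.
/// Only the lines that are kept cost an extra allocation.
fn keep_matching<R: BufRead>(mut reader: R, needle: &str) -> io::Result<Vec<String>> {
    let mut buf = String::new(); // scratch buffer, reused for every read
    let mut kept = Vec::new();
    loop {
        buf.clear();
        if reader.read_line(&mut buf)? == 0 {
            break; // end of input
        }
        if buf.contains(needle) {
            // The caller chooses to pay for a copy here, and only here.
            kept.push(buf.trim_end().to_string());
        }
    }
    Ok(kept)
}
```

If every line must be preserved, the same loop simply copies on every iteration, which costs about the same as an API that returns a new String each time.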

Rust has RVO. I don’t think that is relevant here.


#5

IMO deciding allocation strategies is a burden on the user (caller). The tool (Rust/stdlib) should use the best allocation strategy based on the code the user has written. Isn’t this the point of going high level?

Good to know! I think it is kind of relevant if you want to reuse the same allocation. The compiler should use RVO when it can guarantee that destroying the previous string is safe else it should use a PDS based strategy.


#6

Rust is about allowing the caller to decide allocation strategies. Also “the best” is not always the same in all contexts.


#7

One of the core tenets of Rust’s design is making costs explicit and controllable. The stdlib still needs work on the latter in some areas but overall it’s held quite well.

Something @BurntSushi may have forgotten to mention is that read_line() actually used to return io::Result<String>. The current version of the API was decided to be superior for a couple of reasons. I can try to find the RFC for it if my explanation doesn’t convince you.

As previously stated, it allows for the reuse of allocations. While a good system allocator should cache previously freed allocations where possible, Rust allows the expression of this caching in userspace, which helps on platforms that might not have sophisticated allocators.

However, something no one else has said yet, is that taking a reference to an output buffer facilitates much more sane handling of incomplete reads. In the previous design, if an I/O error occurred during reading, all data read up to that point was discarded because io::Result cannot express an intermediate result. It was suggested that a variant be added that would allow explicit expression of an incomplete result, but it would have been much more of a burden on the user to require them to explicitly handle or ignore an edge case like that.
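A small sketch of the incomplete-read case, assuming a made-up reader called FlakyReader that hands out a few bytes and then fails (both FlakyReader and read_partial are hypothetical names for illustration). As far as I understand the documented behaviour of read_line, bytes read before an I/O error stay in the buffer as long as they are valid UTF-8:

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Hypothetical reader: yields b"partial" once, then fails on every later read.
struct FlakyReader {
    served: bool,
}

impl Read for FlakyReader {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        if self.served {
            return Err(io::Error::new(io::ErrorKind::Other, "simulated failure"));
        }
        self.served = true;
        let data = b"partial";
        let n = data.len().min(out.len());
        out[..n].copy_from_slice(&data[..n]);
        Ok(n)
    }
}

/// Returns whatever ended up in the buffer, plus whether the read failed.
fn read_partial() -> (String, bool) {
    let mut reader = BufReader::new(FlakyReader { served: false });
    let mut buf = String::new();
    // No newline ever arrives, so read_line asks for more data and hits the error.
    let result = reader.read_line(&mut buf);
    (buf, result.is_err())
}
```

The caller sees the error and still has the bytes that arrived before it, without unpacking anything from the Err value.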

Also, I don’t think RVO has much of anything to do with reusing heap allocations. IIRC, it changes the return of large structs under the hood so they are written to an out-pointer instead of being copied back to the caller’s stack frame. This is a great optimization nonetheless, but it doesn’t have much bearing on the issue at hand.


#8

Can’t you return the incomplete string as part of Err()?

Caching is a runtime optimization. I was talking about making compile time optimizations.

I agree with you (and BurntSushi). I misspoke. I just wanted to use RVO as an example where the called function directly modifies a variable in the caller’s stack frame without the caller passing an explicit pointer to the callee. I was hoping the Rust compiler could do this when it ascertained doing so was safe. The compiler would need to change its type inference logic to infer an immutable object where it can, and a mutable one otherwise.

For example in…

fn test() {
    let guess = io::stdin().read_line();
}

The type of guess can be immutable String. While in …

fn test() {
    loop {
        let guess = io::stdin().read_line();
    }
}

the type of guess can be mut String.

I hope this makes some sense.


#9

That would seem like more of a burden to me, to have to inspect the error for the incomplete result. That’s like, the opposite of what you want, isn’t it? Passing an output buffer facilitates the transparent handling of incomplete reads.

These changes were first introduced during The Great std::io/os Redesign in late 2014/early this year. A lot of work and debate went into making them a reality.


#10

Thanks I will go through the RFC.

I don’t mind passing an output buffer if it were immutable! I know this does not make much sense but I guess what I want is to give functions the ability to initialize immutable variables through function parameters.


#11

You can always just make it immutable afterward:

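let mut s = String::new();
// read_line returns io::Result<usize>; don't silently drop it
io::stdin().read_line(&mut s).expect("failed to read line");
let s = s; // rebinding without `mut`: now immutable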

#12

There was some talk about having an &init reference type that could reference uninitialized memory and basically work as an explicit write-only out-pointer. I don’t know what became of the discussion, and I doubt it would work for this use-case anyways.

Like @tikue said, you can always reassign the mutable buffer to an immutable binding after the fact, or create a simple wrapper function to do it for you:

/// If you care about intermediate results:
fn read_line<R: BufRead>(read: &mut R) -> (String, io::Result<usize>) {
    let mut buf = String::new();
    let res = read.read_line(&mut buf);
    (buf, res)
}

/// If you don't care and just want the old API:
fn read_line<R: BufRead>(read: &mut R) -> io::Result<String> {
    let mut buf = String::new();
    // One of the few times I've had a use for Result::and()
    read.read_line(&mut buf).and(Ok(buf))
}

#13

Thanks guys I like this idea. &init would have been wonderful though. Let me see if I can find the old discussion.


#14

The trouble with bare &init references is that if you leave the scope by a panic, you need to make sure that nobody will see the uninitialized data. AFAICT the currently favored approach is to use smart pointers that destroy the underlying variable on panic, namely std::ops::InPlace; there is a thread on internals with some more discussion.


#15

I’d also like to note that when iterating linewise, you can use BufRead::lines(), which yields io::Result<String> items and is usually what you want in that case anyway.
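A minimal sketch of that, assuming a hypothetical helper named collect_lines. Each item from lines() is a freshly allocated String with the trailing newline already stripped, and collecting into io::Result stops at the first error:

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: gathers every line of `reader` into a Vec,
/// returning the first I/O error if one occurs.
fn collect_lines<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    // `lines()` yields io::Result<String>; collecting into
    // io::Result<Vec<String>> short-circuits on the first Err.
    reader.lines().collect()
}
```

For stdin this would be called as collect_lines(io::stdin().lock()). It allocates one String per line, which is the trade-off discussed earlier in the thread.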