Modify String in-place

Dear Rust experts,

I'm about to add a fill_inplace function to my textwrap crate. In short, the goal is to turn some spaces into '\n' without reallocating the input String. Like this, where the break_points are already computed:

fn fill_inplace(text: &mut String, break_points: &[usize]) {
    let mut bytes = text.into_bytes();
    for &idx in break_points {
        bytes[idx] = b'\n';
    }
    *text = String::from_utf8(bytes).unwrap();
}

pub fn main() {
    let mut text = String::from("foo bar baz");

    println!("before: {:?}", text);
    fill_inplace(&mut text, &[3, 7]);
    println!("after:  {:?}", text);
}


However, this doesn't work as-is. It fails with this error:

error[E0507]: cannot move out of `*text` which is behind a mutable reference

The offending line is let mut bytes = text.into_bytes() which consumes text. I can fix this by using a temporary like this:

fn fill_inplace(text: &mut String, break_points: &[usize]) {
    let mut tmp = String::new();
    std::mem::swap(&mut tmp, text);
    let mut bytes = tmp.into_bytes();
    for &idx in break_points {
        bytes[idx] = b'\n';
    }
    *text = String::from_utf8(bytes).unwrap();
}

I believe this still avoids reallocating the original string, but it seems a bit weird to me to swap things back and forth like that.

Does anybody have an idea for how I can write this better?

Thanks for any help!

You can use let mut bytes = std::mem::take(text).into_bytes().
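For reference, here is a minimal sketch of what the function might look like with that change. It keeps the original fill_inplace signature from the question; the pointer comparison in main is only an illustration I added to show that the heap buffer is reused.

```rust
fn fill_inplace(text: &mut String, break_points: &[usize]) {
    // `take` leaves an empty String (no allocation) in `text`
    // and hands us ownership of the original buffer.
    let mut bytes = std::mem::take(text).into_bytes();
    for &idx in break_points {
        bytes[idx] = b'\n';
    }
    // `from_utf8` validates the bytes and reuses the same Vec buffer.
    *text = String::from_utf8(bytes).unwrap();
}

fn main() {
    let mut text = String::from("foo bar baz");
    let ptr_before = text.as_ptr();

    fill_inplace(&mut text, &[3, 7]);

    assert_eq!(text, "foo\nbar\nbaz");
    // Illustration only: the heap buffer is the same one as before.
    assert_eq!(text.as_ptr(), ptr_before);
    println!("after: {:?}", text);
}
```

This does the same thing as the swap with a temporary, just in one expression.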

But the swapping is essential (unless you want to venture into unsafe) -- this is what ensures basic exception safety if the code panics in the middle of fill_inplace.
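To make the panic case concrete, here is a small sketch. It assumes the default unwinding panic strategy, and the out-of-bounds index 100 is a made-up input chosen to trigger a panic in the middle of the loop. After the panic the original contents are gone, but text still holds a valid (empty) String.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

fn fill_inplace(text: &mut String, break_points: &[usize]) {
    let mut bytes = std::mem::take(text).into_bytes();
    for &idx in break_points {
        bytes[idx] = b'\n'; // panics if idx is out of bounds
    }
    *text = String::from_utf8(bytes).unwrap();
}

fn main() {
    let mut text = String::from("foo bar baz");

    // Index 100 is out of bounds, so the loop panics after the first write.
    let result = catch_unwind(AssertUnwindSafe(|| {
        fill_inplace(&mut text, &[3, 100]);
    }));
    assert!(result.is_err());

    // The byte buffer was dropped during unwinding. `text` holds the
    // empty String that `take` left behind, which is still valid UTF-8.
    assert_eq!(text, "");
    println!("after panic: {:?}", text);
}
```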


You can use String::as_mut_vec.

It's unsafe because you can end up with invalid UTF-8 if you overwrite, e.g., half of a two-byte code point. Your swapping version (or @matklad's suggestion) is safe because from_utf8 verifies that the updated bytes are still valid UTF-8.
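A rough sketch of how the as_mut_vec variant could look. The up-front check that every break point holds an ASCII space is my own assumption about how to justify the unsafe block, not necessarily what the linked playground did.

```rust
fn fill_inplace(text: &mut String, break_points: &[usize]) {
    // Assumed guard: only ASCII spaces may be replaced. A byte equal to
    // b' ' is always a complete one-byte character in UTF-8.
    for &idx in break_points {
        assert_eq!(text.as_bytes()[idx], b' ', "index {} is not a space", idx);
    }

    // SAFETY: each write replaces one single-byte ASCII character (' ')
    // with another ('\n'), so the contents remain valid UTF-8.
    let bytes = unsafe { text.as_mut_vec() };
    for &idx in break_points {
        bytes[idx] = b'\n';
    }
}

fn main() {
    // 'é' and 'ö' take two bytes each, so the space sits at byte index 6.
    let mut text = String::from("héllo wörld");
    fill_inplace(&mut text, &[6]);
    assert_eq!(text, "héllo\nwörld");
    println!("after: {:?}", text);
}
```

With the guard in place, a bad index panics before any byte is written, so text is never left in a half-modified state.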


Oh, thanks! I didn't know about this function. It does the same thing, but it looks simpler somehow :)

Thanks, that's definitely also a nice option. The version with the UTF-8 check is already plenty fast, so I'll probably just stick with safe code for now.

I benchmarked both versions and could not measure any difference in the timings. Wrapping 1,600- and 3,200-character strings:

String lengths/fill_inplace/1600
            time:   [11.549 us 11.663 us 11.855 us]
            change: [-1.7676% -0.8406% +0.2085%] (p = 0.07 > 0.05)
            No change in performance detected.
String lengths/fill_inplace/3200
            time:   [23.666 us 23.796 us 23.964 us]
            change: [-1.4300% -0.8371% -0.1698%] (p = 0.01 < 0.05)
            Change within noise threshold.

My conclusion so far is that the computation of the break points (which I removed for simplicity in the example code) is dominating the computation time.

Using the ascii crate:

use ascii::{AsMutAsciiStr, AsciiChar};
fn fill_inplace(text: &mut str, break_points: &[usize]) {
    for &idx in break_points {
        text.slice_ascii_mut(idx..=idx).unwrap()[0] = AsciiChar::LineFeed;
    }
}