How to change a character of a mutable str slice?

Stupidly simple, but still challenging question.

Playground

Changing a character in a string slice is impossible in general, since it may force the string to change its storage size (e.g. if you replace an ASCII character with a non-ASCII one).
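
To make the size problem concrete, here is a small standard-library-only sketch. It uses `char::len_utf8` to show how many bytes different characters need, and `str::replacen` on an owned copy to show a one-character replacement growing the string. Both helper names are made up for illustration.

```rust
/// Pairs each char with the number of bytes it occupies in UTF-8.
/// (`utf8_width_table` is a hypothetical helper name, not a std API.)
fn utf8_width_table(chars: &[char]) -> Vec<(char, usize)> {
    chars.iter().map(|&c| (c, c.len_utf8())).collect()
}

/// Replaces the first occurrence of `from` with `to` in a fresh `String`
/// and returns the byte length before and after the replacement.
/// (`byte_len_after_replace` is also a hypothetical helper name.)
fn byte_len_after_replace(s: &str, from: char, to: char) -> (usize, usize) {
    let after = s.replacen(from, &to.to_string(), 1);
    (s.len(), after.len())
}

// Widths: 'A' -> 1, 'é' -> 2, '€' -> 3, '𝓐' -> 4 bytes.
// Replacing 'a' with 'ä' in "cat" grows it from 3 to 4 bytes,
// and a `&mut str` has no room to grow into.
```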

You are right! But it's still not obvious: is it possible or not? :slight_smile:

I don't think this is possible with only safe code and the stdlib methods. However, it is technically sound if you check that the range corresponds to a valid substring (i.e. it starts and ends on char boundaries) and that the replacement string's length in bytes is the same as the range length. For example, the copy_from_str crate appears to do exactly this.
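
A checked replacement along these lines might look like the sketch below. It is not the copy_from_str crate's actual code; the name `replace_same_len` and the `Result<(), &'static str>` error type are assumptions for illustration. It relies on `str::get_mut`, which returns `None` unless both ends of the range are in bounds and on char boundaries, and it rejects replacements whose byte length differs before touching any bytes.

```rust
use std::ops::Range;

/// Hypothetical helper: overwrites `s[range]` with `replacement` in place.
/// Returns an error instead of panicking if the range is not a valid
/// substring or if the replacement has a different length in bytes.
fn replace_same_len(
    s: &mut str,
    range: Range<usize>,
    replacement: &str,
) -> Result<(), &'static str> {
    // `get_mut` checks both bounds and char boundaries for us.
    let target = s
        .get_mut(range)
        .ok_or("range is out of bounds or not on a char boundary")?;
    if target.len() != replacement.len() {
        return Err("replacement has a different byte length");
    }
    // SAFETY: `replacement` is a `&str`, so its bytes are valid UTF-8, and it
    // replaces a run of complete characters with the same byte length, so the
    // surrounding string remains valid UTF-8.
    unsafe {
        target.as_bytes_mut().copy_from_slice(replacement.as_bytes());
    }
    Ok(())
}
```

A range that cuts through a multi-byte character, such as `3..4` in `"café"`, is rejected by `get_mut` rather than producing invalid UTF-8.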

Alternatively, if you're OK with requiring a &mut String instead of a &mut str, you can use its replace_range method.
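
For comparison, a minimal sketch with the standard library's `String::replace_range`, which can grow or shrink the string because `String` owns a resizable buffer. The range must still lie on char boundaries, otherwise the call panics. The wrapper name `swap_first_char` is hypothetical.

```rust
/// Replaces the first character of `s` with `replacement`, whatever its
/// UTF-8 width. Does nothing on an empty string.
/// (`swap_first_char` is a hypothetical helper name.)
fn swap_first_char(s: &mut String, replacement: &str) {
    if let Some(first) = s.chars().next() {
        // The range covers exactly the bytes of the first char, so both ends
        // sit on char boundaries and `replace_range` will not panic.
        s.replace_range(..first.len_utf8(), replacement);
    }
}
```

For instance, replacing the Latin `A` in `"Abc"` with the Cyrillic `"А"` turns a 3-byte string into a 4-byte one, which only works because the buffer is owned.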

What's not obvious? The answer explicitly said it's not possible, right here:

It doesn't get any more obvious than that.

If you want to modify ASCII characters in place, the ascii crate could be useful. Just check that the relevant part of your slice is ASCII-only, and then you can index and mutate individual characters.

Example code -> (link)
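
If you only need simple ASCII edits and would rather avoid a dependency, the standard library already has a few safe in-place methods on `&mut str`, such as `make_ascii_uppercase`. This sketch is not the ascii crate's API; it combines `str::get_mut` with `make_ascii_uppercase`, and the helper name `uppercase_ascii_range` is made up.

```rust
use std::ops::Range;

/// Uppercases the ASCII letters in `s[range]` in place, leaving non-ASCII
/// characters untouched. Returns `false` if the range is out of bounds or
/// not on char boundaries. (`uppercase_ascii_range` is a hypothetical name.)
fn uppercase_ascii_range(s: &mut str, range: Range<usize>) -> bool {
    match s.get_mut(range) {
        Some(part) => {
            // ASCII case changes never alter a character's byte length,
            // which is why std can offer this safely on `&mut str`.
            part.make_ascii_uppercase();
            true
        }
        None => false,
    }
}
```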

The as_bytes_mut method can be used to modify the string data behind a &mut str as long as the number of bytes remains unchanged. For example, you can do this:

/// Changes `a` to have the same contents as `b`.
/// Both slices must have the same length in bytes; otherwise this panics.
fn overwrite(a: &mut str, b: &str) {
    // SAFETY: `b` is a `&str`, so its bytes are valid utf-8. They replace
    // all of `a` (copy_from_slice panics before copying if the lengths
    // differ), so `a` is still valid utf-8 afterwards.
    unsafe {
        a.as_bytes_mut().copy_from_slice(b.as_bytes());
    }
}

playground with example usage

I think it would be reasonable to add a method like the above to the &mut str type in the standard library.

To relate this to previous replies: this looks identical to the function provided by the copy_from_str crate mentioned above.
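
Here is a self-contained usage sketch of the `overwrite` function from the reply above (repeated so the snippet compiles on its own). It borrows a sub-slice with `str::get_mut` and shows that only the byte count must match, not the character count: the two-byte `é` is replaced by two one-byte characters. The function `respell_cafe` is a hypothetical example, not something from the thread's playground.

```rust
/// Changes `a` to have the same contents as `b` (copied from the reply above).
/// Both slices must have the same length in bytes; otherwise this panics.
fn overwrite(a: &mut str, b: &str) {
    // SAFETY: `b` is a `&str`, so its bytes are valid UTF-8, and they
    // replace all of `a`, so `a` is still valid UTF-8 afterwards.
    unsafe {
        a.as_bytes_mut().copy_from_slice(b.as_bytes());
    }
}

/// Hypothetical usage: replace the two-byte 'é' in "café" with "ee".
fn respell_cafe() -> String {
    let mut s = String::from("caf\u{e9}");
    // `get_mut` returns `None` unless 3..5 lies on char boundaries.
    let accented = s.get_mut(3..5).expect("3..5 covers exactly the 'é'");
    overwrite(accented, "ee");
    s
}
```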

"Generally not possible" is not the same as "not possible". It was obvious that changing the length of the slice was not possible, but whether a length-preserving mutation could be done without unsafe code was an open question for me. By the way, it is a surprising answer. I'm not sure the Rust Book emphasizes it anywhere.

What about slice &mut [u32]? As I understand, it's possible to change the content of an ordinary slice?

The difference is that in ordinary slices, items don’t use a variable-length encoding like UTF-8, and there’s no invariant to uphold about correct encodings. Hence, ordinary slices like &mut [u32] are easier to mutate; their length still cannot change, but at least “number of elements” has a more semantically relevant meaning in such a slice than “number of bytes” has in a str.

If you look at the code that @alice explained above, you’ll see the copy_from_slice method, which demonstrates that &mut [T] slices have a more useful API; in fact, this API for &mut [u8] (plus some unsafe code) is used there. And in my response, you’ll see an IndexMut<usize> impl (on a special type, but [T] has that impl, too) used to mutate an individual element of a slice.
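
A minimal sketch of mutating an ordinary `&mut [u32]`: plain indexing changes one element, and `copy_from_slice` (the same slice method used in `overwrite` above) copies a run of elements. Neither operation can break an invariant, because every `u32` has the same fixed size and any value is valid. Both helper names are hypothetical.

```rust
/// Increments element `i` in place. Indexing a `&mut [u32]` goes through
/// `IndexMut<usize>` and panics if `i` is out of bounds.
/// (`bump` is a hypothetical helper name.)
fn bump(values: &mut [u32], i: usize) {
    values[i] += 1;
}

/// Copies `src` over the start of `dst`; panics if `src` is longer than `dst`.
/// (`overwrite_prefix` is a hypothetical helper name.)
fn overwrite_prefix(dst: &mut [u32], src: &[u32]) {
    dst[..src.len()].copy_from_slice(src);
}
```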


If this didn’t answer your question, be more specific about what you’re asking.

What about slice &mut [u32]? As I understand, it's possible to change the content of an ordinary slice?

is remarkably unspecific.

Yes, it answers my question.

That could be used for an ordinary slice:

I see.

Rust strings use Unicode, where the concept of "character" is pretty complicated beyond the basic Latin ASCII subset.

If you want to operate on "characters", consider using grapheme clusters:

…and forget about using &mut str. It is an inadequate type for any non-ASCII modification.
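
To show why "character" is slippery even before grapheme clusters come in, here is a standard-library-only sketch (it does not do grapheme segmentation; that needs a crate such as unicode-segmentation). An `é` written as `e` plus a combining acute accent renders as one visible character but is two `char`s, so a naive `char`-level edit keeps the accent. The helper names are hypothetical.

```rust
/// Returns (bytes, chars): `len()` counts UTF-8 bytes,
/// `chars().count()` counts Unicode scalar values.
/// (`byte_and_char_counts` is a hypothetical helper name.)
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Naively replaces the first `char` of `s` with `c` (or returns just `c`
/// for an empty string). On "e\u{301}" this keeps the combining accent,
/// so the visible result is "É", not "E".
/// (`replace_first_char` is a hypothetical helper name.)
fn replace_first_char(s: &str, c: char) -> String {
    let mut rest = s.chars();
    rest.next(); // drop the first scalar value, if any
    std::iter::once(c).chain(rest).collect()
}
```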

Why is it surprising? Suppose you want to convert “A𝓐” (without quotes) into “АAА” (without quotes; the outer two letters are Cyrillic А).

That is: convert [0x41] [0xF0 0x9D 0x93 0x90] to [0xD0 0x90] [0x41] [0xD0 0x90].

That's a valid conversion (both strings are 5 bytes long), but you can't achieve it by replacing one character with another! It just doesn't work! You have to replace the two original characters with the three final characters in one step.
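
The byte arithmetic above can be checked with a short sketch. It lists the UTF-8 width of each `char` in both strings: the totals match (5 bytes each), but the per-character widths do not line up, so no one-for-one same-length replacement exists. Both helper names are hypothetical.

```rust
/// Returns the UTF-8 byte length of every char in `s`.
/// (`char_widths` is a hypothetical helper name.)
fn char_widths(s: &str) -> Vec<usize> {
    s.chars().map(char::len_utf8).collect()
}

/// True if `to` could be produced from `from` by replacing each char with a
/// char of the same byte width, one for one.
/// (`per_char_widths_match` is a hypothetical helper name.)
fn per_char_widths_match(from: &str, to: &str) -> bool {
    char_widths(from) == char_widths(to)
}

// "A𝓐" -> widths [1, 4], total 5 bytes
// "АAА" -> widths [2, 1, 2], total 5 bytes
```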

That means you can replace either individual characters of the same length or whole groups of characters. Allowing only the replacement of a character with one of the same length would be… strange.

You would be able to replace ¹ (0xC2 0xB9) with ² (0xC2 0xB2) or ⁰ (0xE2 0x81 0xB0) with ⁵ (0xE2 0x81 0xB5), but replacing ⁰ (0xE2 0x81 0xB0) with ¹ (0xC2 0xB9)? Nope, that's not allowed.
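
The encodings quoted above can be verified with `char::encode_utf8`, and the rule such a "same length only" API would enforce boils down to comparing `len_utf8`. This is only a sketch of that rule; both helper names are hypothetical and no such std API exists.

```rust
/// Returns true if replacing `old` with `new` keeps the UTF-8 byte length,
/// the only kind of single-char edit a fixed-size `&mut str` could allow.
/// (`same_utf8_width` is a hypothetical helper name.)
fn same_utf8_width(old: char, new: char) -> bool {
    old.len_utf8() == new.len_utf8()
}

/// Encodes `c` into a 4-byte stack buffer and returns its UTF-8 bytes.
/// (`utf8_bytes` is a hypothetical helper name.)
fn utf8_bytes(c: char) -> Vec<u8> {
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

// '¹' -> C2 B9, '²' -> C2 B2, '⁰' -> E2 81 B0, '⁵' -> E2 81 B5
```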

Do you really think such an interface would be useful or well-received?

It's not even clear how the safe API you expect could ever work.

The only two sane ways are to convert the &str into US-ASCII &[u8], do the work, then put it back; or convert the &str into a String, do the work, then put it back.
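
Both approaches can be sketched with the standard library alone. The specific edits (swapping `-` for `_`, and Latin `A` for Cyrillic `А`) are made-up examples, as are the function names. The byte route re-validates with `String::from_utf8`; the `String` route lets characters change width freely.

```rust
/// Way 1 (ASCII-oriented): copy the bytes, edit them, then re-validate.
/// `String::from_utf8` returns an error if the edits broke UTF-8.
/// (`edit_via_bytes` is a hypothetical helper name.)
fn edit_via_bytes(s: &str) -> Result<String, std::string::FromUtf8Error> {
    let mut bytes = s.as_bytes().to_vec();
    for b in bytes.iter_mut() {
        // Example edit: ASCII '-' becomes '_' (same width, stays valid).
        if *b == b'-' {
            *b = b'_';
        }
    }
    String::from_utf8(bytes)
}

/// Way 2: build an owned String, where characters may change width.
/// (`edit_via_string` is a hypothetical helper name.)
fn edit_via_string(s: &str) -> String {
    // Example edit: Latin 'A' (1 byte) becomes Cyrillic 'А' (2 bytes).
    s.chars().map(|c| if c == 'A' { '\u{410}' } else { c }).collect()
}
```

Writing either result back into the original `&mut str` only works when the byte length is unchanged (for example with `overwrite` from earlier in the thread); otherwise the caller has to hold a `String`.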

Thanks. That's clear :slight_smile:

What's surprising is that there is no API for the case when the size does not change.