Is there any way to get a &mut str Split from str::split?

starwing · August 30, 2020, 1:25am

Hi all,

I'm trying LeetCode #5571: reverse the words in a string. The basic idea is s.split(' ').map(...).collect().join(" "), which works fine.

But I'm wondering why &mut str lacks a split_mut() method that just returns an Iterator<Item=&mut str>?

I noticed there is str::split_at_mut(), as well as slice::split_mut(). So why is str::split_mut() missing?

In short, is there any way to get an iterator of mutable string slices when splitting?

scottmcm · August 30, 2020, 1:43am

&mut str is pretty rare because it's a very hard type to do anything useful with -- UTF-8 (and Unicode segmentation in general) being variable-width means it's usually not possible to replace one substring with another in place, for example.

Often for leetcode-style problems -- that tend to expect you to do things "wrong" from a full-unicode perspective -- you want to just use the bytes and assume they're ASCII.

P.S. I think you're missing a .rev() in the basic idea.
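On the original question of a split_mut() for &mut str: as far as I know the standard library has no such method, but something similar can be sketched in safe code on top of str::split_at_mut. The helper name split_mut_on_space below is made up for illustration, and it only splits on a single ASCII space. Note that the pieces only allow length-preserving edits such as make_ascii_uppercase, which is the limitation described above.

```rust
/// Hypothetical helper (not a std API): splits a mutable string slice on
/// ASCII spaces and returns the pieces as mutable string slices.
fn split_mut_on_space(s: &mut str) -> Vec<&mut str> {
    let mut parts = Vec::new();
    let mut rest = s;
    while let Some(i) = rest.find(' ') {
        // Take `rest` by value so both halves keep the original lifetime.
        let (head, tail) = std::mem::take(&mut rest).split_at_mut(i);
        parts.push(head);
        // `tail` starts with the one-byte space; skip it.
        rest = &mut tail[1..];
    }
    parts.push(rest);
    parts
}
```

For example, calling make_ascii_uppercase() on every piece of "ab cd" leaves the string as "AB CD".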

starwing · August 30, 2020, 1:59am

Thanks for the reply!

The .rev() call is inside map(...); I just didn't write it out completely.

Thanks for the hint! I finished the problem by casting the String to [u8]:

impl Solution {
    pub fn reverse_words(mut s: String) -> String {
        // Reverse each space-separated run of bytes in place (32 is b' ').
        unsafe { s.as_bytes_mut() }
            .split_mut(|&ch| ch == 32)
            .for_each(|word| word.reverse());
        s
    }
}

L.F · August 30, 2020, 2:14am

Instead of using unsafe { s.as_bytes_mut() }, I would recommend using into_bytes to convert the String into a Vec<u8> altogether, so you don't have to worry about invoking undefined behavior by putting invalid bytes in there. At the end, String::from_utf8 can be used to convert the Vec<u8> back to a String, which also gives you a chance to handle invalid UTF-8 if necessary.
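A minimal sketch of that suggestion, assuming words are separated by single ASCII spaces; the function name reverse_words_checked is made up for illustration and is not part of the LeetCode template:

```rust
use std::string::FromUtf8Error;

/// Reverses the bytes of each space-separated word, then re-validates the
/// buffer as UTF-8 instead of assuming it is still valid.
fn reverse_words_checked(s: String) -> Result<String, FromUtf8Error> {
    let mut bytes = s.into_bytes();
    bytes
        .split_mut(|&b| b == b' ')
        .for_each(|word| word.reverse());
    // Fails with an error (rather than UB) if a multi-byte character was
    // scrambled by the byte-wise reversal.
    String::from_utf8(bytes)
}
```

With ASCII input this returns Ok with the reversed words; with input such as "안" it returns Err, which the caller can handle however it likes.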

Hyeonu · August 30, 2020, 2:53am

No no no, it's UB. Please don't do that.

Let's start with WHY it's bad. Try running your code with some real-world input.

It crashes with the message below. What happened?

Execution operation failed: Output was not valid UTF-8: invalid utf-8 sequence of 1 bytes from index 0

String slices are always valid UTF-8. UTF-8 is a variable-width character encoding, meaning each code point is represented by one to four bytes in the encoded text. For example, the string "안" is represented by the three bytes [236, 149, 136], and reversing those bytes produces an invalid UTF-8 sequence.
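This can be checked in safe code. The sketch below uses a made-up helper, reversed_bytes, that copies the bytes out before reversing them, so no str is ever left in an invalid state:

```rust
/// Returns a reversed copy of the UTF-8 bytes of `s`.
fn reversed_bytes(s: &str) -> Vec<u8> {
    let mut bytes = s.as_bytes().to_vec();
    bytes.reverse();
    bytes
}
```

For "안" the reversed bytes are [136, 149, 236], and std::str::from_utf8 rejects them at index 0 because 136 is a continuation byte that cannot start a character.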

Remember the guarantee that string slices are always valid UTF-8? In safe Rust, guarantees like this are checked and enforced for you by the compiler. But inside an unsafe {} block, it is you who is responsible for upholding every guarantee defined by the language and the libraries. Otherwise it's UB, which means you may observe a crash at best, or your memory may be silently corrupted so that totally unrelated parts of your code behave incorrectly.

In conclusion, try your absolute best to avoid writing unsafe {} blocks by hand. Their main purpose is to build safe abstractions out of low-level building blocks in heavily audited codebases like the standard library, so that everyone can safely use types like Vec<T> and HashMap<K, V>. Sometimes you do need to write some, for example when interacting with C through FFI. In that case, try to write as much of your logic as possible in safe Rust and keep the unsafe part as small as possible.
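To illustrate what a small unsafe block behind a safe interface can look like, here is a sketch that keeps the in-place byte reversal but checks the precondition first. The function name is hypothetical, and for this problem the fully safe approaches are still preferable.

```rust
/// Hypothetical helper: reverses each space-separated word in place.
/// Returns false and leaves `s` untouched when `s` is not pure ASCII.
fn reverse_ascii_words_in_place(s: &mut str) -> bool {
    if !s.is_ascii() {
        return false;
    }
    // SAFETY: every byte is ASCII (checked above), and reordering ASCII
    // bytes always yields valid UTF-8, so the `str` invariant still holds
    // when the borrow ends.
    let bytes = unsafe { s.as_bytes_mut() };
    bytes
        .split_mut(|&b| b == b' ')
        .for_each(|word| word.reverse());
    true
}
```

The caller never has to think about UTF-8 validity: non-ASCII input is refused instead of being corrupted.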

As a bonus, here is a totally safe and correct version of your function.

Note that this code only reverses the code points between whitespace, so characters made of multiple code points, like the emoji 👨‍👩‍👧‍👧, produce weird results. But that is a problem with the LeetCode question itself. Blame LeetCode for serving pre-Unicode-era questions!

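impl Solution {
    pub fn reverse_words(s: String) -> String {
        s.split(' ')
            // Reverse the code points of each word.
            .map(|word| word.chars().rev().collect::<String>())
            .collect::<Vec<_>>()
            // Put the single-space separators back between the words.
            .join(" ")
    }
}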
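The multi-code-point caveat above can be seen with a smaller case than the family emoji: "é" written as the letter e followed by U+0301 (combining acute accent). This is only a sketch using the standard library; the helper name is made up, and a grapheme-aware reversal would need segmentation logic that, as far as I know, std does not provide.

```rust
/// Reverses a string code point by code point (not grapheme by grapheme).
fn reverse_code_points(s: &str) -> String {
    s.chars().rev().collect()
}
```

The result is always valid UTF-8, but for "e\u{301}" the accent ends up in front of the e, so it no longer combines with that letter.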

scottmcm · August 30, 2020, 7:53am

Ah, I thought this was "reverse the order of the words", which is a better question as it has more interesting implementations -- the canonical solution being to reverse the words then reverse the whole string, which as a bonus keeps the code units inside the words in the correct order, avoiding the problems that Hyeonu mentioned.
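A rough sketch of that double-reversal trick, working on bytes and assuming words are separated by single ASCII spaces (the function name is made up, and no whitespace trimming is attempted):

```rust
/// Reverses the order of the space-separated words in `s`.
fn reverse_word_order(s: String) -> String {
    let mut bytes = s.into_bytes();
    // Step 1: reverse the bytes inside every word.
    bytes
        .split_mut(|&b| b == b' ')
        .for_each(|word| word.reverse());
    // Step 2: reverse the whole buffer. Each word's bytes are now back in
    // their original order, but the words appear in reverse order.
    bytes.reverse();
    // The intermediate state may be invalid UTF-8, which is why this works
    // on a Vec<u8> and only converts back at the end.
    String::from_utf8(bytes).expect("word bytes are restored to their original order")
}
```

Because a b' ' byte never occurs inside a multi-byte UTF-8 sequence, every word comes out byte-for-byte intact, so the final conversion should not fail for valid input.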

system · November 28, 2020, 7:53am

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.
