How to get a substring of a String

Why so? Especially when the chars() documentation still urges the user to use graphemes instead of chars:

It's important to remember that char represents a Unicode Scalar Value, and may not match your idea of what a 'character' is. Iteration over grapheme clusters may be what you actually want.
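Here is a small std-only sketch that makes the docs' warning concrete. The helper name `first_chars` is hypothetical. A decomposed "é" is the letter e followed by U+0301 COMBINING ACUTE ACCENT. It displays as one character but counts as two chars, so truncating by chars can silently cut the accent off. Truncating by grapheme clusters would need an external crate, so this sketch sticks to std.

```rust
/// Takes the first `n` Unicode scalar values (chars) of `s`.
/// Hypothetical helper, for illustration only.
fn first_chars(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
}

fn main() {
    // "café" spelled with a decomposed é: 'e' followed by
    // U+0301 COMBINING ACUTE ACCENT. It displays as four characters...
    let decomposed = "cafe\u{301}";
    // ...but it contains five chars (scalar values).
    assert_eq!(decomposed.chars().count(), 5);

    // Taking "the first four characters" by chars drops the accent.
    assert_eq!(first_chars(decomposed, 4), "cafe");

    // With the precomposed U+00E9, the same call keeps the accent.
    let precomposed = "caf\u{e9}";
    assert_eq!(first_chars(precomposed, 4), "caf\u{e9}");

    println!("{} -> {}", decomposed, first_chars(decomposed, 4));
}
```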

Because it's not something the standard library has to include: not everyone writes text-manipulation code (beyond simple interpolation), it's one less thing the core devs have to maintain forever, the required Unicode tables can be quite large, and keeping it outside std lets the supported Unicode version be updated independently of the compiler.

Rust is not trying, and has never tried, to be "batteries included".

Oh, I know and support the standard library's approach of being minimal, but having chars() and not graphemes() is kinda broken. In my view, one is much more likely to need grapheme clusters than Unicode scalar values (as the discussion above indicates). So if only one method could be picked, it should have been graphemes() rather than chars(). Why was chars() preferred, then?

The standard library has to be able to convert UTF-8 into UTF-16 in order to provide file system access on Windows, so it can't just treat str as a bag of bytes. There is one, and only one, correct way to convert between Unicode encodings, and it is unlikely ever to change.

Being able to encode/decode Unicode is absolutely required for Rust to be able to talk to the OS (at least on Windows). Splitting on grapheme cluster boundaries isn't required for anything else Rust provides.
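This sketch shows the kind of encoding work std has to support. It uses only the public `str::encode_utf16` and `String::from_utf16` APIs. It does not claim to be how std's Windows code does this internally, and the wrapper names `to_utf16` and `from_utf16` are hypothetical. Unlike grapheme segmentation, the conversion is fully determined by the encoding rules. A character outside the Basic Multilingual Plane always becomes a surrogate pair, and a lone surrogate is always an error.

```rust
use std::string::FromUtf16Error;

/// Encodes a UTF-8 `&str` as UTF-16 code units, the form Windows
/// wide-character APIs expect (no null terminator is added here).
fn to_utf16(s: &str) -> Vec<u16> {
    s.encode_utf16().collect()
}

/// Decodes UTF-16 code units back into a UTF-8 `String`,
/// returning an error on unpaired surrogates.
fn from_utf16(units: &[u16]) -> Result<String, FromUtf16Error> {
    String::from_utf16(units)
}

fn main() {
    // U+1F985 EAGLE lies outside the BMP, so it needs a surrogate pair.
    let s = "Gölden \u{1F985}";
    let units = to_utf16(s);
    // 7 BMP chars (1 unit each) + 1 surrogate pair (2 units) = 9 units,
    // versus 12 bytes in UTF-8.
    println!("{} UTF-8 bytes, {} UTF-16 units", s.len(), units.len());

    // The round trip is lossless for any valid &str.
    let back = from_utf16(&units).expect("valid UTF-16");
    assert_eq!(back, s);

    // A lone high surrogate is not valid UTF-16.
    assert!(from_utf16(&[0xD800]).is_err());
}
```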

Hi,
I propose this

    let s = "Gölden Eagle";
    // Byte offset at which the 7th char begins; unwrap() panics if s has 6 chars or fewer.
    let (idx, _) = s.char_indices().nth(6).unwrap();
    dbg!(&s[..idx]); // &s[..idx] = "Gölden"
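The `.nth(6).unwrap()` version panics whenever the string has 6 chars or fewer, even though "the first 6 chars" is then simply the whole string. Here is a minimal non-panicking variant of the same `char_indices` idea. The name `prefix_chars` is hypothetical, and it counts chars, not graphemes.

```rust
/// Returns the first `n` chars of `s` as a borrowed slice.
/// Hypothetical helper: unlike `.nth(n).unwrap()`, it does not panic
/// when `s` has `n` or fewer chars; it returns the whole string instead.
fn prefix_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        // Byte offset where the (n+1)-th char starts: slice up to it.
        Some((idx, _)) => &s[..idx],
        // Fewer than n+1 chars: the whole string is the prefix.
        None => s,
    }
}

fn main() {
    let owned = String::from("Gölden Eagle");
    // &String derefs to &str, so this works on owned Strings too.
    println!("{}", prefix_chars(&owned, 6)); // prints Gölden
    println!("{}", prefix_chars("Gölden", 6)); // prints Gölden, no panic
}
```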

You can also write it like this, without 'mut':

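    let s = "Gölden Eagle";
    // Byte length of the first 6 chars (each char takes 1-4 bytes in UTF-8);
    // take(6) stops early on shorter strings, so this never panics.
    let end: usize = s.chars().map(|c| c.len_utf8()).take(6).sum();
    println!("{}", &s[..end]); // prints Gölden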
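The same `len_utf8` summing trick extends to the general question in the title, a substring that starts at an arbitrary position. Below is a hedged sketch with a hypothetical `substring(s, start, len)` helper. It assumes indices count chars (Unicode scalar values), not graphemes, and it clamps out-of-range values instead of panicking.

```rust
/// Returns the substring of `s` that starts at char index `start` and
/// spans up to `len` chars, clamped to the end of the string.
/// Hypothetical helper: indices count chars, not grapheme clusters.
fn substring(s: &str, start: usize, len: usize) -> &str {
    // Byte offset of the `start`-th char, or the end if out of range.
    let begin = s.char_indices().nth(start).map_or(s.len(), |(i, _)| i);
    // Sum the UTF-8 widths of the next `len` chars.
    let width: usize = s[begin..].chars().take(len).map(char::len_utf8).sum();
    &s[begin..begin + width]
}

fn main() {
    let owned = String::from("Gölden Eagle");
    println!("{}", substring(&owned, 7, 5)); // prints Eagle
    // Copy the slice out if an owned String is needed.
    let copy: String = substring(&owned, 0, 6).to_string();
    println!("{}", copy); // prints Gölden
}
```

Returning a borrowed `&str` avoids an allocation; callers that need ownership can call `.to_string()` on the result.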

Please don't revive old threads.