Rust substring function?

  1. I am aware of String in std::string.

  2. I cannot find a 'substring' function that takes two args, start (inclusive) and end (exclusive), and returns a &str pointing to the chars in between.

  3. Does this function not exist because a char can take 1 to 4 bytes in UTF-8?

  4. EDIT: indices are specified in chars, not UTF-8 bytes.

Ahem...

let b: String = "abc".to_string();
let (start, end) = (0, 2); // byte offsets, not char indices
let substring = &b[start..end]; // "ab"

str (and String) implement Index<Range*>, so both can be sliced with a range.
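
For anyone landing here later, a minimal sketch of the range slicing described above. The name byte_substring is made up for illustration and is not a std function; note that the indices are byte offsets.

```rust
/// Hypothetical helper (not part of std): returns the byte range
/// `start..end` of `s` as a borrowed `&str`.
/// Panics if an index is out of bounds or not on a char boundary.
fn byte_substring(s: &str, start: usize, end: usize) -> &str {
    // `str` implements `Index<Range<usize>>`, so this borrows without copying.
    &s[start..end]
}
```

Because a &String derefs to &str, calling byte_substring(&b, 1, 4) with b = String::from("abcdef") should give "bcd".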


LOL, all this time I viewed &str as a special construct rather than a slice of chars. Thanks!


all this time I viewed &str as a special construct rather than a slice of chars.

Close, but you have to remember that (a) str is a primitive type rather than an ordinary slice, although it can be converted to a byte slice for free, and (b) it's a slice of bytes, not a slice of chars.


It's not a slice of chars. It's a slice of u8s which must be valid UTF-8.

If you're looking for a char-based substring function, that doesn't exist in the standard library. You'd need to turn the char indices into byte indices, then slice on those.
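
One way to sketch that conversion, assuming a hypothetical helper named char_substring (my name, not a std API). It walks the string with char_indices to find the byte offset of each char index, so it is O(n) in the length of the string rather than constant time.

```rust
/// Hypothetical helper (not part of std): substring by char indices,
/// `start` inclusive and `end` exclusive. Indices past the last char are
/// clamped to the end of the string. May panic if `start > end`.
fn char_substring(s: &str, start: usize, end: usize) -> &str {
    // Byte offset of the char at `char_idx`, or `s.len()` if there is no such char.
    let byte_at = |char_idx: usize| {
        s.char_indices()
            .nth(char_idx)
            .map(|(byte_idx, _)| byte_idx)
            .unwrap_or(s.len())
    };
    let (from, to) = (byte_at(start), byte_at(end));
    // Both offsets come from `char_indices` (or are `s.len()`),
    // so they always fall on char boundaries.
    &s[from..to]
}
```

The result still borrows from the input, so nothing is copied; only the index lookup costs a scan.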


So, given what @quadrupleslap and @DanielKeep said, the method I showed only works when both indices fall on char boundaries. That is always true for text made of one-byte characters like

abcdefghijklmnopqrstuvwxyz(){}<>[]?:" etc

but it can panic on things like 😄 or ☁, which each take more than one byte.
Example and source of panic
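
If the panic is a concern, a minimal sketch of a non-panicking variant using str::get, which returns None when the range is out of bounds or cuts through a multi-byte char. The name try_byte_substring is made up for illustration.

```rust
/// Hypothetical helper (not part of std): like `&s[start..end]`, but
/// returns `None` instead of panicking on an out-of-bounds range or an
/// index that is not on a char boundary.
fn try_byte_substring(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}
```

For "a\u{2601}b" (the cloud takes 3 bytes in UTF-8, so the string is 5 bytes long), the range 1..4 covers the whole cloud and should succeed, while 1..2 cuts through it and should give None. str::is_char_boundary can be used to check a single index up front.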

  • If you want to get the "first n characters" or something then your best bet is to use str::chars.
  • Strings are guaranteed to be UTF-8 encoded.
  • String slicing is equivalent to slicing the underlying array (i.e. the indices are for bytes not for chars).
  • Rust will panic if you try to slice the string such that you end up with invalid UTF-8.
  • chars don't necessarily correspond to a user's idea of what a single "character" is, either.
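
A small sketch of the "first n characters" point from the list above, assuming a hypothetical helper first_n_chars that collects into a new String (so it allocates, unlike slicing).

```rust
/// Hypothetical helper (not part of std): the first `n` chars of `s`,
/// collected into a new `String`. Returns the whole string if it has
/// fewer than `n` chars.
fn first_n_chars(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
}
```

This also shows the last bullet: "e\u{301}" (an e followed by a combining acute accent) displays as one accented letter but is two chars, so taking one char from it yields a bare "e".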

@OptimisticPeach, @quadrupleslap, @DanielKeep: Thanks everyone!

I just noticed: .len(), .find, and .rfind all return byte offsets. So although &s[a..b] indexes bytes, that's okay: what I have are byte indices, not char indices.
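
A minimal sketch of that pattern, assuming a hypothetical helper named between. The offsets returned by find and rfind always sit on char boundaries, so they can be fed straight into a slice.

```rust
/// Hypothetical helper (not part of std): the text between the first
/// `open` and the last `close`, or `None` if either is missing or they
/// appear in the wrong order.
fn between(s: &str, open: char, close: char) -> Option<&str> {
    // `find` gives the byte offset where `open` starts; skip past it.
    let start = s.find(open)? + open.len_utf8();
    // `rfind` gives the byte offset where the last `close` starts.
    let end = s.rfind(close)?;
    // `get` returns `None` rather than panicking if `start > end`.
    s.get(start..end)
}
```

Note the open.len_utf8() step: since the offsets are in bytes, skipping past a delimiter means adding its encoded length, not 1.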
