How to get a substring of a String


#1

Hi,
What is the best way to get a substring of a String?
I couldn’t find a substr method or similar.

Let’s assume I have a String like “Golden Eagle” and I want to get the first 6 characters, that is “Golden”.

How can I do that?

Markus


#2

Strings can be sliced using the index operator:

let slice = &"Golden Eagle"[..6];
println!("{}", slice);

The syntax is generally v[M..N], where M ≤ N. This returns a slice from M up to, but not including, N. There is also some more sugary syntax, such as [..N] (everything up to N), [N..] (everything from N onwards) and [..] (everything).

It’s the same for String, as well as vector/array types.
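As a small sketch of the range forms described above, here is each one applied to an ASCII-only string. The helper name `range_forms` is made up for illustration, and the indices are byte offsets, which only line up with characters here because every character is a single byte.

```rust
// Hypothetical helper: applies the four range forms to `s`.
// `m` and `n` are byte offsets and must lie on character boundaries,
// otherwise the slicing panics.
fn range_forms(s: &str, m: usize, n: usize) -> [&str; 4] {
    [
        &s[m..n], // from m up to, but not including, n
        &s[..n],  // everything up to n
        &s[m..],  // everything from m onwards
        &s[..],   // everything
    ]
}

fn main() {
    let [mid, head, tail, all] = range_forms("Golden Eagle", 2, 6);
    println!("{} | {} | {} | {}", mid, head, tail, all);
    // prints: lden | Golden | lden Eagle | Golden Eagle
}
```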


#3

It’s important to note that this slices by byte offsets, so it will not necessarily return the first six characters.

let slice = &"Können"[..6];
println!("{}", slice);

prints Könne.
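One possible way to take the first n characters rather than the first n bytes is to look up the byte offset of the n-th character with `char_indices` and slice there. This is only a sketch: `first_chars` is a name invented for this example, and "character" here means a Unicode scalar value (`char`), not a user-perceived character.

```rust
// Hypothetical helper: returns the prefix of `s` containing at most
// `n` chars (Unicode scalar values), without copying.
fn first_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        // `byte_idx` is where the (n+1)-th char starts, which is
        // always a valid boundary to slice at.
        Some((byte_idx, _)) => &s[..byte_idx],
        // Fewer than n+1 chars: the whole string is the prefix.
        None => s,
    }
}

fn main() {
    println!("{}", first_chars("Golden Eagle", 6)); // prints Golden
    println!("{}", first_chars("Können", 6)); // prints Können
}
```

If an owned String is acceptable, `s.chars().take(n).collect::<String>()` gives the same text at the cost of an allocation.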


#4

Good point. I thought that something felt fishy when I remembered that you cannot index a string and get a character.


#5

Yes, and that’s exactly why :smile:

The issue with your question, @mjais, is that ‘character’ isn’t a well-defined thing in the Unicode universe. Check out http://doc.rust-lang.org/nightly/book/strings.html#indexing


#6

Something like http://is.gd/PIX31l should do the trick for indexing Unicode code points, but it still would not handle strings with combining characters. I wonder if there is a more obvious or easier way to do this.
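To illustrate the point about code points and combining characters, here is a minimal sketch using only the standard library. It is not the code behind the link above; `nth_char` is a hypothetical name chosen for this example.

```rust
// Hypothetical helper: the i-th char (Unicode scalar value) of `s`.
// This walks the string from the start, so it is O(i), not O(1).
fn nth_char(s: &str, i: usize) -> Option<char> {
    s.chars().nth(i)
}

fn main() {
    // Indexing by char gives the expected letter for precomposed text.
    println!("{:?}", nth_char("Können", 1)); // prints Some('ö')

    // "e" followed by U+0301 (combining acute accent) displays as one
    // accented letter but is made of two chars.
    let combined = "e\u{301}";
    println!("{}", combined.chars().count()); // prints 2
    println!("{:?}", nth_char(combined, 0)); // prints Some('e')
}
```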


#7

You want graphemes, but I believe that was de-stabilised because it might/might not be moving to an external crate.

Frankly, “how do I get the first X characters” is almost never a valid question in the first place: there’s pretty much no reason to ever do it.


#8

Thanks for all the answers. Very helpful!

I will look at the links.
I understand that the problem is harder than it seems, particularly if one wants Unicode support and efficiency at the same time :smile:


#9

Be careful: this code can panic, for example when the index is past the end of the string or does not fall on a character boundary.
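A non-panicking alternative, sketched here under the assumption that getting `None` back is acceptable, is `str::get`, which checks both the bounds and the character boundary. The wrapper name `try_prefix` is invented for this example.

```rust
// Hypothetical helper: the first `n` bytes of `s` as a &str, or None
// if `n` is out of bounds or falls inside a multi-byte character.
fn try_prefix(s: &str, n: usize) -> Option<&str> {
    s.get(..n)
}

fn main() {
    println!("{:?}", try_prefix("Golden Eagle", 6)); // prints Some("Golden")
    // Byte 2 is in the middle of the two-byte 'ö'.
    println!("{:?}", try_prefix("Können", 2)); // prints None
    // Past the end of the string.
    println!("{:?}", try_prefix("Können", 100)); // prints None
}
```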


#10

It’s only useful in this special case. See the rest of the comments.