Can slice but can't index a str

Hi,
I was writing some code in the playground to experiment with strs when I found something strange: a str can't be indexed, but it can be sliced (with a range). Here's an example:

let a = "Some cool stuff";
let b = "Another awesome str";

let index: usize = 4;

// let a = &a[index]; // Error: the type `str` cannot be indexed by `usize`
let b = &b[index..index+3];

println!("a = {}, b = {}", a, b);


I read some issues about why a str can't be indexed, but I don't understand why it can be sliced. Also, what's the most efficient way to index a string? The only way I found is

let a = a.chars().skip(index-1).next().unwrap();

and it's a little long for a simple operation.

Thanks for reading, and sorry if this has already been posted.

What operation do you want to do? Do you want the nth character? Do you want the character starting after n bytes? Something else?

The nth character, like indexing a string in Python.

Then you need to use the iterator chain. The operation is that long to highlight that you are doing something expensive here.
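
For what it's worth, the standard library's Iterator::nth says the same thing as skip(...).next() a bit more compactly. A minimal sketch, assuming a zero-based index that counts chars rather than bytes (nth_char is a made-up helper name, not something from std):

```rust
/// Returns the `index`-th `char` (zero-based) of `s`, or `None` if `s` is too short.
/// `nth_char` is a hypothetical helper, not a standard library function.
fn nth_char(s: &str, index: usize) -> Option<char> {
    // Still O(index): the iterator has to decode every preceding character.
    s.chars().nth(index)
}

fn main() {
    let a = "Some cool stuff";
    println!("{:?}", nth_char(a, 4));   // Some(' ')
    println!("{:?}", nth_char(a, 100)); // None
}
```

Returning an Option avoids the unwrap() panic when the index is past the end of the string.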

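What Python 3 really does is a bit different: CPython stores each string with a fixed width per character (one, two, or four bytes, depending on the widest character in the string), which for some non-ASCII strings is roughly like converting them to a Vec<char>. This makes indexing a much cheaper operation, but it can take up to four bytes per character.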

If you want cheap by-character indexing into a non-ASCII string, you can do the same and convert it into a Vec<char>. It can be indexed like any other vector.
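
A rough sketch of that conversion, in case it helps (to_chars is a hypothetical helper name; the collect itself is a single pass over the string):

```rust
/// Collects the `char`s of `s` so they can be indexed in constant time.
/// `to_chars` is a hypothetical helper name used only for this sketch.
fn to_chars(s: &str) -> Vec<char> {
    s.chars().collect()
}

fn main() {
    let chars = to_chars("aæb");
    // One up-front O(n) pass; each later lookup is O(1).
    println!("{}", chars[1]); // æ
    // Every element occupies four bytes, whatever the character.
    println!("{}", std::mem::size_of::<char>()); // 4
}
```

This only pays off if you index the same string many times; for a single lookup the iterator chain does less work.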

Ok, thanks.
I just thought of something: what about

let a = &a[index..index+1];

Is there a difference from the iterator chain? If so, at what cost?

That is much cheaper than the iterator chain. The chain will have to look through the entire string up to index, whereas indexing takes the same amount of time no matter how large the index is.

The problem is that non-ASCII characters such as æ take up multiple bytes. When you write &a[x..y] in Rust, x and y are byte offsets, not character counts. That is what makes the indexing cheap. For example:

fn main() {
    let a = "aæb";
    println!("{}", &a[0..1]);
    println!("{}", &a[1..3]);
    println!("{}", &a[3..4]);
}
a
æ
b

If you tried &a[1..2], it would panic, because byte offset 2 is in the middle of æ and so is not on a character boundary. If you don't know how many bytes the character takes, you could do this:

fn main() {
    let a = "aæb";
    println!("{}", a[0..].chars().next().unwrap());
    println!("{}", a[1..].chars().next().unwrap());
    println!("{}", a[3..].chars().next().unwrap());
}
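
If you would rather get a None than a panic on a bad boundary, str::get accepts the same byte ranges, and str::is_char_boundary lets you check an offset up front. A small sketch (byte_slice_at is a made-up helper name):

```rust
/// Returns the one-byte-wide slice starting at byte offset `i`, or `None`
/// when the range is out of bounds or cuts through a multi-byte character.
/// `byte_slice_at` is a hypothetical helper name.
fn byte_slice_at(s: &str, i: usize) -> Option<&str> {
    s.get(i..i + 1)
}

fn main() {
    let a = "aæb";
    println!("{:?}", byte_slice_at(a, 0)); // Some("a")
    println!("{:?}", byte_slice_at(a, 1)); // None: byte 2 is inside 'æ'
    println!("{:?}", a.get(1..3));         // Some("æ")
    println!("{}", a.is_char_boundary(2)); // false
}
```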

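If this is too verbose, you can define a helper method through an extension trait (note that i is still a byte offset, and it must fall on a character boundary):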

trait StrExt {
    fn char_at(&self, i: usize) -> char;
}
impl StrExt for str {
    fn char_at(&self, i: usize) -> char {
        self[i..].chars().next().unwrap()
    }
}


fn main() {
    let a = "aæb";
    println!("{}", a.char_at(0));
    println!("{}", a.char_at(1));
    println!("{}", a.char_at(3));
}
a
æ
b
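
Keep in mind that the char_at helper above takes a byte offset, not a character count. If what you have is a character count, char_indices can translate it into a byte offset first. A rough sketch, with byte_offset_of as a hypothetical name:

```rust
/// Returns the byte offset at which the `n`-th `char` (zero-based) of `s` starts.
/// `byte_offset_of` is a hypothetical helper name.
fn byte_offset_of(s: &str, n: usize) -> Option<usize> {
    // `char_indices` yields (byte offset, char) pairs; this walk is O(n).
    s.char_indices().nth(n).map(|(offset, _)| offset)
}

fn main() {
    let a = "aæb";
    println!("{:?}", byte_offset_of(a, 2)); // Some(3)
    if let Some(offset) = byte_offset_of(a, 2) {
        println!("{}", &a[offset..]); // b
    }
}
```

The lookup is still linear in n, but once you have the offset, slicing from it is constant time.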

Ok thanks a lot !

Note that the .chars() method on string slices may not be what you want.

From the docs:

It's important to remember that char represents a Unicode Scalar Value, and may not match your idea of what a 'character' is. Iteration over grapheme clusters may be what you actually want. This functionality is not provided by Rust's standard library, check crates.io instead.

In particular, if what you want is grapheme clusters (what a reader perceives as one character) rather than chars, check out the unicode-segmentation crate.
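
To make the difference concrete without pulling in a crate, here is a small std-only sketch: the same visible é can be one char or two, depending on whether it is precomposed or built from a combining accent (scalar_count is a made-up helper name):

```rust
/// Counts Unicode scalar values (`char`s) in `s`, which is what `.chars()` iterates over.
/// `scalar_count` is a hypothetical helper name.
fn scalar_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    let precomposed = "\u{e9}";  // é as a single scalar value
    let decomposed = "e\u{301}"; // e followed by a combining acute accent
    println!("{}", scalar_count(precomposed)); // 1
    println!("{}", scalar_count(decomposed));  // 2
    println!("{}", decomposed.len());          // 3 (bytes)
}
```

So .chars().nth(0) on the decomposed form gives a bare e, which is where grapheme-aware iteration comes in.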

(On a side note: for something so commonly used, the unicode-segmentation crate has a name that is not nearly as easy to remember as it should be. I need to look it up every time.)


See also:

Spolsky's latter article contains a specific error, whereby he conflates UTF-16 with the similar but distinct and deprecated UCS-2. However, it's still a good read.

