How to print out part of a string literal

Joe232 · January 20, 2019, 11:40pm

fn main()
{
    let x = "Testing";

    println!("{}", x[1]); // Error here
}

Why can't I print out the index of a String literal? In Python this is possible but why not Rust? How can I print part of a String literal?

OptimisticPeach · January 20, 2019, 11:43pm

Strings can't be indexed with a single integer because they are stored as UTF-8, where individual characters take up different numbers of bytes in memory. Finding the nth character therefore means walking the string from the start rather than jumping to a fixed offset.
Instead you can iterate over a string's chars like so:

let x = "Testing";
for ch in x.chars() {
    // `ch` is a `char` (one Unicode scalar value); use it here
}

and to do what you wanted to do originally, you can do this:

let x = "Testing";
for (index, ch) in x.chars().enumerate() {
    if index == 1 {
        println!("{}", ch);
        break;
    }
}

Additionally you can collect them into a Vec<char> like so:

let x = "Testing";
let v = x.chars().collect::<Vec<char>>();
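As a rough sketch of what the collected Vec<char> buys you: once the chars are in a vector, plain integer indexing works again, at the cost of one pass over the string and four bytes per char. The helper name `to_chars` is made up for this example.

```rust
/// Hypothetical helper: copies the chars of `s` into a vector.
fn to_chars(s: &str) -> Vec<char> {
    s.chars().collect()
}

fn main() {
    let v = to_chars("Testing");

    // Indexing a Vec<char> is a constant-time lookup.
    println!("{}", v[1]); // prints "e"

    // `get` returns None instead of panicking when the index is out of range.
    println!("{:?}", v.get(10)); // prints "None"
}
```

This is mostly worth it when you need many lookups on the same string; for a single lookup, iterating once is cheaper.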

Joe232 · January 20, 2019, 11:49pm

So what characters can take up more memory compared to other characters? So in Python why can the characters be indexed? Is it cause it is not using UTF-8?

Thanks for providing the solution

OptimisticPeach · January 20, 2019, 11:50pm

Well, as explained here, a regular string in Python is ASCII, where every character is exactly one byte long, while UTF-8 (an encoding of Unicode) can contain things like emojis or non-Latin characters, which take more than one byte each.
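To make the "different lengths" point concrete, here is a small sketch that prints how many bytes a few characters occupy in UTF-8. The helper name `utf8_widths` and the sample string are only for illustration.

```rust
/// Hypothetical helper: the UTF-8 byte length of each char in `s`.
fn utf8_widths(s: &str) -> Vec<usize> {
    s.chars().map(|c| c.len_utf8()).collect()
}

fn main() {
    // 'a', 'é' (U+00E9), '日' (U+65E5) and '🦀' (U+1F980), written as escapes.
    let s = "a\u{e9}\u{65e5}\u{1f980}";

    for c in s.chars() {
        println!("{} takes {} byte(s)", c, c.len_utf8());
    }

    println!("{:?}", utf8_widths(s)); // prints "[1, 2, 3, 4]"

    // `len` counts bytes, `chars().count()` counts Unicode scalar values.
    println!("{} chars, {} bytes", s.chars().count(), s.len()); // prints "4 chars, 10 bytes"
}
```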

Joe232 · January 20, 2019, 11:54pm

Oh I see, thanks for that.

ExpHP · January 20, 2019, 11:58pm

That is only true for Python 2's str type (known as bytes in Python 3). Python 3's str type (known as unicode in Python 2) also allows indexing of individual characters.

Resources on Python claim that python stores "each code point" separately. I am, however, having a difficult time telling whether that means it is encoded in UTF-32 (or similar), or if, more likely, those resources are incorrect and it is encoded in WTF-16.

In either case, I imagine that python simply returns a substring from the ith element to the i+1th element, whether that substring is well-formed or not.

dcarosone · January 21, 2019, 12:09am

This is covered clearly and directly in the book: Storing UTF-8 Encoded Text with Strings - The Rust Programming Language

DanielKeep · January 21, 2019, 12:11am

If I recall correctly, Python changes the layout of strings on the fly depending on how many bytes each code point will fit into. So if a string only contains Latin-1, it uses one byte units. If a string contains Japanese text, it's probably two byte units. If it contains more exotic text, probably four byte units.

But keep in mind that codepoints are not characters. Characters can be comprised of an arbitrary number of codepoints. In addition, whether a sequence of codepoints counts as a single symbol can depend on your operating system and what font is being used.
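A quick sketch of the codepoint-versus-character distinction, using only the standard library. The helper name `counts` is made up here. The standard library has no notion of a user-perceived character, so this only shows that the char count can exceed what you see on screen.

```rust
/// Hypothetical helper: (number of chars, number of bytes) in `s`.
fn counts(s: &str) -> (usize, usize) {
    (s.chars().count(), s.len())
}

fn main() {
    // "é" as one precomposed code point (U+00E9).
    println!("{:?}", counts("\u{e9}")); // prints "(1, 2)"

    // "é" as 'e' followed by a combining acute accent (U+0301).
    // It renders as one symbol but is two chars.
    println!("{:?}", counts("e\u{301}")); // prints "(2, 3)"

    // A flag emoji is a pair of regional indicator code points.
    println!("{:?}", counts("\u{1F1E9}\u{1F1EA}")); // prints "(2, 8)"
}
```

If you need to split text into what a reader would call characters, that is grapheme cluster segmentation, which lives in third-party crates rather than in std.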

ExpHP · January 21, 2019, 12:36am

and on the geopolitical state of the world:

Edit: Sadly it seems chrome on windows always treats pairs of these codepoints as one character even if invalid (e.g. 🇪🇧 <-- try selecting one of the two characters in that). Which is less exciting.

RustyYato · January 21, 2019, 1:33am

Instead of this, you can do

let nth_char: Option<char> = x.chars().nth(index);

Iterators are amazing!
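The same `nth` trick can be stretched to print a part of the string rather than a single char, which is closer to the original question. This is only a sketch: `char_range` is a made-up helper that takes start and end positions counted in chars and converts them to byte offsets with `char_indices`.

```rust
/// Hypothetical helper: the substring covering chars `start..end`,
/// or None if the range is reversed or runs past the end of `s`.
fn char_range(s: &str, start: usize, end: usize) -> Option<&str> {
    if end < start {
        return None;
    }
    // Byte offset of every char boundary, followed by the end of the string.
    let mut bounds = s
        .char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(s.len()));

    let from = bounds.nth(start)?;
    // `nth` is relative to what the iterator has already consumed.
    let to = if end == start {
        from
    } else {
        bounds.nth(end - start - 1)?
    };
    s.get(from..to)
}

fn main() {
    let x = "Testing";
    println!("{:?}", x.chars().nth(1)); // prints "Some('e')"
    println!("{:?}", char_range(x, 1, 4)); // prints "Some("est")"
    println!("{:?}", char_range(x, 5, 20)); // prints "None"
}
```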

DanielKeep · January 21, 2019, 1:53am

Firefox treats it as two. So it depends on your OS, font and browser.

Also, don't forget that things like the non-standard cat ninja emoji on Windows 10 are things, so you can't even rely on the official, canonical definition of what symbols exist.

Because text wasn't hard enough already...

scottmcm · January 21, 2019, 3:26am

You can instead request a one-encoding-unit-long substring:

println!("{}", &x[1..2]);

Note, however, that the range is in bytes, and this will helpfully panic at runtime if either end of the range falls inside a single unicode-scalar-value, so if you want to do text "properly", you want one of the other things people have mentioned in this thread.
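If you would rather not panic, a sketch along these lines uses `str::get`, which returns None for a range that does not sit on char boundaries. The helper name `byte_slice` and the sample strings are only for illustration.

```rust
/// Hypothetical helper: the bytes `start..end` of `s` as a &str,
/// or None if the range is out of bounds or splits a char.
fn byte_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

fn main() {
    let x = "Testing";
    println!("{:?}", byte_slice(x, 1, 2)); // prints "Some("e")"

    // 'é' (U+00E9) occupies bytes 1 and 2 of this string.
    let y = "h\u{e9}llo";
    println!("{:?}", byte_slice(y, 1, 2)); // prints "None"
    println!("{}", y.is_char_boundary(2)); // prints "false"
    println!("{:?}", byte_slice(y, 1, 3).is_some()); // prints "true"
}
```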

Hyeonu · January 21, 2019, 3:31am

As you may understand now, direct string indexing is complex, even more complex than most people can imagine. This is why Rust decided not to support it.
