[Solved] Multibyte strings in Rust

I have a string in which one portion is enclosed in doubled double quotes. It looks like this:
"start of the string, ""middle of the string"", end of the string". I am looking for the indexes of the double quotes so that I can extract "middle of the string". Some strings mix English with multi-byte characters such as Chinese, and that is where it breaks. As an example, I translated a piece of text from the online Rust book into Chinese.

fn main() {
    //let line = "We talked about strings in Chapter 4, \"\"but we’ll look at them in more depth now. New Rustaceans commonly get stuck on strings for a combination of three reasons: Rust’s propensity for exposing possible errors, strings being a more complicated data structure than many programmers give them credit for, and UTF-8. \"\"These factors combine in a way that can seem difficult when you’re coming from other programming languages.";
    let line = "我們在第4章討論了字符串,\"\"但現在我們將更深入地研究它們。 新的Rustaceans通常會因為三個原因而陷入字符串:Rust暴露可能的錯誤的傾向,字符串是比許多程序員更加複雜的數據結構,以及UTF-8。\"\" 當你來自其他編程語言時,這些因素結合起來似乎很難。";
    match line.find("\"\"") {
        Some(i1) => {
            let (s1, s2) = line.split_at(i1 + 2);
            let i2 = s2.find("\"\"").unwrap();
            println!("{} : {}", i1 + 2, i2 + i1);
            let body = &line[i1 + 2..i2 + i1];
            println!("{}", body);
        }
        None => (),
    }
}

The above code results in the following runtime error:

thread 'main' panicked at 'byte index 258 is not a char boundary; it is inside '。' (bytes 257..260) of `我們在第4章討論了字符串,""但現在我們將更深入地研究它們。 新的Rustaceans通常會因為三個原因而陷入字符串:Rust暴露可能的錯誤的傾向,字符串是比許多程序員更加複雜的數據結構,以及UTF-`[...]', src/libcore/str/mod.rs:2027:5
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

The English-only version of the text works. I tried both the stable and nightly versions of Rust.
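For context on the panic message, here is a minimal sketch of what "not a char boundary" means. String slices in Rust are indexed by byte offset, and a character such as '。' occupies three bytes in UTF-8, so an offset that falls inside it cannot start or end a slice. The helper name `try_slice` is made up for this illustration; `str::is_char_boundary` and `str::get` are standard library methods.

```rust
/// Hypothetical helper: slices by byte range, returning None instead of
/// panicking when either end is not on a char boundary or is out of range.
fn try_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

fn main() {
    // 'a' is 1 byte, '。' is 3 bytes, 'b' is 1 byte: 5 bytes in total.
    let s = "a。b";
    println!("{}", s.len()); // 5

    println!("{}", s.is_char_boundary(1)); // true: start of '。'
    println!("{}", s.is_char_boundary(2)); // false: inside '。'

    // `&s[0..2]` would panic here; `get` reports the problem as None.
    println!("{:?}", try_slice(s, 0, 2)); // None
    println!("{:?}", try_slice(s, 0, 4)); // Some("a。")
}
```

An offset that is merely wrong by a byte or two can still land on a boundary in ASCII-only text, which would explain a miscalculated index going unnoticed there.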

The upper bound `i2 + i1` looks wrong to me. Doesn't it need to be `i2 + i1 + 2` to account for the two quotes that you've chopped off, just as with the lower bound?
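A minimal sketch of the corrected indexing, assuming the goal is the text between the first two occurrences of the `""` delimiter. The function name `extract_between` is made up for this example, and the sample string is a shortened version of the one in the question. Since `i2` is measured from the start of the second half, the end offset in the full string is `start + i2`, which is the same as `i1 + 2 + i2`.

```rust
/// Hypothetical helper: returns the text between the first two occurrences
/// of `delim`, or None if fewer than two occurrences are found.
fn extract_between<'a>(line: &'a str, delim: &str) -> Option<&'a str> {
    // Byte offset just past the opening delimiter.
    let start = line.find(delim)? + delim.len();
    // Offset of the closing delimiter, relative to `start`.
    let len = line[start..].find(delim)?;
    // Both ends come from `find` results plus whole-delimiter lengths,
    // so they always fall on char boundaries.
    Some(&line[start..start + len])
}

fn main() {
    let line = "我們在第4章討論了字符串,\"\"但現在我們將更深入地研究它們。\"\" 當你來自其他編程語言時。";
    match extract_between(line, "\"\"") {
        Some(body) => println!("{}", body),
        None => println!("no quoted section found"),
    }
}
```

With the original upper bound, the slice ends two bytes early. In the English text those two bytes are the ASCII ". " before the closing quotes, so the slice is silently truncated; in the Chinese text they fall inside the three-byte '。', which triggers the panic.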

Yep. That was the problem. Thanks!
