I have a string in which a portion is enclosed in doubled double quotes. It looks like this:
"start of the string, ""middle of the string"", end of the string". I am looking for the indices of the doubled quotes so that I can extract "middle of the string". Some strings mix English with multi-byte characters such as Chinese, and that is where it breaks. As an example, I translated a piece of text from the online Rust book into Chinese.
fn main() {
    //let line = "We talked about strings in Chapter 4, \"\"but we’ll look at them in more depth now. New Rustaceans commonly get stuck on strings for a combination of three reasons: Rust’s propensity for exposing possible errors, strings being a more complicated data structure than many programmers give them credit for, and UTF-8. \"\"These factors combine in a way that can seem difficult when you’re coming from other programming languages.";
    let line = "我們在第4章討論了字符串,\"\"但現在我們將更深入地研究它們。 新的Rustaceans通常會因為三個原因而陷入字符串:Rust暴露可能的錯誤的傾向,字符串是比許多程序員更加複雜的數據結構,以及UTF-8。\"\" 當你來自其他編程語言時,這些因素結合起來似乎很難。";
    match line.find("\"\"") {
        Some(i1) => {
            let (s1, s2) = line.split_at(i1 + 2);
            let i2 = s2.find("\"\"").unwrap();
            println!("{} : {}", i1 + 2, i2 + i1);
            let body = &line[i1 + 2..i2 + i1];
            println!("{}", body);
        },
        None => ()
    }
}
The above code results in the following runtime error:
thread 'main' panicked at 'byte index 258 is not a char boundary; it is inside '。' (bytes 257..260) of `我們在第4章討論了字符串,""但現在我們將更深入地研究它們。 新的Rustaceans通常會因為三個原因而陷入字符串:Rust暴露可能的錯誤的傾向,字符串是比許多程序員更加複雜的數據結構,以及UTF-`[...]', src/libcore/str/mod.rs:2027:5
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
The English-only version of the text works. I tried both the stable and nightly versions of Rust.
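One thing worth checking in the code above is the end index. The second find runs on s2, so i2 is a byte offset relative to s2, and s2 starts at byte i1 + 2 of line. The closing delimiter therefore sits at i1 + 2 + i2 in line, not at i2 + i1. With ASCII-only text every byte is a char boundary, so the slice does not panic; it just ends two bytes early, which is easy to miss. With multi-byte text those two bytes can land inside a character, which would explain the panic. Below is a minimal sketch of the corrected arithmetic. The helper name extract_between and the shortened sample string are my own, chosen for illustration only.

```rust
/// Returns the text between the first two occurrences of `delim`, if both exist.
/// `extract_between` is a hypothetical helper name, not part of the original code.
fn extract_between<'a>(line: &'a str, delim: &str) -> Option<&'a str> {
    // Byte index just past the opening delimiter.
    let start = line.find(delim)? + delim.len();
    // `find` on the tail returns an offset relative to the tail,
    // so `start` must be added back to get an index into `line`.
    let end = start + line[start..].find(delim)?;
    Some(&line[start..end])
}

fn main() {
    // Shortened sample mixing ASCII and multi-byte characters.
    let line = "我們在第4章,\"\"但現在。\"\" 當你";
    match extract_between(line, "\"\"") {
        Some(body) => println!("{}", body),
        None => println!("no quoted section found"),
    }
}
```

Both start and end come from find plus the delimiter's byte length, so they always fall on char boundaries regardless of the language of the surrounding text.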