Is there a library on crates.io that performs common operations on strings in terms of Unicode characters instead of bytes? For example, I'm looking for functions like trim, find, split, match (a regex), and so on that don't just treat a string as a sequence of bytes. I know I can call the `chars()` method on a `String` and write those functions myself, but it seems like someone has probably already done that.
The built-in `str` type and the standard library `String` type are Unicode strings, stored in UTF-8. All of the standard string methods like `find` and `trim` work on arbitrary Unicode text. For example, `trim` removes all characters with the Unicode `White_Space` property, not just ASCII whitespace bytes. And `find` can match arbitrary Unicode characters or substrings.
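Here is a small std-only sketch of that behavior. The sample strings and the helper name `trim_and_find` are made up for illustration. One thing worth knowing: `find` matches whole characters, but the position it returns is a byte offset. That offset always lies on a char boundary, so it is safe to slice with.

```rust
/// Trims leading/trailing Unicode whitespace from `text`, then returns the
/// trimmed slice and the byte offset of `needle` within it (None if absent).
fn trim_and_find(text: &str, needle: char) -> (&str, Option<usize>) {
    let trimmed = text.trim();
    (trimmed, trimmed.find(needle))
}

fn main() {
    // U+3000 IDEOGRAPHIC SPACE and U+00A0 NO-BREAK SPACE have the Unicode
    // White_Space property, so trim() strips them like ASCII spaces.
    let padded = "\u{3000}\u{00A0} héllo wörld \u{3000}";
    let (trimmed, idx) = trim_and_find(padded, 'w');
    println!("{:?}", trimmed); // "héllo wörld"

    // The offset counts bytes: "héllo " is 7 bytes because 'é' takes 2.
    if let Some(i) = idx {
        // Slicing at a position returned by find() cannot split a char.
        println!("'w' at byte {}: {:?}", i, &trimmed[i..]);
    }
}
```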
Another crate you might find useful is `unicode-segmentation`, if you need to break strings on grapheme or word boundaries as specified by UAX #29 (Unicode Text Segmentation).
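To see why grapheme segmentation can still matter even though `chars()` is Unicode-aware, here is a std-only sketch (the helper names are hypothetical). A `char` is a single Unicode scalar value, so an accented letter written with a combining mark counts as two of them. As far as I know, iterating with `graphemes(true)` from `unicode-segmentation`'s `UnicodeSegmentation` trait keeps such a sequence together; that part is left out here so the example runs without extra dependencies.

```rust
/// Returns (number of UTF-8 bytes, number of chars) in `s`.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Reverses `s` char by char. This is Unicode-aware at the scalar-value
/// level, but it does not keep combining marks attached to their base.
fn reverse_chars(s: &str) -> String {
    s.chars().rev().collect()
}

fn main() {
    // "é" spelled as 'e' + U+0301 COMBINING ACUTE ACCENT: one grapheme,
    // but two chars and three bytes.
    let decomposed = "e\u{301}";
    let (bytes, chars) = byte_and_char_counts(decomposed);
    println!("{} bytes, {} chars", bytes, chars); // 3 bytes, 2 chars

    // Reversing by char moves the accent in front of 'e' instead of after
    // it, so it no longer combines with the letter it belonged to.
    println!("{:?}", reverse_chars("ae\u{301}"));
}
```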
How would you implement this JavaScript code in Rust?

```javascript
const s = "January|February|March";
const months = s.split('|'); // ["January", "February", "March"]
```
Like this (Rust Playground):
```rust
fn main() {
    let s = "January|February|March";
    let months: Vec<&str> = s.split('|').collect();
    println!("{:?}", months);
}
```
Basically, `split` returns an iterator that yields `&str` slices borrowed from the original string, and `collect()` gathers them into the `Vec<&str>`.
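One way to convince yourself that the pieces are borrowed rather than copied is to compare addresses. This is a minimal sketch; `piece_offsets` is a hypothetical helper that recovers each piece's byte offset in the source string from its pointer.

```rust
/// Splits `s` on '|' and pairs each piece with its byte offset in `s`.
/// The offset is computed from the piece's address, which only makes
/// sense because split() yields slices that point into `s` itself.
fn piece_offsets(s: &str) -> Vec<(&str, usize)> {
    let base = s.as_ptr() as usize;
    s.split('|')
        .map(|piece| (piece, piece.as_ptr() as usize - base))
        .collect()
}

fn main() {
    let s = "January|February|March";
    for (piece, offset) in piece_offsets(s) {
        // Each piece is exactly the range of `s` starting at its offset.
        assert_eq!(&s[offset..offset + piece.len()], piece);
        println!("{:?} starts at byte {}", piece, offset);
    }
}
```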
Edit: if you want `months` to be a `Vec<String>` instead, use this:
```rust
let months: Vec<String> = s.split('|').map(str::to_owned).collect();
```
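The owned version matters when the pieces need to outlive the string they came from, for example when a function takes ownership of its input. A sketch with a hypothetical `split_owned` helper:

```rust
/// Splits an owned input on '|' and returns owned pieces. Returning
/// `Vec<&str>` here would not compile: the slices would borrow from
/// `input`, which is dropped when the function returns.
fn split_owned(input: String) -> Vec<String> {
    input.split('|').map(str::to_owned).collect()
}

fn main() {
    // `line` stands in for a String built at runtime (e.g. read from a file).
    let line = String::from("January|February|March");
    let months = split_owned(line);
    // `line` was moved into split_owned and dropped there;
    // `months` owns its own copies of the pieces.
    println!("{:?}", months);
}
```

`.map(String::from)` works just as well in place of `str::to_owned`.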