Is there a library on crates.io that performs common operations on strings in terms of Unicode characters instead of bytes? For example, I'm looking for functions like trim, find, split, match (a regex), and so on that don't just treat a string as a sequence of bytes. I know I can call the chars() method on a String and then write those functions myself, but it seems like someone has probably already done that.
The built-in str type and the standard library String type are Unicode strings, stored in UTF-8. All of the standard string methods like find and trim work on arbitrary Unicode text. For example, trim removes all characters with the Unicode White_Space property, not just ASCII whitespace bytes. And find can match arbitrary Unicode characters or substrings.
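To make that concrete, here is a minimal sketch using only the standard library. The helper names trim_unicode and find_char are made up for this example; they just wrap str::trim and str::find. One thing worth noticing is that find returns a byte offset into the UTF-8 string, not a character index.

```rust
/// Trims Unicode whitespace from both ends.
/// (Hypothetical helper for illustration; it just calls `str::trim`.)
fn trim_unicode(s: &str) -> &str {
    s.trim()
}

/// Returns the byte offset of the first occurrence of `needle`, if any.
/// (Hypothetical helper for illustration; it just calls `str::find`.)
fn find_char(s: &str, needle: char) -> Option<usize> {
    s.find(needle)
}

fn main() {
    // U+3000 IDEOGRAPHIC SPACE and U+00A0 NO-BREAK SPACE both have the
    // White_Space property, so `trim` strips them even though they are not ASCII.
    let padded = "\u{3000}h\u{e9}llo w\u{f6}rld\u{a0}";
    let trimmed = trim_unicode(padded);
    println!("{:?}", trimmed);

    // U+00F6 is the 8th character, but U+00E9 earlier in the string takes
    // two bytes in UTF-8, so the byte offset reported by `find` is 8.
    println!("{:?}", find_char(trimmed, '\u{f6}')); // Some(8)

    // Character count (11) versus byte length (13).
    println!("{} chars, {} bytes", trimmed.chars().count(), trimmed.len());
}
```

If you need a character index rather than a byte offset, something like trimmed.chars().position(|c| c == '\u{f6}') should work, at the cost of a linear scan.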
Another crate you might find useful is unicode-segmentation, if you need to break strings on grapheme or word boundaries as specified by UAX29.
How would you implement this JavaScript code in Rust?
const s = "January|February|March";
const months = s.split('|'); // ["January", "February", "March"]
fn main() {
    let s = "January|February|March";
    let months: Vec<&str> = s.split('|').collect();
    println!("{:?}", months); // ["January", "February", "March"]
}
Basically, split returns an iterator over string slices (&str) that borrow from the original string, so nothing is copied until you ask for it.
Edit: if you want months to be a Vec<String> instead, use this:
let months: Vec<String> = s.split('|').map(str::to_owned).collect();
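For what it's worth, here is a small sketch of the borrowed and owned variants side by side. The helper name split_owned is made up for this example. It also shows that the delimiter can be any Unicode character, not just an ASCII one, and that you can loop over the iterator directly without collecting into a Vec at all.

```rust
/// Splits `s` on `delim` and copies each piece into an owned String.
/// (Hypothetical helper for illustration, not part of the standard library.)
fn split_owned(s: &str, delim: char) -> Vec<String> {
    s.split(delim).map(str::to_owned).collect()
}

fn main() {
    let s = "January|February|March";

    // Iterate lazily: no Vec is allocated, and each `month` is a &str
    // pointing into `s`.
    for month in s.split('|') {
        println!("{}", month);
    }

    // The delimiter can be a non-ASCII character such as U+2192 (rightwards arrow).
    let owned = split_owned("un\u{2192}deux\u{2192}trois", '\u{2192}');
    println!("{:?}", owned); // ["un", "deux", "trois"]
}
```

The owned version is the one to reach for when the pieces need to outlive the original string; otherwise the borrowed slices avoid the extra allocations.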