String library respecting Unicode

mvolkmann · December 19, 2020, 1:53am

Is there a library in crates.io that performs common operations on strings that deal in Unicode characters instead of bytes? For example, I'm looking for functions like trim, find, split, match (a regex), and so on that don't just look at a string as a sequence of bytes. I know I can call the chars() method on a String and then could write those functions myself. But it seems like someone has probably already done that.

fosskers · December 19, 2020, 2:16am

There's regex, but it's a fairly heavy crate.

mbrubeck · December 19, 2020, 2:32am

The built-in str type and the standard library String type are Unicode strings, stored in UTF-8. All of the standard string methods like find and trim work on arbitrary Unicode text. For example, trim removes leading and trailing characters with the Unicode White_Space property, not just ASCII whitespace bytes. And find can match arbitrary Unicode characters or substrings (note that the index it returns is a byte offset into the string, not a character count).
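To make that concrete, here is a minimal sketch using only the standard library. The helper name char_index_of is made up for illustration; it just converts the byte offset returned by find into a count of chars, in case a character position is what you want.

```rust
/// Hypothetical helper (not a std API): position of `needle` in `haystack`,
/// counted in Unicode scalar values (chars) rather than bytes.
fn char_index_of(haystack: &str, needle: &str) -> Option<usize> {
    // `find` returns a byte offset that always lies on a char boundary,
    // so slicing up to it is safe.
    haystack
        .find(needle)
        .map(|byte_idx| haystack[..byte_idx].chars().count())
}

fn main() {
    // U+3000 (ideographic space), U+00A0 (no-break space) and U+2003 (em space)
    // all have the White_Space property, so `trim` strips them.
    let padded = "\u{3000}\u{00A0}hello\u{2003}";
    assert_eq!(padded.trim(), "hello");

    // "naïve café", written with escapes so the encoding is unambiguous.
    let text = "na\u{00EF}ve caf\u{00E9}";

    // `find` accepts a &str or a char pattern and returns a byte offset.
    assert_eq!(text.find("caf\u{00E9}"), Some(7));
    assert_eq!(text.find('\u{00E9}'), Some(10));

    // The same match, expressed as a char position.
    assert_eq!(char_index_of(text, "caf\u{00E9}"), Some(6));
}
```

The byte offset and the char position differ here because ï takes two bytes in UTF-8.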

Another crate you might find useful is unicode-segmentation, if you need to break strings on grapheme or word boundaries as specified by UAX29.
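A small std-only sketch of why grapheme boundaries can matter: a single user-perceived character may consist of several chars. The example string is just an illustration. With unicode-segmentation, the graphemes method of its UnicodeSegmentation trait is what would treat this sequence as one cluster; the code below deliberately avoids the crate so it runs without dependencies.

```rust
/// Hypothetical helper (not a std API): reverse a string char by char.
/// This is the naive approach that grapheme-aware code is meant to avoid.
fn reverse_chars(s: &str) -> String {
    s.chars().rev().collect()
}

fn main() {
    // "é" written as 'e' followed by U+0301 COMBINING ACUTE ACCENT.
    let decomposed = "e\u{0301}";

    // One user-perceived character, but three bytes and two chars.
    assert_eq!(decomposed.len(), 3);
    assert_eq!(decomposed.chars().count(), 2);

    // Reversing by char moves the accent in front of the 'e',
    // which splits the grapheme cluster apart.
    assert_eq!(reverse_chars(decomposed), "\u{0301}e");
}
```

If your operations only need to be correct per Unicode scalar value, chars() is enough; reversing, truncating, or counting what a user would call "characters" is where grapheme segmentation comes in.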

mvolkmann · December 19, 2020, 2:40am

How would you implement this JavaScript code in Rust?

const s = "January|February|March";
const months = s.split('|'); // ["January", "February", "March"]

mmmmib · December 19, 2020, 2:48am

Like this:

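fn main() {
    let s = "January|February|March";
    // split yields &str slices that borrow from s; collect gathers them into a Vec.
    let months: Vec<&str> = s.split('|').collect();
    println!("{:?}", months); // prints ["January", "February", "March"]
}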

Basically, split returns an iterator over string slices (&str) that borrow from the original string.

Edit: if you want months to be a Vec<String> instead, use this:

let months: Vec<String> = s.split('|').map(str::to_owned).collect();
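Putting the owned variant into a complete program, here is a rough sketch that also uses a non-ASCII delimiter to show that split works on characters rather than bytes. The function name split_owned and the sample strings are made up for illustration.

```rust
/// Hypothetical helper (not a std API): split `s` on `delimiter`
/// and return owned Strings instead of borrowed slices.
fn split_owned(s: &str, delimiter: char) -> Vec<String> {
    s.split(delimiter).map(str::to_owned).collect()
}

fn main() {
    // Same data as above, but owned.
    let months = split_owned("January|February|March", '|');
    assert_eq!(months, vec!["January", "February", "March"]);

    // U+2192 RIGHTWARDS ARROW is three bytes in UTF-8; split still treats it
    // as a single delimiter character.
    let steps = split_owned("one\u{2192}two\u{2192}three", '\u{2192}');
    assert_eq!(steps, vec!["one", "two", "three"]);

    // split also accepts a &str pattern or a closure over char.
    let words: Vec<&str> = "a, b, c".split(", ").collect();
    assert_eq!(words, vec!["a", "b", "c"]);
    let fields: Vec<&str> = "x1y2z".split(|c: char| c.is_numeric()).collect();
    assert_eq!(fields, vec!["x", "y", "z"]);

    println!("{:?}", months);
}
```

The borrowed Vec<&str> avoids allocating a new String per piece, so the owned version is mainly useful when the pieces need to outlive the original string.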

