Hello there, Rust newbie question.
I'm developing a Tamil language (தமிழ்) tokenizer in Rust.
This is what I have working now, but it seems excessive to me. How could I improve it?
/// Split a Tamil Unicode string into
/// individual Tamil letters.
pub fn get_letters(x: &str) -> Vec<String> {
    // Splits `x` into a list of the Tamil/English letters present in the
    // string. A Tamil combining sign (anusvara, vowel sign, pulli, or the
    // au length mark) stays attached to the letter before it.
    // Note: `chars().enumerate()` yields char counts, not byte offsets, so
    // it cannot be paired with `is_char_boundary`; we test the char itself.
    let mut v: Vec<String> = Vec::new();
    let mut tmp = String::new();
    for c in x.chars() {
        let is_sign = matches!(c, '\u{0B82}' | '\u{0BBE}'..='\u{0BCD}' | '\u{0BD7}');
        // Any other char starts a new letter, so flush the pending one.
        if !is_sign && !tmp.is_empty() {
            v.push(std::mem::take(&mut tmp));
        }
        tmp.push(c);
    }
    if !tmp.is_empty() {
        v.push(tmp);
    }
    v
}
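
For comparison, here is a rough sketch of a leaner variant that borrows slices of the input instead of building a new String per letter. The names `split_letters` and `is_tamil_sign` are hypothetical, and the set of combining signs (U+0B82, U+0BBE..=U+0BCD, U+0BD7) is my assumption about what should attach to the preceding letter; it is not a full Unicode grapheme segmentation.

```rust
/// Hypothetical helper: true for Tamil signs that attach to the preceding
/// letter (anusvara U+0B82, vowel signs and pulli U+0BBE..=U+0BCD,
/// au length mark U+0BD7). Assumed set, not taken from a standard API.
fn is_tamil_sign(c: char) -> bool {
    matches!(c, '\u{0B82}' | '\u{0BBE}'..='\u{0BCD}' | '\u{0BD7}')
}

/// Sketch: split `x` into letters as borrowed slices of the input.
pub fn split_letters(x: &str) -> Vec<&str> {
    let mut letters = Vec::new();
    let mut start = 0; // byte offset where the current letter begins
    // `char_indices` yields byte offsets, so slicing at `idx` is always
    // on a char boundary.
    for (idx, c) in x.char_indices() {
        // A non-sign char closes the pending letter and starts a new one.
        if idx > start && !is_tamil_sign(c) {
            letters.push(&x[start..idx]);
            start = idx;
        }
    }
    // Flush the last pending letter, if any.
    if start < x.len() {
        letters.push(&x[start..]);
    }
    letters
}
```

Calling `split_letters("தமிழ்")` should give three slices: த, மி, ழ். If an extra dependency is acceptable, the grapheme iteration in the `unicode-segmentation` crate is probably the more robust route, since it follows the Unicode segmentation rules rather than a hand-picked range.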