I have tried split_at(&self, mid: usize) -> (&str, &str), but it splits the string into ("12", "345678"), and then I get stuck on how to split the second element of the tuple recursively. Could you give me a hint?
Maybe there's a way to use slice::chunks here, but otherwise you can do something similar manually, e.g.:
let mut v = vec![];
let mut cur = string;
while !cur.is_empty() {
    // Take at most sub_len bytes; split_at panics if this is not a char boundary.
    let (chunk, rest) = cur.split_at(std::cmp::min(sub_len, cur.len()));
    v.push(chunk);
    cur = rest;
}
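For reference, the loop above can be wrapped in a function. This is only a sketch: the name `split_every` is made up for illustration, and it assumes single-byte (ASCII) input, since `split_at` panics when the split point lands inside a multi-byte character. The `sub_len > 0` check is an addition, because a zero length would otherwise loop forever.

```rust
use std::cmp;

/// Splits `s` into pieces of at most `sub_len` bytes, borrowing from `s`.
/// Hypothetical helper; assumes ASCII input (see the note on `split_at`).
fn split_every(s: &str, sub_len: usize) -> Vec<&str> {
    // Guard added for this sketch: split_at(0) would never consume input.
    assert!(sub_len > 0, "sub_len must be non-zero");
    let mut v = Vec::new();
    let mut cur = s;
    while !cur.is_empty() {
        // The last piece may be shorter than sub_len.
        let (chunk, rest) = cur.split_at(cmp::min(sub_len, cur.len()));
        v.push(chunk);
        cur = rest;
    }
    v
}
```

Calling `split_every("12345678", 2)` should give the four slices "12", "34", "56", "78", all pointing into the original string.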
Yeah, I thought about that too, but left it out to keep the example uncluttered.
[quote="juggle-tux, post:3, topic:10542"]
Also, I made a version which uses chunks
[/quote]
Which is why chunks on its own might not be a good idea for general strings.
Either the OP somehow knows which kind of characters he is working with, and can use chunks/from_utf8_unchecked, or he doesn't and must use the string.chars() iterator (probably with a loop and Iterator::by_ref).
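For the case where the characters are not known in advance, here is one possible sketch that still returns slices. Note that it swaps in `char_indices` rather than `chars()` with `by_ref`, so that each chunk boundary is a valid byte offset. The function name `chunk_chars` is hypothetical.

```rust
/// Splits `s` into slices of at most `sub_len` chars each.
/// Hypothetical helper; borrows from `s` instead of allocating Strings.
fn chunk_chars(s: &str, sub_len: usize) -> Vec<&str> {
    assert!(sub_len > 0, "sub_len must be non-zero");
    let mut chunks = Vec::new();
    let mut start = 0;
    // char_indices yields the byte offset where each char begins,
    // so every offset seen here is a safe place to cut.
    for (i, (byte_idx, _)) in s.char_indices().enumerate() {
        if i > 0 && i % sub_len == 0 {
            chunks.push(&s[start..byte_idx]);
            start = byte_idx;
        }
    }
    // Push whatever is left after the last full chunk.
    if start < s.len() {
        chunks.push(&s[start..]);
    }
    chunks
}
```

With this, `chunk_chars("héllo", 2)` should yield "hé", "ll", "o" even though 'é' takes two bytes. It counts code points, not visual characters.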
Here's one more version using the itertools crate (playpen):
Note: this method needs to allocate new Strings for each sub_string, whereas the other examples here return slices of the source.
I believe you are not using itertools here.
You also do not need into_iter.
To recap, with a more functional safe version (not necessarily better):
fn main() {
    let string = "12345678";
    let sub_len = 2;

    // Case 1: you don't know the data you're playing with
    //
    // Characters may be encoded as a single byte or as multiple bytes (per the definition of UTF-8)
    // Thus you cannot just chunk the bytes and MUST rely on the `chars()` iterator
    //
    // It also means you cannot return fixed-size slices; here each chunk is an owned String
    let mut chars = string.chars();
    let sub_string = (0..)
        .map(|_| chars.by_ref().take(sub_len).collect::<String>())
        .take_while(|s| !s.is_empty())
        .collect::<Vec<_>>();
    println!("Safe: {:?}", sub_string);

    // Case 2: you work with some 'simple' data where you know in advance that
    // all characters will be single-byte encoded.
    //
    // In particular, this is true for all US-ASCII characters
    // see https://en.wikipedia.org/wiki/UTF-8
    //
    // Then, and only then, you can be wild and unsafe and crazy fast
    let sub_string = string.as_bytes()
        .chunks(sub_len)
        // unsafe is sound only if the input really is single-byte encoded;
        // otherwise a chunk may not be valid UTF-8
        .map(|s| unsafe { ::std::str::from_utf8_unchecked(s) })
        .collect::<Vec<_>>();
    println!("Unsafe: {:?}", sub_string);
}
I am indeed using itertools, for its chunks iterator adaptor (which requires another into_iter to iterate over). You can also use std::slice::chunks, which is what you're using in string.as_bytes().chunks(n).
It won't be as fast as operating on raw bytes or returning &strs, but it's another option that reads easily and works as expected (chunking chars).
Yet another version using std::slice::chunks instead of itertools' chunks adapter (this has to pull the chars into a temp Vec):
// Collect the chars first, because std::slice::chunks needs a slice to work on.
let chars: Vec<char> = s.chars().collect();
let split = chars.chunks(2)
    .map(|chunk| chunk.iter().collect::<String>())
    .collect::<Vec<_>>();
println!("{:?}", split);
By which I expect you mean the confusion over what's even a "character" in Unicode -- char being a single code point vs. a visual grapheme that may consist of many chars. So even proper char chunks may end up splitting a combining character from the one it's modifying.
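A small sketch of that problem, using only std. The helper name `chunk_by_char` is made up here. The input "e\u{301}" is an 'e' followed by a combining acute accent, which renders as a single "é" but is two chars.

```rust
/// Chunks `s` by `char` (code point), allocating a String per chunk.
/// Hypothetical helper for illustration only.
fn chunk_by_char(s: &str, sub_len: usize) -> Vec<String> {
    assert!(sub_len > 0, "sub_len must be non-zero");
    let chars: Vec<char> = s.chars().collect();
    chars
        .chunks(sub_len)
        .map(|chunk| chunk.iter().collect())
        .collect()
}

/// Returns the chunks of "e" + combining acute accent + "a", one char each.
/// The accent ends up alone in its own chunk, detached from the 'e'.
fn split_combining_example() -> Vec<String> {
    chunk_by_char("e\u{301}a", 1)
}
```

Here `split_combining_example()` should return three strings, "e", "\u{301}" and "a", so the accent is separated from its base letter. As far as I know std has no grapheme iterator, so keeping them together would need an external crate such as unicode-segmentation.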