Let’s see about changes without changing the logic…
There’s two things that could be changed in terms of ownership in this code, on one hand the String
s being returned could be &str
slices into the original string, as @SkiFire13 pointed out. The other thing, where iterators can play a role, is that the return type could also be a impl Iterator<Item = …>
type instead of a Vec
, allowing the user to choose whether to .collect()
into a Vec
or to just iterator over the result once and consume it directly.
A third thing that could be done is to use functionality such as .find(…)
from str
instead of looping over the values. The previous answer demonstrated .split(…)
, which is similar higher-level functionality. Also, this kind of code is probably straightforward to implement using Regex
es.
Let’s see how small changes could be made to the original code in order to do things described above.
E.g. in order to return &str
slices into the original string, you’d need to record the range of indices instead of collecting a buffer
, and then return a &msg[start…end]
view:
fn parser(msg: &str) -> Vec<&str> {
let mut v: Vec<&str> = vec![];
let mut start = None;
for (i, c) in msg.char_indices() {
if c == '[' {
if start.is_none() {
start = Some(i + 1);
}
continue;
}
if c == ']' {
v.push(start.map(|s| &msg[s..i]).unwrap_or_default());
start = None;
}
}
v
}
Here char_indices
is used instead of char
, so we know where the characters come from and can thus record our boundaries and create the string accordingly. The variables buffer
and brackets
are replaced by a start: Option<usize>
which is None
when brackets
was false
, and Some(i)
where i
is the index where the chars in buffer
started, when brackets
is true
.
This code preserves the behavior if ]
s is encountered when no [
was seen. It changes the behavior for multiple [
s though, as previously, the parser
would have returned a String
consisting of anything but the [
symbols themself, as in something like parser("this [message[has[many[brackets]")
returning ["messagehasmanybrackets"]
which is not a slice of the original string, so the above will return ["message[has[many[brackets"]
instead.
Of course, even more ways of handling mismatching or nested brackets could be discussed, be it by considering proper nesting, or reporting errors.
If you want to return an impl Iterator
, then std::iter::from_fn
is a useful tool to re-structure the code. Let me first suggest a minor change to the original code, where std::mem::take(&mut buffer)
seems nicer to me than the operation of cloning and clearing the buffer for the case
if c == ']' {
brackets = false;
v.push(buffer.clone());
buffer.clear();
}
becoming
if c == ']' {
brackets = false;
v.push(std::mem::take(&mut buffer));
}
Now here is how to return impl Iterator
instead (with minimal change):
fn parser(msg: &str) -> impl Iterator<Item = String> + '_ {
let mut buffer = String::new();
let mut brackets = false;
let mut chars = msg.chars();
std::iter::from_fn(move || {
for c in chars.by_ref() {
if c == '[' {
brackets = true;
continue;
}
if c == ']' {
brackets = false;
return Some(std::mem::take(&mut buffer));
}
if brackets {
buffer.push(c);
}
}
None
})
}
The use of for c in &mut chars
iterates the iterator char = msg.chars()
by mutable-reference access, which has the effect that the iteration can be aborted and later resumed as the for
loop didn’t consume the iterator yet. It’s aborted for every completed parsed string, via an early return of Some(mem::take(&mut buffer))
; one it does complete, out iterator returns None
indicating the end of the original Vec
.
Of course, both approaches could be combined:
fn parser(msg: &str) -> impl Iterator<Item = &str> {
let mut start = None;
let mut char_indices = msg.char_indices();
std::iter::from_fn(move || {
for (i, c) in char_indices.by_ref() {
if c == '[' {
if start.is_none() {
start = Some(i + 1);
}
continue;
}
if c == ']' {
let start = start.take(); // sets to `None`
return Some(start.map(|s| &msg[s..i]).unwrap_or_default());
}
}
None
})
}
By the way, if you want to display a thing Vec
-style without converting into a Vec
, the .format(…)
method from the itertools
crate is quite useful, e.g.
use itertools::Itertools;
fn main() {
let space_msg = "IF[14E]LG[5O]D[20O]HELLO[1Q]";
println!("{space_msg}");
let parsed_space_msg = parser(space_msg);
println!("[{:?}]", parsed_space_msg.format(", "));
}
Finally, as another demonstration of find
, though of course it significantly changes the way mismatching or nested brackets are treated, here’s a from_fn
-version of the code using find
that @DanielKeep provided above:
fn parser(mut msg: &str) -> impl Iterator<Item = &str> {
std::iter::from_fn(move || {
let start = msg.find('[')?;
let tail = &msg[start + 1..];
let end = tail.find(']').expect("oh no, unbalanced braces!");
msg = &tail[end + 1..];
Some(&tail[..end])
})
}