Hello, I would like to find an ergonomic and efficient way to process an iterator of events with look-ahead. My first thought when it comes to solving this type of a problem would be something like iterating through the iterator and yielding the next value when it is ready, but yield (and generators) don't seem to be available in Rust as of yet.
Here is some pseudo-code in JavaScript solving similar problem:
const items = [ 1, 2, 3, 'start', 10, 12, 13, 'end', 12 ];
function* merge(items) {
let sum = 0;
let inside = false;
for (const item of items) {
if (item === 'start') {
sum = 0;
inside = true;
yield 'start';
} else if (item === 'end') {
inside = false;
yield sum;
yield 'end';
} else if (inside) {
sum += item;
} else {
yield item;
}
}
}
This code essentially has infinite look-ahead for the purpose of summing numbers between tag 'start' and 'end'. In Rust these could be an enum, but the idea would be the same.
How could this type of a problem be solved in Rust? Ideally the function would consume an Iterator and return another Iterator while preserving laziness to the extent that the look-ahead permits.
An extremely rote translation of your program would be something like
#[derive(Debug)]
enum Item {
Value(i32),
Start,
End,
}
use Item::*;
fn main() {
let items = vec![ Value(1), Value(2), Value(3), Start, Value(10), Value(12), Value(13), End, Value(12) ];
println!("{:?}", merge(&items));
}
fn merge(items: &[Item]) -> Vec<Item> {
let mut sum = 0;
let mut inside = false;
let mut result = Vec::new();
for item in items {
match item {
Start => {
sum = 0;
inside = true;
result.push(Start);
},
End => {
inside = false;
result.push(Value(sum));
result.push(End);
},
Value(item) => {
if inside {
sum += item;
} else {
result.push(Value(*item));
}
}
}
}
result
}
Rust doesn't allow heterogenous arrays, so we need a single type that contains all the values you may want in your sequence - hence Item, which consists of any i32, as well as the special values Start and End. Rust also doesn't (yet) have general-purpose generators, so this eagerly computes the result instead of doing so as needed.
The core algorithm here may be expressible in terms of scan or other iterator transforms, but it's probably clearer if expressed in this way. If lazy evaluation is critical to you, then you may need to write your own iterator that uses an internal state machine to decide what to return next.
Minor nitpick, but I don't think this is "lookahead" in the strict sense. Lookahead implies that you want to rollback to reprocess items after peeking ahead in the stream. (See for instance RegEx lookahead.)
Based on the sample code, I came up with an alternative implementation that does internal iteration when the Start/End sentinels are encountered: Rust Playground (Not an exact transliteration. It does not yield the sentinels themselves, just integers.)
Minor nitpick, but I don't think this is "lookahead" in the strict sense. Lookahead implies that you want to rollback to reprocess items after peeking ahead in the stream. (See for instance RegEx lookahead.)
That's a fair point, because the example always does a single thing. I didn't think backtracking is important in an illustrative example, but the general idea I'm thinking about is accumulating intermediary values and deciding what to do with them inside the iterator without having to consume the entire iterator all at once.
Thank you for the example!
EDIT: For slightly more context, I'm trying to work with the output of a pull parser, and I would like to process the stream of events further.