The title does not really capture the essence of my question, unfortunately.
I'll try to give some context and how I arrived at the current problem.
I have a pretty simple API that takes some raw bytes, parses them, and returns some information. For simplicity, I'll use this as an example:
#[derive(Debug, Copy, Clone)]
struct ParsedData {
value: u32,
size: usize,
}
#[derive(Debug, Copy, Clone, PartialEq)]
enum ParseError {
InvalidSize,
InvalidByte,
}
impl ParsedData {
// If data.len() < 2, return Err(InvalidSize)
// If data[0] is 0, return Err(InvalidByte)
// If data[0] is even, return Ok(value: data[0] + data[1], size: 2)
// If data[0] is odd, return Ok(value: data[0], size: 1)
fn parse(data: &[u8]) -> Result<Self, ParseError> {
let first = *data.get(0).ok_or(ParseError::InvalidSize)? as u32;
let second = *data.get(1).ok_or(ParseError::InvalidSize)? as u32;
if first == 0 {
Err(ParseError::InvalidByte)
} else {
if first % 2 == 0 {
Ok(Self { value: first + second, size: 2, })
} else {
Ok(Self { value: first, size: 1, })
}
}
}
}
Since a single array of bytes could contain multiple valid data points of different sizes, with invalid bytes mixed and match between them, the following code pattern is repeated by multiple callers:
let data = vec![0, 1, 2, 3, 4];
let mut offset = 0;
while offset < data.len() {
match ParsedData::parse(&data[offset..]) {
Ok(parsed_data) => {
println!("offset: {:#x} data: {:?}", offset, parsed_data);
offset += parsed_data.size;
},
Err(e) => {
println!("offset: {:#x} error: {:?}", offset, e);
offset += 1;
}
}
}
This is quite error prone and I'd like to avoid that. My first idea was to make a Parser
structure:
struct Parser<'a> {
data: &'a [u8],
offset: usize,
}
impl<'a> Parser<'a> {
fn new(data: &'a [u8]) -> Self {
Self { data, offset: 0}
}
fn parse_next(&mut self) -> Result<ParsedData, ParseError> {
let result = ParsedData::parse(&self.data[self.offset..]);
match result {
Ok(parsed_data) => self.offset += parsed_data.size,
Err(_) => self.offset += 1,
};
result
}
}
This makes things a little easier, but not by much:
let data = vec![0, 1, 2, 3, 4];
let mut parser = Parser::new(&data);
loop {
match parser.parse_next() {
Ok(parsed_data) => println!("data: {:?}", parsed_data),
Err(ParseError::InvalidSize) => break,
Err(ParseError::InvalidByte) => (),
}
}
This is when I realized that what I want is an Iterator
, so my next step was this:
impl<'a> Iterator for Parser<'a> {
type Item = Result<ParsedData, ParseError>;
fn next(&mut self) -> Option<Self::Item> {
let result = self.parse_next();
if result.is_err() && result.err().unwrap() == ParseError::InvalidSize {
None
} else {
Some(result)
}
}
}
This means that users can now greatly simplify their code:
let data = vec![0, 1, 2, 3, 4];
let parser = Parser::new(&data);
for parsed_data in parser {
println!("{:?}", parsed_data);
}
However, there is one problem here: I no longer know what offset a parsed data structure comes from, and sometimes that information is important. It seems wrong to stuff that information into ParsedData
because it has no purpose being there - it tells nothing about the parsed data, but about the origin of parsed data, and the origin is not relevant in all cases.
Basically it looks like a need something like std::iter::enumerate()
, that returns an offset
instead of an iteration count. I can sort of do that by wrapping ParsedData
into another structure that also holds the offset:
struct ParsedDataIter {
parsed_data: ParsedData,
offset: usize,
}
Is there a better solution than this? Also, how do I decide if I should implement Iterator
or IntoIterator
? Is it a bad idea to have an iterator that jumps through the collection at a random pace as the one I have above does? It feels like it can break some assumptions.