Is it possible to consume an iterator in O(1) instead of O(n)?

I’m writing a lexer. My lexer implements Iterator<Item=Result<Token, InvalidInput>> When I find an invalid input that I can’t parse, I would like to flush my internal iterator (it’s a std::iter::CharIndices over the input string), so that the next time the next() function is called, it returns None.

Currently, I’m doing:

while let Some(_) = self.iter.next() {}
return Some(Err(InvalidToken));

This is (I assume) slow, since this is going to be O(n) (unless the compiler optimize it).

Could I do something like:

self.iter.flush();
return Some(Err(InvalidToken));

I took a look at std::iter::Fuse, but I didn’t found such method.

Or do I need to add a boolean in my lexer?

In general, a loop is the only way to iterate over an iterator. Depending on what you're iterating over, you could either take advantage of SIMD or parallelism, but the tokenizer step is not really suited for either. In most cases, your tokenizer will be too fast and it's just not worth trying to split the work for a single input, unless the input is quite large.

Optimizing too early is rarely a good idea. You can keep the idea of optimization at the back of your head, but until you've proven, that your code is actually too slow, I'd recommend not doing optimization. It usually comes at a cost, more complexity, increasing likelihood of bugs and decreasing readability/maintainability.

self.iter = "".char_indices();
5 Likes

Does your lexer need to consume its token iterator at all? Why not just return the error and either destroy the lexer or never touch it again?

@Phlopsi It’s not an optimization, it’s removing a pessimisation. I wanted to express that the input was consumed, not doing the expensive work of consumed it.

@quinedot Perfect! Exactly what I needed!

@Michael-F-Bryan It’s effectively not totally needed, it’s just that it’s much easier to write tests (and also less error prone since I can’t try to consume the input past an error). In my test, I can collect the output of the lexer and compare it with the array of expected things. If I don’t consume the input, I need to add everything that’s after the error.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.