How to use iterators

Hi,

I'm trying to port some existing code to parse obj files to Rust (mostly as a learning exercise!) I've got this working, but it currently represents the data being parsed as a &[u8] slice and usize 'pointer' into that slice. However, this doesn't feel very idiomatic, and suspect I would be better using iterators somehow, but I'm really struggling to even get anything to compile, let alone work, so could just some pointers!

To take a concrete example, here's a small function that parses the sign in front of a number - it might be +, -, or not present at all. The function returns the sign (either +1 or -1) and consumes the symbol if it's there:

fn parse_sign(data: &[u8], ptr: usize) -> (isize, usize) {
    match data[ptr] {
        b'+' => (1, ptr + 1),
        b'-' => (-1, ptr + 1),
        _ => (1, ptr),
    }
}

fn test_sign() {
    let (val, ptr) = parse_sign(b"-1234", 0);
    assert!(val == -1);
    assert!(ptr == 1);
}

Works great. Here's my initial stab at converting it to something using an iterator.

fn iterate_sign<I>(iter: &mut std::iter::Peekable<I>) -> isize
where
    I: std::iter::Iterator<Item = u8>,
{
    match iter.peek() {
        Some(b'+') => {
            iter.next();
            1
        }
        Some(b'-') => {
            iter.next();
            -1
        }
        _ => 1,
    }
}
fn test_sign() {
    let data: &[u8] = b"-1234 ";
    let mut iter = data.iter().peekable();
    let val = iterate_sign(&mut iter);
    assert!(val == -1);
    assert!(iter.peek() == Some(b'1'));
}

However, this doesn't even compile, complaining

error[E0271]: type mismatch resolving `<std::slice::Iter<'_, u8> as std::iter::Iterator>::Item == u8`

I can't for the life of me figure out how to even create a function that takes an iterator...not sure if I'm not passing the right thing in, or if I should be defining the iterate_sign function differently. Suspect I'm confusing u8, &u8 (and possibly even &&u8!) in the iterator somehow, but how I'm not sure.

More generally, iterators seem designed for nice self contained things like iterating over a loop, or calling other functions like map on them. More explicit use like this seems very difficult. Am I barking up the wrong tree - the slice/offset approach works fine - is that actually the best way to do it?

Thanks!

The problem is that iterate_sign() requires an iterator over u8, but you've given it the iterator produced by &[u8]::iter(), which is std::slice::Iter that iterates over &u8. The simplest fix is probably to exploit the fact that u8 is Copy and insert a call to copied() to get an iterator over u8 values:

let mut iter = data.iter().copied().peekable();
4 Likes

I'm still learning rust myself but this (specific) example doesn't suit iterators well. I think iterators would be suitable if you needed to parse the signs of a series of numbers.

Something like:

numbers
    .iter()
    .map(|n| parse_sign(n))

I also think you can avoid using peekable() since you only end up calling next() anyway inside the match - could you just match next()?

Last comment :slight_smile: I would have iterate_sign() take a &[u8] directly rather than an iterator. You can iterate over a borrowed value as long as you don't mutate (or use an owned/consuming iterater function like into_iter).
I'd think of it as "I only want to look at this and check something, so I can borrow it".

Getting the hang of iter vs into_iter vs iter_mut and .cloned() will take a while... Read: took me a long time. I only recently had cloned() really click. An embarrassing amount of (*foo).clone().into_iter() before then! :stuck_out_tongue:

2 Likes

This is an example illustrating what @jameseb7 said:

#[derive(
    Debug,
    Clone, Copy,
    PartialEq, Eq,
)]
pub
enum Sign {
    Default,
    Positive,
    Negative,
}

pub
fn iterate_sign (
    iter: &mut std::iter::Peekable<impl Iterator<Item = u8>>,
) -> Sign
{
    match iter.peek() {
        | Some(&b'+') => {
            drop(iter.next());
            Sign::Positive
        }
        | Some(&b'-') => {
            drop(iter.next());
            Sign::Negative
        }
        | _ => Sign::Default,
    }
}

#[test]
fn test_sign ()
{
    let data: &[u8] = b"-1234 ";
    let mut iter = data.iter().copied().peekable();
    let val = iterate_sign(&mut iter);
    assert_eq!(val, Sign::Negative);
    assert_eq!(iter.peek(), Some(&b'1'));
}

If you feel like requiring that the caller remember to call .copied() on its iterator is cumbersome, you can make your function accept both iterators over &u8 and u8 by realising that these two types implement both the Borro<u8> trait, whereby you can call Borrow::borrow() on them to get a &u8:

use ::std::{
    borrow::Borrow,
    iter::{
        Peekable,
    },
};

#[derive(
    Debug,
    Clone, Copy,
    PartialEq, Eq,
)]
pub
enum Sign {
    Default,
    Positive,
    Negative,
}

pub
fn iterate_sign (
    iter: &mut Peekable<impl Iterator<Item = impl Borrow<u8>>>,
) -> Sign
{
    match iter.peek().map(Borrow::borrow) {
        | Some(&b'+') => {
            drop(iter.next());
            Sign::Positive
        }
        | Some(&b'-') => {
            drop(iter.next());
            Sign::Negative
        }
        | _ => Sign::Default,
    }
}

#[test]
fn test_sign ()
{
    let data: &[u8] = b"-1234 ";
    let mut iter = data.iter().peekable();
    let val = iterate_sign(&mut iter);
    assert_eq!(val, Sign::Negative);
    assert_eq!(iter.peek(), Some(&&b'1'));

    let mut iter_copied = data.iter().copied().peekable();
    let _ = iterate_sign(&mut iter_copied);
}
2 Likes

That's awesome - thanks all! Lots to get my head around :slight_smile:

One immediate question - is there any actual difference between:

fn iterate_sign (
    iter: &mut Peekable<impl Iterator<Item = u8>>,
) -> Sign

and

fn iterate_sign<I> (iter: &mut Peekable<I>) -> Sign
where I: Iterator<Item = u8>

or is it just a style thing? Think I prefer the first version but hadn't seen it before!

There is a small difference, but they're mostly the same.

The difference is that you can explicitly specify the type of iterator in the second version using the turbofish syntax, but this is not possible with the first.

// turbofish
let sign = iterate_sign::<MyIteratorType>(my_iter);

Here is a version without using copied. (link to the playground)

This is off-topic, but there is no reason to downgrade to iterator, when you already have an interface with random access like slice. Here is the design pattern used by nom, a great parser library, very similar to your original parse_sign but replacing usize with &[u8] (link to playground):

#[derive(Debug)]
enum ParserError {
    // add custom errors
}

type ParserResult<I, O, E> = Result<(I, O), E>;

fn parse_sign(data: &[u8]) -> ParserResult<&[u8], isize, ParserError> {
    match data[0] {
        b'+' => Ok((&data[1..], 1)),
        b'-' => Ok((&data[1..], -1)),
        _ => Ok((data, 1)),
    }
}

fn test_sign() {
    let input: &[u8] = b"-1234 ";
    let (rest, val) = parse_sign(input).unwrap();
    assert_eq!(val, -1);
    assert_eq!(rest, &input[1..]);
}
1 Like

They don't call .next() when the first byte is neither b'+' nor b'-', so they do need to .peek().


Not really off-topic, you are just trying to solve the XY problem. Regarding the suggestions, I think they deserve to be more nuanced: there are indeed disadvantages to making a function take a generic Iterator rather than a concrete slice, but there are advantages as well. Choosing between the two will then depend on the OP's use case:

  • taking a &'_ [u8] as input makes the reasoning behind the function "simpler", as you then get full access to the functions and methods operating on &'_ [u8], such as bulk cloning, indexing , sorting (if operating on a &'_ mut [u8]), random access / indexing, "bulk reading" (e.g., reading multiple bytes at once), etc (and it does not restrict the API "much", given than pretty much everything in computer science ends up becoming viewable as a slice of bytes).

    • Most of these behaviors can be packaged / abstracted away with multiple traits at once. For instance, you could do something as crazy as:

      fn parse_sign<Input> (input: Input) -> Sign
      where
          Input :
              IntoIterator<Item = Item> + // elem by elem iteration
              ToOwned<Owned = Vec<u8>> + // bulk cloning
              Index<usize, Output = u8> +
              Index<Range<usize>, Output = [u8]> +
              // Index<RangeFrom<usize>, ...
          ,
          Input::Item : Borrow<u8>,
          Vec<u8> : Borrow<Input>,
          // etc.
      

      ... but then you see how (ab)using trait bounds to reach the greatest level of genericity can become an anti-pattern.

    • In that case, you can still become easily more flexible (e.g., accepting both &'_ [u8] and Vec<u8> by taking an impl '_ + Borrow<[u8]>: Playground

  • On the other hand, taking an Iterator (or even better, an iterable, i.e., an impl IntoIterator) as input will let you work with lazily obtained sequences of bytes, either because they are absurdly large (potentially infinite) but can be computed using some mathematical property or relation between the successive elements (.e.g., the sequence of digits of pi), or because they are not fully available (e.g., reading from an internet socket or an absurdly huge file). This flexibility can be a quite nice feature in the API of the library, for instance, so as not to force users of the library to .collect() chunks of the iterator into buffers so as to be able to interact with a buffer-restricted API (such as nom's).

7 Likes

Yep, definitely not off-topic :slight_smile:

I spent a while trying to convert the code to iterators and whilst it was reasonably easy for the functions at the bottom level like the one I posted, higher up where the code needs to make decisions based on what it'd parsed previously it started to get a bit less obvious as to what I should be doing - to the point that the code started to look more complex, rather than less.

In my case the data always comes from a file, so I'll always have a slice of bytes available, not just an abstract iterator. So have decided to stick the existing slice/offset approach for now - will perhaps revisit at a later date when I've a bit more a clue as to what I'm doing with Rust :wink:

Thanks all - have learnt much, which is half the reason for doing it the first place!

Thanks for posting, I've got a lot to think about now too :slight_smile:

@Yandros Thinking in terms of impl Trait for parameters seems to be the "next level" of thinking about rust that consumers of libraries are very grateful for - without realising it :slight_smile:

Thoughts like these are brilliant. Slightly off topic again but is there any reference or reading that comes to mind that collates a lot of these generic design patterns?

I can imagine a page of the book going through a refactor of a library's API to swap "iterators" for "iterables", &T for Borrow<T> (if I'm understanding correctly) and discuss pros and cons.

1 Like

Why not just use the preexisting from_str method?