Iterator that enables skipping over elements in a `Vec<u8>`

I have a two vectors:

  • a is a Vec<usize> and each usize corresponds to the number of items that I need to read from
  • b which is a Vec<u8> that contains the data

So here's what I want to do:

  1. Get the next n characters to read from a
  2. Read n characters from b
  3. Increment reading position of b by 1 (i.e to skip a character)
  4. Get the next n characters to read from a
    ... Until I reach the end of a

Though it is super simple to use a simple loop to read the data and return from it, but what I want to do, should look something like:

let extracted_vec: Vec<Vec<u8>> = a.into_iter().map(|size| {
    vec![b.take(size).collect::<u8>()]
}).collect();

The snag here is: I want to consume the elements of a and b at the same time, which, to say, is kind of like a conditional zip().

Any suggestions?

Can you give an example of some input and it's associated output?

let a: Vec<usize> = vec![5, 6];
let b: Vec<u8> = vec![
    72, 101, 108, 108, 111, 10, 87, 111, 114, 108, 100, 33
];
// Do that funny stuff

The resulting vector would look like: [[ 72, 101, 108, 108, 111],[87, 111, 114, 108, 100, 33]]

My first idea would be to create a new struct that wraps both iterators and implements Iterator. Something like

struct SkipCharacter<A, B>(A, B);

impl<A, B> Iterator for SkipCharacter<A, B>
where
    A: Iterator<Item = usize>,
    B: Iterator<Item = u8>,
{
    type Item = Vec<u8>;

    fn next(&mut self) -> Option<Self::Item> {
        todo!()
    }
}

I reached the same conclusion. But I couldn't find out how the iterator would be consuming a and b at the same time with some kind of an IntoIterator implementation

You can use scan to use the second iterator while processing the first:
(Playground)

fn main() {
    let a: Vec<usize> = vec![5, 6];
    let b: Vec<u8> = vec![72, 101, 108, 108, 111, 10, 87, 111, 114, 108, 100, 33];
    // Do that funny stuff

    let c: Vec<Vec<u8>> = a
        .into_iter()
        .scan(b.into_iter(), |b, count| {
            Some((count, b.take(count + 1).collect()))
        })
        .map(|(count, mut v): (usize, Vec<u8>)| {
            v.truncate(count);
            v
        })
        .collect();

    println!("{:?}", c);
}
3 Likes

+1 for .scan(). I think is a very plausible solution, but I'm thinking about the time complexity.
since .truncate() would relocate to the vector. However, this looks pretty neat

Truncating a vector doesn’t affect its allocation. It should be an O(1) operation of writing to the internal length field followed by calling destructors for the dropped items. In this case, that’s one integer at most.

1 Like

You're right:

unsafe {
            if len > self.len {
                return;
            }
            let remaining_len = self.len - len;
            // ah ha
            let s = ptr::slice_from_raw_parts_mut(self.as_mut_ptr().add(len), remaining_len);
            self.len = len;
            ptr::drop_in_place(s);
        }

Perfect! :star:

On the other hand, maybe this is stylistically preferrable:

    let c: Vec<Vec<u8>> = a
        .into_iter()
        .scan(b.into_iter(), |b, count| {
            let v = b.take(count).collect();
            let _ = b.next();
            Some(v)
        })
        .collect();
3 Likes

You may also be interested in returning a Vec<&[u8]> where the inner slices borrow from b:

pub fn funny_stuff_slice<'b>(a: &[usize], b: &'b [u8]) -> Vec<&'b [u8]> {
    a.into_iter()
        .scan(0usize, |pos, count| {
            let s = &b[*pos..*pos + count];
            *pos += count + 1;
            Some(s)
        })
        .collect()
}

Here is a comparison side by side with two tests.

5 Likes

You can also write an iterator that from an iterator of usizes and a slice of data produces subslices of the data. Morally,

fn new(counts: impl Iterator<Item=usize>, data: &[T]) -> impl Iterator<Item=&[T]>

See it here on the Rust Playground.

This completely avoids any allocation, unless of course you collect the generated iterator.

Edit

This raises a question however. In line 13, I have to write the trait bound

IntoCountIter: IntoIterator<IntoIter = CountIter, Item = CountIter::Item>

I would expect

IntoCountIter: IntoIterator<IntoIter = CountIter>

to be sufficient, because the constraint on Item is already an explicit requirement in the definition of IntoIterator. Why is it necessary to repeat it here? (I can move this to a new question, if more appropriate.)

1 Like

@FedericoStra Sometimes Rust requires you to repeat bounds.

1 Like

That's fair. Rust is already super intelligent; I don't pretend it to do even more. I just wanted to make sure that I wasn't missing something. My understanding is that we cannot have IntoCountIter: IntoIterator<IntoIter = CountIter> without having also IntoCountIter: IntoIterator<IntoIter = CountIter, Item = CountIter::Item>, because the constraint in the definition of IntoIterator allows us to deduce the second from the first. I hope this is correct.

In my experience, Rust is extremely conservative in the bounds that it will automatically infer for generic parameters. If CountIter was a concrete type instead of generic, the repeated bound would probably not have been necessary.

1 Like

What is interesting is that the bound can be expressed in two ways and it works in both cases: link.
We still have to impose it some way or another.

The way the compiler seems to work is that it infers everything it can from what you explicitly declare on the impl, and then checks those conclusions against the requirements stated elsewhere. If it can’t prove one of those requirements is true from your statements, it’s a compile error.

Then, these explicit statements become the requirements that other code must prove before it can rely on your impl.

In your example, that’s Trait::In == Trait::Out::Inner.

(I haven’t looked at the compiler internals; this is just my behavioral observations)

If you want to avoid the cognitive burden of learning all the adaptors that Iterator has, I recommend learning to use the full power of closures instead, since it generalizes to many other constructs.

let extracted_vec: Vec<Vec<u8>> = a.iter().copied().map({
    let mut b = b.into_iter();
    move |size| { /* captures the same thing that `scan` did */
        let b = b.by_ref(); /* explicit by `&mut` access within the closure's body */
        let elems = b.take(size).collect::<Vec<_>>();
        drop(b.next());
        elems
    }
}).collect();
3 Likes

Note that the title of this thread is inaccurate. The problem as posed in detail, and all the solutions, are for the topic "Iterator that enables skipping over u8s in a Vec". Strings are UTF-8, where each character is one to three bytes. Neither the problem as posed nor the solutions work for UTF-8 characters.

4 Likes

Corrected