# Iterator that enables skipping over elements in a `Vec<u8>`

I have a two vectors:

• a is a `Vec<usize>` and each `usize` corresponds to the number of items that I need to read from
• b which is a `Vec<u8>` that contains the data

So here's what I want to do:

1. Get the next n characters to read from a
2. Read n characters from b
3. Increment reading position of b by 1 (i.e to skip a character)
4. Get the next n characters to read from a
... Until I reach the end of a

Though it is super simple to use a simple loop to read the data and return from it, but what I want to do, should look something like:

``````let extracted_vec: Vec<Vec<u8>> = a.into_iter().map(|size| {
vec![b.take(size).collect::<u8>()]
}).collect();
``````

The snag here is: I want to consume the elements of a and b at the same time, which, to say, is kind of like a conditional `zip()`.

Any suggestions?

Can you give an example of some input and it's associated output?

``````let a: Vec<usize> = vec![5, 6];
let b: Vec<u8> = vec![
72, 101, 108, 108, 111, 10, 87, 111, 114, 108, 100, 33
];
// Do that funny stuff
``````

The resulting vector would look like: `[[ 72, 101, 108, 108, 111],[87, 111, 114, 108, 100, 33]]`

My first idea would be to create a new struct that wraps both iterators and implements Iterator. Something like

``````struct SkipCharacter<A, B>(A, B);

impl<A, B> Iterator for SkipCharacter<A, B>
where
A: Iterator<Item = usize>,
B: Iterator<Item = u8>,
{
type Item = Vec<u8>;

fn next(&mut self) -> Option<Self::Item> {
todo!()
}
}
``````

I reached the same conclusion. But I couldn't find out how the iterator would be consuming `a` and `b` at the same time with some kind of an `IntoIterator` implementation

You can use `scan` to use the second iterator while processing the first:
(Playground)

``````fn main() {
let a: Vec<usize> = vec![5, 6];
let b: Vec<u8> = vec![72, 101, 108, 108, 111, 10, 87, 111, 114, 108, 100, 33];
// Do that funny stuff

let c: Vec<Vec<u8>> = a
.into_iter()
.scan(b.into_iter(), |b, count| {
Some((count, b.take(count + 1).collect()))
})
.map(|(count, mut v): (usize, Vec<u8>)| {
v.truncate(count);
v
})
.collect();

println!("{:?}", c);
}
``````
3 Likes

+1 for `.scan()`. I think is a very plausible solution, but I'm thinking about the time complexity.
since `.truncate()` would relocate to the vector. However, this looks pretty neat

Truncating a vector doesnâ€™t affect its allocation. It should be an O(1) operation of writing to the internal length field followed by calling destructors for the dropped items. In this case, thatâ€™s one integer at most.

1 Like

You're right:

``````unsafe {
if len > self.len {
return;
}
let remaining_len = self.len - len;
// ah ha
self.len = len;
ptr::drop_in_place(s);
}
``````

Perfect!

On the other hand, maybe this is stylistically preferrable:

``````    let c: Vec<Vec<u8>> = a
.into_iter()
.scan(b.into_iter(), |b, count| {
let v = b.take(count).collect();
let _ = b.next();
Some(v)
})
.collect();
``````
3 Likes

You may also be interested in returning a `Vec<&[u8]>` where the inner slices borrow from `b`:

``````pub fn funny_stuff_slice<'b>(a: &[usize], b: &'b [u8]) -> Vec<&'b [u8]> {
a.into_iter()
.scan(0usize, |pos, count| {
let s = &b[*pos..*pos + count];
*pos += count + 1;
Some(s)
})
.collect()
}
``````

Here is a comparison side by side with two tests.

5 Likes

You can also write an iterator that from an iterator of `usize`s and a slice of data produces subslices of the data. Morally,

``````fn new(counts: impl Iterator<Item=usize>, data: &[T]) -> impl Iterator<Item=&[T]>
``````

See it here on the Rust Playground.

This completely avoids any allocation, unless of course you `collect` the generated iterator.

### Edit

This raises a question however. In line 13, I have to write the trait bound

``````IntoCountIter: IntoIterator<IntoIter = CountIter, Item = CountIter::Item>
``````

I would expect

``````IntoCountIter: IntoIterator<IntoIter = CountIter>
``````

to be sufficient, because the constraint on `Item` is already an explicit requirement in the definition of `IntoIterator`. Why is it necessary to repeat it here? (I can move this to a new question, if more appropriate.)

1 Like

@FedericoStra Sometimes Rust requires you to repeat bounds.

1 Like

That's fair. Rust is already super intelligent; I don't pretend it to do even more. I just wanted to make sure that I wasn't missing something. My understanding is that we cannot have `IntoCountIter: IntoIterator<IntoIter = CountIter>` without having also `IntoCountIter: IntoIterator<IntoIter = CountIter, Item = CountIter::Item>`, because the constraint in the definition of `IntoIterator` allows us to deduce the second from the first. I hope this is correct.

In my experience, Rust is extremely conservative in the bounds that it will automatically infer for generic parameters. If `CountIter` was a concrete type instead of generic, the repeated bound would probably not have been necessary.

1 Like

What is interesting is that the bound can be expressed in two ways and it works in both cases: link.
We still have to impose it some way or another.

The way the compiler seems to work is that it infers everything it can from what you explicitly declare on the `impl`, and then checks those conclusions against the requirements stated elsewhere. If it canâ€™t prove one of those requirements is true from your statements, itâ€™s a compile error.

Then, these explicit statements become the requirements that other code must prove before it can rely on your `impl`.

In your example, thatâ€™s `Trait::In == Trait::Out::Inner`.

(I havenâ€™t looked at the compiler internals; this is just my behavioral observations)

If you want to avoid the cognitive burden of learning all the adaptors that `Iterator` has, I recommend learning to use the full power of closures instead, since it generalizes to many other constructs.

``````let extracted_vec: Vec<Vec<u8>> = a.iter().copied().map({
let mut b = b.into_iter();
move |size| { /* captures the same thing that `scan` did */
let b = b.by_ref(); /* explicit by `&mut` access within the closure's body */
let elems = b.take(size).collect::<Vec<_>>();
drop(b.next());
elems
}
}).collect();
``````
3 Likes

Note that the title of this thread is inaccurate. The problem as posed in detail, and all the solutions, are for the topic "Iterator that enables skipping over `u8`s in a `Vec`". `String`s are UTF-8, where each character is one to three bytes. Neither the problem as posed nor the solutions work for UTF-8 characters.

4 Likes

Corrected