Iterate over &[1,2,3,4,5,6,7,8,9] to receive `&[1,4,7]` then `&[2,5,8]` then `&[3,6,9]`

I'm trying to iterate over a slice broken into chunks, and return a tuple with the nth element of each chunk.

Example:

&[1,2,3,4,5,6,7,8,9]

I'd like to break this into chunks of size 3, and then iterate over the results, returning these tuples, one on each next() call:

&[1,4,7], &[2,5,8], &[3,6,9]

So I made something like this:

use std::slice::Chunks;

pub struct MyIter<'a, T> {
    chunks: Vec<Chunks<'a, T>>,
    //This is so I have where to store the element that I want to return
    current_tuple: Vec<T>
    
}

impl<'a, T> MyIter<'a, T> {
    pub fn new(chunk_size: usize, slice: &'a[T]) -> MyIter<'a, T> {
        let mut chunks = Vec::new();
        let chunk_count = slice.len()/chunk_size;
        for i in 0..chunk_count {
            chunks.push(slice.chunks(chunk_size));
        }
        MyIter {
            chunks,
            current_tuple: Vec::new()
        }
    }
}

impl<'a, T: Copy> Iterator for MyIter<'a, T> {
    type Item = &'a [T];

    #[inline(always)]
    fn next(&mut self) -> Option<Self::Item> {
        for i in 0..self.chunks.len() {
            self.current_tuple[i] = self.chunks[i].next().unwrap()[0];
        }
        Some(self.current_tuple.as_slice())
    }
}

I quickly realized that returning a Vec<> every time is going to allocate and deallocate lots of stuff, so I made one that sits in the struct, so I always return a reference to its internal slice. I think this is the problem though. I don't want to return vector copies though. I also cannot return an array or tuple because I only know the chunk amount in runtime

I get:

error[E0495]: cannot infer an appropriate lifetime for autoref due to conflicting requirements
  --> src/lib.rs:32:33
   |
32 |         Some(self.current_tuple.as_slice())
   |                                 ^^^^^^^^
   |
note: first, the lifetime cannot outlive the anonymous lifetime defined on the method body at 28:13...
  --> src/lib.rs:28:13
   |
28 |     fn next(&mut self) -> Option<Self::Item> {
   |             ^^^^^^^^^
note: ...so that reference does not outlive borrowed content
  --> src/lib.rs:32:14
   |
32 |         Some(self.current_tuple.as_slice())
   |              ^^^^^^^^^^^^^^^^^^
note: but, the lifetime must be valid for the lifetime `'a` as defined on the impl at 24:6...
  --> src/lib.rs:24:6
   |
24 | impl<'a, T: Copy> Iterator for MyIter<'a, T> {
   |      ^^
note: ...so that the types are compatible
  --> src/lib.rs:28:46
   |
28 |       fn next(&mut self) -> Option<Self::Item> {
   |  ______________________________________________^
29 | |         for i in 0..self.chunks.len() {
30 | |             self.current_tuple[i] = self.chunks[i].next().unwrap()[0];
31 | |         }
32 | |         Some(self.current_tuple.as_slice())
33 | |     }
   | |_____^
   = note: expected `Iterator`
              found `Iterator`
1 Like

until lending iterators are stabilized you cannot return a reference from an Iterator I asked the same question before here is my post I hope it'll be helpful

Iterators cannot return references to something owned by the iterator itself in their .next() method. The trait Iterator supports .collecting() all the items and working with them concurrently – so a subsequent call to .next() must not invalidate an item returned by the previous call.

What could probably work is implementing something like streaming_iterator::StreamingIterator instead.

Depending on what you want to do with these chunks, it might also be sufficient if those aren’t actual slices but instead just iterators that extract the right items from the whole slice on-the-fly, or you could try working with ndarray to create an iterator of ArrayView1 (I think that should be possible if I understand your problem description correctly).

The ndarray example would use its array views to have views that can handle [1, 4, 7] etc even if they are spread out. You can for example make it as a 3 x 3 matrix and then take the columns of it. Here's an example.

4 Likes

I was looking for a solution without external crates.

Isn't there a simpler way? Remember that we always return a tuple with the same size so maybe a structure on the std lib that reutilizes its allocated memory, so I can always return a new one but the underlying memory will be the same, just filled with new data

Then make it return an array by value, rather than trying to return a reference?

then I have to know the size of the array in compile-time, which is a problem

If chunk size is a runtime value, then you are looking for chunks and chunks_exact slice methods.

I'm using chunks, but I need disjoint chunks. Take a look at the example, I should return one element from each disjoint chunk

E.g.

fn handle_chunk(c: &[i32]) {
    println!("{:?}", c);
}

////////////////////////////////////////////////////////////////////////////////

// with "impl Trait"
fn my_chunks<T: Copy>(slice: &[T], size: usize) -> impl Iterator<Item = impl Iterator<Item = &T>> {
    assert!(slice.len() % size == 0);
    let amount = slice.len() / size;
    (0..amount).map(move |start| slice[start..].iter().step_by(amount))
}

fn demonstration1() {
    let arr = [1, 2, 3, 4, 5, 6, 7, 8, 9];
    let mut vec = Vec::<i32>::with_capacity(3);
    for chunk in my_chunks(&arr, 3) {
        vec.clear();
        vec.extend(chunk);
        handle_chunk(&vec);
    }
}

////////////////////////////////////////////////////////////////////////////////

// custom iterator
fn my_chunks_2<T>(slice: &[T], size: usize) -> MyChunks<'_, T> {
    assert!(slice.len() % size == 0);
    let amount = slice.len() / size;
    MyChunks {
        slice,
        range: 0..amount
    }
}

struct MyChunks<'a, T> {
    slice: &'a [T],
    range: std::ops::Range<usize>,
}

impl<'a, T> Iterator for MyChunks<'a, T> {
    type Item = Chunk<'a, T>;
    fn next(&mut self) -> Option<Chunk<'a, T>> {
        self.range.next().map(|start| Chunk(self.slice[start..].iter().step_by(self.range.end)))
    }
}

struct Chunk<'a, T>(std::iter::StepBy<std::slice::Iter<'a, T>>);

impl<'a, T> Iterator for Chunk<'a, T> {
    type Item = &'a T;
    fn next(&mut self) -> Option<&'a T> {
        self.0.next()
    }
}

fn demonstration2() {
    let arr = [1, 2, 3, 4, 5, 6, 7, 8, 9];
    let mut vec = Vec::<i32>::with_capacity(3);
    for chunk in my_chunks_2(&arr, 3) {
        vec.clear();
        vec.extend(chunk);
        handle_chunk(&vec);
    }
}

////////////////////////////////////////////////////////////////////////////////

// streaming iterator (but without any trait)
fn my_chunks_3<T: Copy>(slice: &[T], size: usize) -> MyChunksStreaming<'_, T> {
    MyChunksStreaming {
        iter: my_chunks_2(slice, size),
        vec: Vec::with_capacity(size),
    }
}

struct MyChunksStreaming<'a, T> {
    iter: MyChunks<'a, T>,
    vec: Vec<T>,
}

impl<T: Copy> MyChunksStreaming<'_, T> {
    fn next(&mut self) -> Option<&[T]> {
        self.iter.next().map(|chunk| {
            self.vec.clear();
            self.vec.extend(chunk);
            self.vec.as_slice()
        })
    }
}

fn demonstration3() {
    let arr = [1, 2, 3, 4, 5, 6, 7, 8, 9];
    let mut chunks = my_chunks_3(&arr, 3);
    while let Some(chunk) = chunks.next() {
        handle_chunk(chunk);
    }
}

////////////////////////////////////////////////////////////////////////////////

fn main() {
    println!("demonstration1");
    demonstration1();
    println!("demonstration2");
    demonstration2();
    println!("demonstration3");
    demonstration3();
}

(playground)


On another read of your original post, I get the impression that I might've gotten the meaning of "size" wrong, your "size" might be the value in variables called "amount" in my code above. But that's easily changed if necessary.

1 Like

Ah. When you said "tuple" I took the Rust meaning, where it also has a compile-time size.

You can not return reference to a "disjoint chunk". &[T] has to point to a continuous memory, so you have to use an owned type into which elements will be copied. It can be either &[T; N] if you know chunk size at compile time or Box<[T]> (essentially Vec<T> without capacity) otherwise. Since you don't want the former, your best option is the latter.

Alternatively, you could introduce your own "proxy" type which would contain a reference to the original slice and chunk index. Then you can implement various traits like Index and IntoIterator on it to improve ergonomics, but it will not be a slice and you will not be able to deref such type into one.

You may write a type which reuses an inner heap allocated buffer, but, honestly, I don't think it's worth the trouble.

3 Likes

If you absolutely do not want to move or copy elements, @newpavlov 's suggestion of creating a proxy type is the only option.

What you want to do here basically amounts to a matrix transposition. The simplest way to do this would be to allocate a new array and copy or move the elements in the transposed order. You could even do it in-place without an extra allocation, but that is somewhat difficult in the general case. For a square matrix like in your example, it is simple though:

fn transpose_square<T>(mat: &mut [T], n: usize)
{
    assert!(mat.len() == n*n);
    for i in 1..n
    {
        for j in 0..i
        {
            mat.swap(i*n+j, j*n+i);
        }
    }
}

fn main()
{
    let mut a: [i32; 9] = [1, 2, 3, 4, 5, 6, 7, 8, 9];
    transpose_square(&mut a, 3);
    println!("{:?}", a);
}

You could then use chunks() on the transposed matrix.

Maybe you can make an array of options. You can take() them as needed. Of course your iterator will have to step through the None's and count the Some's to get the n'th Some.

why does it work for the non mutable version but not for the mutable one?

Take a look:

// custom iterator
fn my_chunks_2_mut<T>(slice: &mut [T], size: usize) -> MyChunksMut<'_, T> {
    assert!(slice.len() % size == 0);
    let amount = slice.len() / size;
    MyChunksMut {
        slice,
        range: 0..amount
    }
}

struct MyChunksMut<'a, T> {
    slice: &'a mut [T],
    range: std::ops::Range<usize>,
}

impl<'a, T> Iterator for MyChunksMut<'a, T> {
    type Item = Chunk<'a, T>;
    fn next(&mut self) -> Option<Chunk<'a, T>> {
        self.range.next().map(|start| Chunk(self.slice[start..].iter().step_by(self.range.end)))
    }
}

struct ChunkMut<'a, T>(std::iter::StepBy<std::slice::Iter<'a, T>>);

impl<'a, T> Iterator for ChunkMut<'a, T> {
    type Item = &'a mut T;
    fn next(&mut self) -> Option<&'a mut T> {
        self.0.next()
    }
}

Conceptually I know that I would be able to have 2 mutable references to the same data, but I don't understand why it fails, and if there's a way to fix this

For a mutable iterator, that’s a fairly complex access pattern of a slice. To keep an efficient approach without needing to allocate lots of vectors of references, and without using unsafe code yourself, you might want to consider using ndarray instead, if you really need a mutable iterator.

but why the error happens in the mutable one but not in the non-mutable one if both are exactly the same except for the mutable? What is the error source? I'd like to understand it better

The immutable iterator operates by creating multiple iterators over the whole slice in parallel and modify each to skip the right items, so that in the end everything is only returned once in total, by one of those multiple iterators. The first step of creating multiple iterators that – initially – each have access to the whole slice is noteworthy here; the approach of having giving “chunk” access to the whole slice initially allows for arbitrarily complex ways of splitting up the elements between the chunks; the compiler needs not be able to realize in any way shape or form, that these chunks are actually disjoint parts of the whole slice without overlap.

In the mutable setting, this approach fails, since it’s not possible to create multiple iterators with mutable access to the whole slice at the same time. The fact that the pattern of creating the chunks is – in the end – not actually giving shared mutable access to any element is nothing the compile could figure out by itself. This pattern is complex enough so that it’s only possible promise to the compiler that we’re doing the right thing, by using unsafe code, or to rely on the unsafe code of others that already implemented the same kind of pattern. Using ndarray would be the latter approach.

Alternatively, we could use existing API of iterating over mutable slices to create a bunch of &mut T references to each of the elements first, collect them into allocated, owned vectors (i.e. Vec<&mut T>s), and split those up as we wish; an approach that would involve additional overhead at runtime but could work with the existing safe APIs in the standard library.

Because mutable references can't be shared, but immutable ones can be. They're fundamentally different, so this isn't a surprise at all.

Compare, for example, (&a[0], &a[1]) vs (&mut a[0], &mut a[1]).

where exactly this sharing is prohibited by the compiler? Could you give the simplest example? I imagined that the compiler only looked at lifetimes, not mutable vs unmutable.