Collect into an array

Hey everyeone,

a reddit user complained about collecting an iterator into an array is very cumbersome. I agreed.

That's why I wrote some little code, like this:

#![feature(maybe_uninit_array_assume_init)]
#![feature(maybe_uninit_uninit_array)]

use std::mem::MaybeUninit;

#[derive(Debug, PartialEq)]
enum Code {
    TooFew,
    TooMany,
}

trait CollectArray: Iterator {
    fn try_collect<const N: usize>(self) -> Result<[Self::Item; N], Code>
    where
        Self: Sized,
    {
        let mut result = MaybeUninit::uninit_array();
        let mut max_idx = 0;
        for i in self {
            if max_idx >= N {
                return Err(Code::TooMany);
            }
            result[max_idx] = MaybeUninit::new(i);
            max_idx += 1;
        }
        if max_idx < N {
            return Err(Code::TooFew);
        }

        Ok(unsafe { MaybeUninit::array_assume_init(result) })
    }
}

impl<I: Iterator> CollectArray for I {}

fn main() {
    let a = std::iter::repeat(3).try_collect::<5>();
    assert_eq!(a, Err(Code::TooMany));
    let b = std::iter::repeat(3).take(4).try_collect::<5>();
    assert_eq!(b, Err(Code::TooFew));
    let c = std::iter::repeat(3).take(5).try_collect::<5>();
    assert_eq!(c, Ok([3, 3, 3, 3, 3]));
}

(Playground)

Miri says it's safe.
I see at least one problem, where the elements won't get dropped, if an error occurs.

Can somebody help me with this?
Maybe I'd like to publish a crate and my big dream :unicorn: is, that this is in std at some time.

Feedback is more than welcome.

Until something in std is available/stabilized, please recommend going through Vec instead of unsafe! It's pretty much trivial:

#![feature(const_generics)]

use std::convert::TryInto;

fn collect_array<T, const N: usize>(iter: impl IntoIterator<Item=T>) -> Result<[T; N], Vec<T>> {
    iter.into_iter().collect::<Vec<T>>().try_into()
}

Your code is definitely safer than mine, but yours requires allocation, mine doesn't, e.g. mine does work in a #![no_std] environment.

Also mine does work with an endless iterator (like the first example), yours doensn't.

Also your example doesn't work with the turbofish syntax :frowning_face:

error[E0632]: cannot provide explicit generic arguments when `impl Trait` is used in argument position
  --> src/main.rs:49:29
   |
49 |     let d = collect_array::<usize, 5>(std::iter::repeat(3));
   |                             ^^^^^  ^ explicit generic argument not allowed
   |                             |
   |                             explicit generic argument not allowed

but that could be circumvented by introducing another type I.

For extension traits, you should either have the trait and the blanket implementation both be ?Sized or you should have both be Sized; currently, the trait is ?Sized but the blanket impl is Sized which means that it's possible for users to implement the trait manually on an unsized iterator type, which shouldn't be possible. So either make the trait a subtrait of Sized or add ?Sized to the bound in the blanket impl.

Also, you don't need any unstable features for this. You can create an uninitialized [MaybeUninit<T>; N] by assume_initing an uninitialized MaybeUninit<[MaybeUninit<T>; N]>, and you can replace array_assume_init with a simple transmute.

1 Like

Then you want to collect into an

(It's not cumbersome, it's just not in std -- yet.)

4 Likes

I'm aware. But the desire for micro-optimizations should not override safety. Of course, an implementation in std would be expected to avoid allocations; I was trying to provide a simple workaround until that happens, because it's far better than everyone and their grandmother writing unsafe.

That's a non-argument. I wrote the above using impl syntax for brevity, because I'm on my phone. It's not required; it's trivial to rewrite it using generic syntax.

That's a good point with sized. I'll add that.

For the unstable features, yeah... It was easier to use them instead of reimplementing them. But if I would go for a create, I would maybe do that :slight_smile:


I know that arrayvec exists, but I don't want to introduce a new type here. I just want a plain array.


I wouldn't call that micro-optimization. It's just avoiding allocations, because you can't allocate for example. It was designed with not allocating into a vec in mind. I'm aware of the TryFrom implementation, but as said, Vec was never an option.

That's why I wrote

:wink:

I'm totally aware of the pain that comes with unsafe. And trust me, I would like to avoid that completly. I could use T: Default instead of MaybeUninit, and the compiler should be smart enough to avoid the default instiation completly.
But, not every T implements Default, that's why the "workaround" with MaybeUninit and that's why I asked for guidance, because it's a somewhat complicated field to get things right, e.g. the drop problem I mentioned in my first post.

I wouldn't say that my grandmom should write unsafe code, because I can't do it right either. But the more people look at stuff, the better it gets (hopefully). So here I am (:

Just realized you can create an uninit array without unsafe or unstable features like so:

struct UninitAssoc<T>(T);
impl<T> UninitAssoc<T> {
    const UNINIT: MaybeUninit<T> = MaybeUninit::uninit();
}
let mut result = [UninitAssoc::UNINIT; N];

You might want to use that instead :smiley:.

2 Likes

That's why I mention arrayvec -- which has had many eyes on it already -- since any code doing this is essentially recreating ArrayVec::push.

There will likely be a const-generics-using ArrayVec in core fairly soon, but until then I'd implement try_collect in terms of ArrayVec, maybe using something like this:

use arrayvec::{ArrayVec, Array}; // 0.5.2

#[derive(Debug, PartialEq)]
enum Code {
    TooFew,
    TooMany,
}

trait CollectArray: Iterator {
    fn try_collect<A: Array<Item = Self::Item>>(mut self) -> Result<A, Code>
    where
        Self: Sized,
    {
        let av: ArrayVec<A> = self.by_ref().collect();
        match av.into_inner() {
            Ok(a) => if self.next().is_some() {
                Err(Code::TooMany)
            } else {
                Ok(a)
            },
            Err(_) => Err(Code::TooFew),
        }
    }
}

impl<I: Iterator> CollectArray for I {
}

fn main() {
    let a: Result<[_; 5], _> = std::iter::repeat(3).try_collect();
    assert_eq!(a, Err(Code::TooMany));
    let b: Result<[_; 5], _> = std::iter::repeat(3).take(4).try_collect();
    assert_eq!(b, Err(Code::TooFew));
    let c: Result<[_; 5], _> = std::iter::repeat(3).take(5).try_collect();
    assert_eq!(c, Ok([3, 3, 3, 3, 3]));
}

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=e93d60451d9ad66f91192df626565171

1 Like

Why the by_ref? You don't gain anything from it but only lose the benefits of internal iteration.

Because he's using self again in the if self.next().is_some() check, so it shouldn't be consumed (collect() would move the iterator by-value).

Specifically what do you mean by this?

Oh right, I didn't see that

The implementation of Iterator for &mut impl Iterator doesn't override fold nor try_fold, preventing any internal iteration in the implementation of FromIterator.

Edit: anyway looking at the source code of ArrayVec it doesn't use internal iteration, although I'm pretty sure it could, so nothing is really lost.

The array-init and stackvec crates have methods for collecting from iterators into fixed-size stack arrays.

(Warning: stackvec might be unsound because it predates the MaybeUnit type, and uses mem::uninitialized instead.)

1 Like

However, all Iterator::by_ref() really does is return self: &mut Self. And since try_fold() already takes &mut self, the call iter.by_ref().try_fold() will actually resolve to iter.try_fold(), as demonstrated here. And while this does not apply to fold(), it should be trivial to fix.

That works only because that part of the code knows you have an &mut Foo but other parts of the code (e.g. the implementations of FromIterator) don't know they have a mutable reference. Your same example, but calling .collect::<Vec<_>>(), will print Foo::next() (playground link)

That's exactly what I was saying;

and, by extension, to any Iterator method that takes self by value instead of &mut self.

The state you need during collection (to handle panic correctly) is equivalent to arrayvec. When it's done, that equivalent state can be converted to an array.

2 Likes