Should there be a default best-effort implementation of FromIterator for arrays?

I found some earlier topics talking about the lack of .collect::<[T; N]>() on iterators.

I feel like it should be possible to implement it in a sane way for all types that have a Default and are Clone: if the iterator yields fewer than N items, the rest of the array is filled by cloning the default; if it yields more than N, the excess is simply discarded.

Earlier posters suggested panicking, which I really don't see the need for. A TryFromIterator, as has been talked about before, could of course still exist and maybe do something different on arrays, in case it is desirable to detect when collection fails. But for my needs, I feel like I have a bunch of iterators whose size I can guarantee, and it would be awesome not to have to heap-allocate a Vec for them.

Here is my dummy array implementation:

/// Fake array because mere Rust user mortals can't implement FromIterator for [T; N] directly
pub struct Arr<T, const N: usize>(pub [T; N]);

impl<T: Default + Clone, const N: usize> FromIterator<T> for Arr<T, N> {
    fn from_iter<I: IntoIterator<Item = T>>(iter: I) -> Self {
        let mut arr: [T; N] = unsafe { std::mem::MaybeUninit::uninit().assume_init() };

        let mut max = 0;
        for (i, value) in iter.into_iter().enumerate().take(N) {
            arr[i] = value;
            max = i;
        }

        let top = max.saturating_add(1);
        if top < N {
            let fill = T::default();
            for i in top..N {
                arr[i] = fill.clone()
            }
        }

        Arr(arr)
    }
}

Why is this not desirable?

I'd prefer an implementation for ArrayResult<[T; N]>. There's already precedent for fallible FromIterator implementations. The issue with that would be that you'd have to know which result or error type to ask collect for, and that's not really discoverable.
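
(The existing precedent is, for instance, the FromIterator impl for Result, which short-circuits on the first error; a small illustration:)

fn main() {
    // Collecting an iterator of Results into a Result of a collection:
    // the first Err aborts the collection and is returned as-is.
    let ok: Result<Vec<i32>, _> = ["1", "2", "3"].iter().map(|s| s.parse::<i32>()).collect();
    let err: Result<Vec<i32>, _> = ["1", "x", "3"].iter().map(|s| s.parse::<i32>()).collect();
    assert_eq!(ok, Ok(vec![1, 2, 3]));
    assert!(err.is_err());
}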

1 Like

For arbitrary T this is insta-UB, which becomes obvious if you consider the case T = Box<i32>: the assignment arr[i] = value first drops the old, uninitialized Box, attempting to free memory through an uninitialized pointer.
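
(A complete demonstration of the problem, intentionally exhibiting the UB; don't actually run it outside of Miri:)

use std::mem::MaybeUninit;

fn main() {
    // Same pattern as the proposed impl, with T = Box<i32>.
    // This line alone is already UB: it claims two uninitialized Boxes are initialized.
    let mut arr: [Box<i32>; 2] = unsafe { MaybeUninit::uninit().assume_init() };
    // Assignment to a place first drops the old value, so this tries to
    // deallocate whatever garbage pointer happened to be sitting in arr[0].
    arr[0] = Box::new(1);
}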

Another consideration is: do we really want to silently fill the array with Defaults (and do we actually want to restrict this implementation with a Default bound), and do we really want to silently discard the excess elements? Emphasis on "silently" in both cases.

7 Likes

The Default case can already be written straightforwardly as

let array: [T; N] = core::array::from_fn(|_| iter.next().unwrap_or_default());
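
(As a complete example with concrete types, since the closure needs a mutable iterator in scope; the values here are just for illustration:)

fn main() {
    let mut iter = [1, 2, 3].into_iter();
    // Take up to 5 items; missing slots fall back to i32::default(), i.e. 0.
    let array: [i32; 5] = core::array::from_fn(|_| iter.next().unwrap_or_default());
    assert_eq!(array, [1, 2, 3, 0, 0]);
}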

IMO the real solution will be to have ArrayVec in std, and have it implement FromIterator, or maybe an Iterator::next_array_vec() method or similar.

7 Likes

I see that my suggested implementation is naive, as you have pointed out with your example, but that UB could be avoided with a serious implementation and/or more restrictions on T.
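
For what it's worth, a sound version of the same idea can be written on stable today by keeping each element in a MaybeUninit until it is written. This is only a sketch (it leaks already-written elements if the iterator panics, and it drops the Clone bound since each slot gets its own fresh default):

use std::mem::MaybeUninit;

pub struct Arr<T, const N: usize>(pub [T; N]);

impl<T: Default, const N: usize> FromIterator<T> for Arr<T, N> {
    fn from_iter<I: IntoIterator<Item = T>>(iter: I) -> Self {
        // An array of MaybeUninit<T> does not require initialization, so this is sound.
        let mut buf: [MaybeUninit<T>; N] = unsafe { MaybeUninit::uninit().assume_init() };

        // Write up to N items; `write` does not drop the (uninitialized) old contents.
        let mut len = 0;
        for (slot, value) in buf.iter_mut().zip(iter) {
            slot.write(value);
            len += 1;
        }

        // Fill the remaining slots with defaults instead of leaving them uninitialized.
        for slot in &mut buf[len..] {
            slot.write(T::default());
        }

        // Every slot is now initialized; move the buffer out as a [T; N].
        // (array_assume_init is still unstable, so do it with a pointer read;
        // MaybeUninit never drops its contents, so there is no double free.)
        let arr = unsafe { (&buf as *const _ as *const [T; N]).read() };
        Arr(arr)
    }
}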

While I do see the point that this both silently discards the rest of the iterator and silently fills the rest of the array, I also don't really see how that would not be the expected result of collecting into a fixed-size array from an arbitrarily sized iterator.

The concern here might be whether such functionality should live in the standard library or elsewhere. But given that the documentation for Iterator::collect() and FromIterator makes no guarantee that the created collection will contain all the elements from the iterator, nor that it won't contain items that did not come from the iterator, I don't see why this couldn't be the case for collecting into an array.

In my use cases, it is undeniable that collecting into a fixed-size array would be both convenient and in line with Rust's ability to be memory efficient. Given the choice between not being able to collect into an array neatly and being able to collect into one while knowing the behaviour, I'd personally want the latter, in this case at least.

Why would it not be expected for collecting an iterator over T: Clone + Default into an array [T; N] to fill the remainder with Default::default() and/or discard the excess items when the iterator yields more than N? I suppose the implementation could instead wrap around and fill the array modulo N on overflow so that the whole iterator is actually consumed, but either way could be the defined, sane behaviour, I feel, without needing to panic.

Good tip with core::array::from_fn! ArrayVec and other inline containers in std would also be very nice, yes.

2 Likes

That would be acceptable, though I feel the unstable Iterator::try_collect seems even more promising.

1 Like

It's not just important to do a reasonable thing given the involved types; it's also important to check that the types are reasonable. For example, suppose we had the FromIterator implementation you propose, and someone writes:

let tmp = if special_case {
    []
} else {
    something.iter().map(...).collect()
};
something.clear();
for item in tmp {
    ...
}

Then, since [] is of type [T; 0], the else branch will be inferred to also be of type [T; 0] and the collect() would always discard all elements — even though that isn't what the author meant. They probably should have written vec![] in the first branch of the if, but this is an easy mistake to make.

Certainly collect() is sometimes lossy — for example, collecting into a String concatenates all the items and forgets the boundaries between them — but generally only in ways that make sense for the type. [T; N] doesn't mean "collection of which we only care about N elements"; it means "collection which always contains N elements", and it should always be an error to have fewer or more than N elements unless you opt in to truncation/extension, to catch bugs.

5 Likes

Inferring types that are not what the user intended is of course always an issue, but that is perhaps out of scope for what I am proposing. As the documentation for Iterator::collect points out, collect is such a general and overloaded function that it is one of the few places where one really has to be explicit about the types.

(Furthermore, if the compiler determines you are iterating over [T; 0] specifically, I feel like that should emit a warning either way, given that it is useless and most likely unintentional.)

I do see your point that having full control and being explicit is of course best.

I would not at all be against arrays having specific functions that do just this, like a std::array::from_iter_truncated() or the like.
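
(Such a helper could be a thin wrapper over the core::array::from_fn trick from earlier; the name and signature here are invented purely for illustration:)

/// Hypothetical helper: take up to N items, fill the rest with T::default().
pub fn from_iter_truncated<T: Default, const N: usize>(
    iter: impl IntoIterator<Item = T>,
) -> [T; N] {
    let mut iter = iter.into_iter();
    std::array::from_fn(|_| iter.next().unwrap_or_default())
}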

It is, however, still interesting just to hear the general thoughts here about sane defaults, and whether I am off base in thinking that collecting into a [T; N] could have a pretty well-defined and sensible meaning.

Part of me wonders if collect specifically is just the wrong abstraction for building arrays, because of the too-short and too-long error cases where it isn't obvious how they should be handled. (There are certainly reasonable options, but I think reasonable people can easily disagree on which way is best.)

Thus it seems plausible to me that we might end up in a world where the answer will never be collecting into an array, but rather collecting into an arrayvec or using something like the currently unstable Iterator::next_chunk, which can be clearer about how it behaves in different situations.

5 Likes

No truncation, no extension with default values: To me this default behavior would be expected and unsurprising. Truncation might drop data you were expecting to keep. And extension with default values requires the Default bound on the value type.

This doesn't explain how you would like to see those bad cases handled. With a panic? A result? In that case, what error should it return?

I would expect a panic if the sizes don't match. It's a limited use case, but a useful one I think.

That's the problem then. Generally collect is assumed to not panic, at least not for such a user error, which would make this behaviour also unexpected and surprising. It also doesn't help that there's no sign that you should ensure the iterator has the right size.

Moreover, this doesn't solve the problem where you actually have fewer or more elements.
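
(For completeness, the way to get that size check on stable today is to collect into a Vec and use the TryFrom<Vec<T>> impl for [T; N], which hands the Vec back on a length mismatch. A sketch, with a hypothetical helper name, at the cost of exactly the heap allocation the original post wanted to avoid:)

/// Succeeds only if the iterator yields exactly N items.
fn collect_exact<T, const N: usize>(iter: impl IntoIterator<Item = T>) -> Option<[T; N]> {
    let v: Vec<T> = iter.into_iter().collect();
    // TryFrom<Vec<T>> for [T; N] returns Err(v) unless v.len() == N.
    v.try_into().ok()
}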

2 Likes

I guess we've just proved that @scottmcm was right :slight_smile:

1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.