Is interesting this TryIntoIterator?


#1

I was thinking about the ability of creating a fixed-size array from iterator, and I come up with this solution (the main can be run with sanitizers to check if everything work fine):

extern crate rand;

use std::{mem, ptr, fmt::Debug, ops::{Deref, Drop}};
use rand::{thread_rng, Rng};

pub trait TryFromIterator<A>: Sized {
    type Error;
    fn try_from_iter<T>(iter: T) -> Result<Self, Self::Error>
    where
        T: Iterator<Item = A>;
}

#[derive(Debug)]
pub struct IterError;

macro_rules! impl_try_collect_array {
    ($($N:expr)+) => {
        $(
            impl<T: Sized> TryFromIterator<T> for [T; $N] {
                type Error = IterError;

                fn try_from_iter<I>(mut iter: I) -> Result<Self, Self::Error>
                    where
                        I: Iterator<Item = T>,
                {
                    let mut out: mem::ManuallyDrop<[T; $N]>;
                    let stop_index = unsafe {
                        out = std::mem::uninitialized();
                        let mut out_iter = out.iter_mut().enumerate();
                        loop {
                            if let Some((index, out_item)) = out_iter.next() {
                                if let Some(in_item) = iter.next() {
                                    ptr::copy_nonoverlapping(&in_item, out_item, 1);
                                    mem::forget(in_item);
                                } else {
                                    break Some(index);
                                }
                            } else {
                                break None;
                            }
                        }
                    };

                    if let Some(stop_index) = stop_index {
                        unsafe {
                            for item in out.iter_mut().take(stop_index) {
                                ptr::drop_in_place(item);
                            }
                        }

                        Err(IterError)
                    } else {
                        Ok(mem::ManuallyDrop::into_inner(out))
                    }
                }
            }
        )+
    }
}

// Used to inject `try_collect` inside `Iterator`
trait TryCollectExt<T: Sized>: Iterator + Sized {
    fn try_collect<F>(self) -> Result<F, F::Error>
    where
        F: TryFromIterator<Self::Item>;
}

impl<I, T> TryCollectExt<T> for I
where
    T: Sized,
    I: Iterator<Item = T> + Sized,
{
    fn try_collect<F>(self) -> Result<F, F::Error>
    where
        F: TryFromIterator<T>,
    {
        TryFromIterator::try_from_iter(self)
    }
}

impl_try_collect_array!(1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32);

#[test]
fn test1() {
    let data: [u8; 10] = (0u8..).into_iter().try_collect().unwrap();
    assert_eq!(data, [0u8, 1, 2, 3, 4, 5, 6, 7, 8, 9]);
}

#[test]
#[should_panic]
fn test2() {
    let _data: [u8; 10] = (0u8..5).into_iter().try_collect().unwrap();
}

#[derive(Debug, Clone)]
struct NoisyVec<T: Debug>(Vec<T>);

impl<T: Debug> Deref for NoisyVec<T> {
    type Target = Vec<T>;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

impl<T: Debug> Drop for NoisyVec<T> {
    fn drop(&mut self) {
        println!("Dropping {:?}", self.0);
    }
}

#[derive(Debug, Clone)]
struct PanicVec<T>(Vec<T>);

impl<T> IntoIterator for PanicVec<T> {
    type Item = T;
    type IntoIter = PanicIterator<Vec<T>>;

    fn into_iter(self) -> Self::IntoIter {
        println!("Calling into_iter");
        PanicIterator(self.0.into_iter())
    }
}

struct PanicIterator<T: IntoIterator>(T::IntoIter);

impl<T: IntoIterator> Iterator for PanicIterator<T> {
    type Item = T::Item;

    fn next(&mut self) -> Option<Self::Item> {
        println!("Calling next");
        let mut rng = thread_rng();
        let rand_value = rng.gen_range(0, 10);
        if rand_value < 6 {
            self.0.next()
        } else {
            panic!("Whoops, PanicIterator is panicing!")
        }
    }
}

fn get_array_not_fault() -> Result<[NoisyVec<i32>; 7], IterError> {
    let vec = NoisyVec((0..10).into_iter().collect::<Vec<_>>());
    std::iter::repeat(vec).try_collect()
}

fn get_array_fault() -> Result<[NoisyVec<i32>; 7], IterError> {
    let vec = NoisyVec((0..10).into_iter().collect::<Vec<_>>());
    std::iter::repeat(vec).take(5).try_collect()
}

fn get_array_panicing() -> Result<[Vec<i32>; 7], IterError> {
    let vec = (0..10).into_iter().collect::<Vec<i32>>();
    let vec_of_vec = PanicVec(std::iter::repeat(vec).take(10).collect::<Vec<Vec<i32>>>());
    vec_of_vec.into_iter().try_collect()
}

fn main() {
    println!("Testing with not faulting test");
    if let Ok(data) = get_array_not_fault() {
        println!("Data obtained as expected: {:?}", data);
    } else {
        eprintln!("Unexpectedly cannot obtain data");
    }

    println!("Testing with faulting test");
    if let Ok(data) = get_array_fault() {
        eprintln!("Unexpected data obtained: {:?}", data);
    } else {
        println!("Cannot obtain data as expected");
    }

    println!("Testing using a (maybe) panicing iterator");
    if let Ok(data) = get_array_panicing() {
        eprintln!("Unexpectedly obtained data: {:?}", data);
    } else {
        eprintln!("Unexpectedly cannot obtain data (should have paniced)");
    }
}

It seems to work flawlessly (with address sanitizer I get an ODR violation, but I think it is another kind of problem), so it could be nice. But… is it worth? I mean, it is very useful to initialize a fixed array with zero cost and without unsafe code around, but I don’t see any other way this trait (and the try_collect function) could be used. At least, not until we have const generics…

What do you think?

EDIT: as suggested by ExpHP, the code has been slightly modified to use mem::ManuallyDrop and avoid crashes in case of panics.


#2
if let Some(in_item) = iter.next() {

I think this can panic and cause a drop of uninitialized data.


#3

If you mean that the iterator could be implemented as the call to next could cause a panic!(), you are right. But honestly I don’t know if it is possible to correctly handle these situations inside unsafe regions. I also don’t know what is really happening when panicing. Is the stack unwinded?

If you have suggestions, they are welcome.

Nevertheless, the code could be tweaked and optimized by people more experienced than me, my question is more about the usefulness if this trait would be, hypothetically, putted into the stdlib.


#4

Sorry, I should have been more helpful. My message was terse because I was interrupted. (and it will be again because I am on my phone, but I have more time to write)

On most platforms, a panic causes the stack to unwind, causing destructors to be called on all things in scope.
This is always a danger when calling trait methods in the presence of initialized data.

One useful tool for dealing with this is std::mem::ManuallyDrop. This will leak on panic, which is probably the best outcome you can hope for.

People suggest stuff like this all the time. I confess I haven’t followed these discussions enough to understand why nothing has made it into the standard lib yet.


#5

Very, very nice suggestion! I liked it very much – I was unaware if std::mem::ManuallyDrop!

At this point I want to ask you another thing: in case of panic! the implementation does not crash anymore. But at the same time it leaks when panic occurs. Every possible solution in my mind involves non-0 cost operation, which I do not consider valid in this case (after all, we are just filling an array with data, it must be without overhead).

The question is: how bad it is to leak on a panic in Rust? Are there any possible solutions to achieve both 0 overhead and perfect memory management on panic (and I am not seeing them because I am a n00b :smile:)?


#6

All I can say is it’s better than invoking UB.

Memory leaks are not considered UB in rust, so you can be sure there is no unsafe code somewhere you don’t know about that is relying on a value’s eventual destruction for memory safety. Some types like Vec's DrainFilter solve such safety issues by leaking even more data.

This page on the nomicon is worth a read.


Maybe I should have started with this suggestion, but have you tried having your macro generate something like this instead?

impl<T: Sized> TryFromIterator<T> for [T; 5] {
    type Error = IterError;
    fn try_from_iter<I>(mut iter: I) -> Result<Self, Self::Error>
    where
        I: Iterator<Item = T>,
    {
        // look ma, no unsafe!
        let out = [
            iter.next().ok_or(IterError)?,
            iter.next().ok_or(IterError)?,
            iter.next().ok_or(IterError)?,
            iter.next().ok_or(IterError)?,
            iter.next().ok_or(IterError)?,
        ];
        if let Some(_) = iter.next() {
            return Err(IterError);
        }
        Ok(out)
    }
}

This is a trickier macro to write, but I assure you it is 100% possible. I can give pointers if needed.


#7

Gosh! Awesome!

It is exactly the way it should be handled! It is a completely different point of view, and it is obviously the right one! I’ll edit the original question to use your solution – with credits! :blush:

You taught me a lesson today: try to search for a different perspective before using unsafe.
Thanks!