When will the feature `impl_trait_in_assoc_type` be stable?

I think this feature is important for someone like me who manipulates data a lot.
Without it, when I need to do some simple math operations on data, I have to write:

let v = vec![1f32; 100_000_000];
let k1 = v.iter().zip(v.iter()).map(|(x, y)| x + y);
let k2 = k1.zip(v.iter()).map(|(x, y)| x - y);
let k3 = k2.zip(v.iter()).map(|(x, y)| x * y / 10. + 2. );
let k4 = k3.zip(v.iter()).map(|(x, y)| x/ y);
let k5 = k4.collect::<Vec<f32>>();

But with that feature, I could do it in a simpler way (after implementing the relevant traits):

let my_vec = v.clone().to_myvec();
let k1 = &my_vec + &my_vec;
let k2 = k1 - &my_vec;
let k3 = k2 * &my_vec / 10. + 2.;
let k4 = k3 / &my_vec;
let k5 = k4.to_vec();

I found this tracking issue four months ago; at that time I thought the feature would be stabilized in the near future, but it still hasn't been. Does anyone have a sense of when it will be stable?

We don't know.

Anyway, if you want to do vectorized/array arithmetic, use a dedicated library such as ndarray.


I tried ndarray, but it is slower than using a bare Vec, for example:

use ndarray::arr1;

fn main() {
    let v = vec![1f32; 100_000_000];
    let a = arr1(&v);

    let time_start = std::time::Instant::now();
    let k1 = v.iter().zip(v.iter()).map(|(x, y)| x + y);
    let k2 = k1.zip(v.iter()).map(|(x, y)| x - y);
    let res = k2.collect::<Vec<f32>>();
    println!("{:?}, {:?}", time_start.elapsed(), res.len());

    let time_start = std::time::Instant::now();
    let k1 = &a + &a;
    let res = k1 - &a;
    println!("{:?}, {:?}", time_start.elapsed(), res.len());
}
//output
//87.084925ms, 100000000
//136.38149ms, 100000000

This is the playground.

Of course it is: you are not measuring the same thing. Your measurement is biased in favor of your own approach, because the iterator version does only one pass and collects the result once at the end, whereas the ndarray version traverses the arrays several times and allocates intermediate arrays. Consequently, it's not a fair comparison.

If you do the right thing, then suddenly ndarray is on par with, or faster than, the iterators.

By the way, I don't really get why you would need `impl Trait` in associated-type position for your original impls. You should just make the wrappers generic over the function type, much like iterator adapters do.


I thought

ndarray::azip!((res in &mut res, &x in &a, &y in &a, &z in &a) *res = x + y - z);

is more complex than

izip!(v.iter(), v.iter(), v.iter()).map(|(x, y, z)| x + y - z)

and

&my_vec + &my_vec - &my_vec

Do you think the reason ndarray doesn't offer the last arithmetic style is that `impl_trait_in_assoc_type` is not stable?

This method didn't come to my mind; I'm going to try it now.

Could you give me a clue about how to write this method down?
After I get to this point:

use std::{iter::{Map, Zip}, slice::Iter, ops::{Add, Sub, Mul, Div}};

pub struct MyVec(pub Vec<f32>);

pub struct IterCont<T: Iterator<Item = f32>, F>(pub T, F);


impl<'a, F: Fn((&'a f32, &'a f32)) -> f32> Add<&'a MyVec> for &'a MyVec {
    type Output = IterCont<Map<Zip<Iter<'a, f32>, Iter<'a, f32>>, F>, F>;
    fn add(self, rhs: &'a MyVec) -> Self::Output {
        let c_f = |(x, y)| x + y;
        let map = self.0.iter().zip(rhs.0.iter()).map(c_f);
        IterCont(map, c_f)
    }
}

then this error shows up:

error[E0207]: the type parameter `F` is not constrained by the impl trait, self type, or predicates
 --> mytest/src/main.rs:8:10
  |
8 | impl<'a, F: Fn((&'a f32, &'a f32)) -> f32> Add<&'a MyVec> for &'a MyVec {
  |          ^ unconstrained type parameter

It really isn't. It may be more tokens or bytes of source text, but it isn't any harder conceptually. The difference is insignificant in any case.

Yeah, it's more complex than the operator overloaded version. But again, you are comparing apples to oranges in that case. You could implement the same thing using iterators by collecting into intermediate vectors, and it wouldn't be faster.

If you specifically want a lazily evaluated computation graph, you can have that, too. ndarray is not that. You can easily roll your own using operator overloading. You don't need impl Trait for that, just represent operators by distinct types and make them generic over operands. Or look for libraries that implement the pattern – I don't know any off the top of my head.

I'll try to demonstrate the lazy graph approach once I get on my computer.


Also, can we see a version of the motivating operation sequence that isn't trivially reduced to a single pass map?

let k5 = v.iter().zip(v.iter()).map(|(x, y)| {
  ((x + y - y) * y / 10. + 2.) / y
}).collect::<Vec<_>>();

Maybe I've got what you mean:

use std::{iter::{Map, Zip}, slice::Iter, ops::{Add, Sub, Mul, Div}};

pub struct MyVec(pub Vec<f32>);

pub struct IterCont<T: Iterator<Item = f32>>(pub T);

impl<T: Iterator<Item = f32>> IterCont<T> {
    fn to_vec(self) -> Vec<f32> {
        self.0.collect::<Vec<f32>>()
    }
}

#[derive(Clone)]
enum Ops {
    Add,
    Sub,
}

impl Ops {
    fn ops_on(&self, data: (&f32, &f32)) -> f32 {
        match self {
            Ops::Add => data.0 + data.1,
            Ops::Sub => data.0 - data.1,
        }
    }
}

#[derive(Clone)]
pub struct MyMap<I> {
    pub(crate) iter: I,
    f: Ops,
}

impl<'a, I: Iterator<Item = (&'a f32, &'a f32)>> Iterator for MyMap<I> {
    type Item = f32;
    fn next(&mut self) -> Option<Self::Item> {
        self.iter.next().map(|data| self.f.ops_on(data))
    }
}

type ZipIter<'a> = Zip<Iter<'a, f32>, Iter<'a, f32>>;

impl<'a> Add<&'a MyVec> for &'a MyVec {
    type Output = IterCont<MyMap<ZipIter<'a>>>;
    fn add(self, rhs: &'a MyVec) -> Self::Output {
        let iter = self.0.iter().zip(rhs.0.iter());
        let my_map = MyMap{ iter, f: Ops::Add };
        IterCont(my_map)
    }
}


fn main() {
    let my_vec = MyVec(vec![1f32; 1000]);
    let res = &my_vec + &my_vec;
    let bb = res.to_vec();
    println!("{:?}", &bb[..10]);
}

It gets really complicated, though.

For example, I also need access to intermediate results:

fn get_two_vecs() -> (Vec<f32>, Vec<f32>) {
    let v = vec![1f32; 1000];
    let k1 = v.iter().zip(v.iter()).map(|(x, y)| x + y).collect::<Vec<f32>>();
    let k2 = k1.iter().zip(v.iter()).map(|(x, y)| x / y).collect::<Vec<f32>>();
    (k1, k2)
}

Not really. You are just re-implementing zip and map from the standard library with that piece of code. I meant that you should represent the different mathematical operations using different types, and create a computational graph from those. Finally, you can apply the whole thing to an iterator, like this.

I changed the enum method to a trait method, but then it is slower again:

use std::{iter::{Map, Zip}, slice::Iter, ops::{Add, Sub, Mul, Div}};

pub struct MyVec(pub Vec<f32>);

pub struct MyAdd;

pub trait Ops<T, N> {
    type Output;
    fn ops_on(&self, data: (T, N)) -> Self::Output;
}

impl<T, N, K> Ops<T, N> for MyAdd
where
    T: Add<N, Output = K>,
{
    type Output = K;
    fn ops_on(&self, data: (T, N)) -> Self::Output {
        data.0 + data.1
    }
}

#[derive(Clone)]
pub struct MyMap<I, F> {
    pub(crate) iter: I,
    f: F,
}

impl<I, F> MyMap<I, F> {
    fn vec(self) -> MyVec
    where
        Self: Iterator<Item = f32>,
    {
        MyVec(self.collect::<Vec<f32>>())
    }
}


impl<I, F, T, N> Iterator for MyMap<I, F>
where
    I: Iterator<Item = (T, N)>,
    F: Ops<T, N>,
{
    type Item = <F as Ops<T, N>>::Output;
    fn next(&mut self) -> Option<Self::Item> {
        self.iter.next().map(|data| self.f.ops_on(data))
    }
}

impl<'a> Add<&'a MyVec> for &'a MyVec {
    type Output = MyMap<Zip<Iter<'a, f32>, Iter<'a, f32>>, MyAdd>;
    fn add(self, rhs: &'a MyVec) -> Self::Output {
        let iter = self.0.iter().zip(rhs.0.iter());
        MyMap { iter, f: MyAdd }
    }
}

impl<'a, I, F> Add<&'a MyVec> for MyMap<I, F>
where
    Self: Iterator<Item = f32>,
{
    type Output = MyMap<Zip<Self, Iter<'a, f32>>, MyAdd>;
    fn add(self, rhs: &'a MyVec) -> Self::Output {
        MyMap {
            iter: self.zip(rhs.0.iter()),
            f: MyAdd,
        }
    }
}

impl<'a, I, F> Add<MyMap<I, F>> for &'a MyVec
where
    MyMap<I, F>: Iterator<Item = f32>,
{
    type Output = MyMap<Zip<MyMap<I, F>, Iter<'a, f32>>, MyAdd>;
    fn add(self, rhs: MyMap<I, F>) -> Self::Output {
        MyMap {
            iter: rhs.zip(self.0.iter()),
            f: MyAdd,
        }
    }
}


fn main() {
    let v = vec![1f32; 100_000_000];
    let my_vec = MyVec(v.clone());

    let time_start = std::time::Instant::now();
    let k1 = v.iter().zip(v.iter()).map(|(x, y)| x + y);
    let k2 = k1.zip(v.iter()).map(|(x, y)| x + y);
    let k3 = v.iter().zip(k2).map(|(x, y)| x + y);
    let res = k3.collect::<Vec<f32>>();
    println!("{:?}  {:?}", &res[..10], time_start.elapsed());

    let time_start = std::time::Instant::now();
    let k1 = &my_vec + &my_vec;
    let k2 = k1 + &my_vec;
    let k3 = &my_vec + k2;
    let res = k3.vec();
    println!("{:?}  {:?}", &res.0[..10], time_start.elapsed());

}

This is the playground.
I suspect the reason is that my own Iterator implementation on MyMap is slower than the standard Iterator implementations for slices.

You can check if implementing size_hint() and/or try_fold() and/or ExactSizeIterator makes a difference. Also, you are again duplicating a lot of functionality from those standard iterator types. The standard iterator types are extremely well-optimized (including the basic stuff I just mentioned), so you shouldn't expect to be able to outperform them.

Instead, you should try to avoid stacking a lot of your own custom iterators on top of each other – that's why I suggested building a computational graph and applying it as a whole to the items of an iterator. The resulting loop will be much easier for the compiler to optimize, as it will only have to optimize a single function with most or all mathematical operations inlined, likely with a single jump at the end of each iteration.

In contrast, if you deeply nest iterator adaptors, those potentially need to perform control flow, which is harder for the compiler to optimize away, because it can't always prove invariants about the more complicated control flow.

Accordingly, the computational graph approach is roughly on par with std iterators – sometimes it is slower, sometimes it is faster. Playground.


Thank you very much for your solution and the detailed explanation. Traits are so powerful; a few hours ago I didn't expect there to be several approaches to this problem. Are there any resources for learning these design patterns, or does it only come from extensive practice?

It's not an existing "design pattern" (at least not one I know of). You introduce traits when you need to abstract over common behavior of different types.
