Writing a truly generic sum function (and why is it missing from the stdlib?)

Hello,

as an exercise, I am trying to write a generic sum function. Just like for can iterate over anything that implements IntoIterator, I would like my function to also accept anything that can be turned into an iterator.

I have the following, which works:

use std::ops::Add;

fn sum<T, I>(iterable: I) -> T
where
    I: IntoIterator<Item = T>,
    T: Add<Output = T> + Default
{
    let mut result = T::default();
    for elem in iterable {
        result = result + elem;
    }
    result
}

fn main() {
    let something = vec![0, 2, 1];
    println!("{}", sum(something));
}

But this doesn't work when the calling line is changed to

    println!("{}", sum(&something));

I managed to modify the sum function so that it works for a reference

fn sum_ref<'a, T, I>(iterable: I) -> T
where
    I: IntoIterator<Item = &'a T>,
    T: 'a + Add<Output = T> + Default + Copy,
{
    let mut result = T::default();
    for elem in iterable {
        result = result + *elem;
    }
    result
}

but now it no longer works for a non-reference...

Is there a way to have a single function which works in both cases?

Here is one way:

fn sum<T, I, S>(iterable: I) -> S
where
    I: IntoIterator<Item = T>,
    T: Add<T, Output = S>,
    S: Add<T, Output = S>,
    S: Default,
{
    let mut result = S::default();
    for elem in iterable {
        result = result + elem;
    }
    result
}
5 Likes

I suspect this is why the Iterator::sum() method introduces a helper trait (std::iter::Sum).

If you look at the implementations of std::iter::Sum, you'll see it has both impl Sum<f32> for f32 and impl<'a> Sum<&'a f32> for f32 so you can sum either an iterator of f32's or &f32's.

Here's a trivial implementation

fn main() {
    let values = [1_u32, 2, 3, 4];
    let total: u32 = sum(values);
    println!("sum({values:?}) = {total}");

    let references = [&1_u32, &2, &3, &4, &5, &6];
    let total: u32 = sum(references);
    println!("sum({references:?}) = {total}");
}

fn sum<T, I>(items: I) -> T
where
    I: IntoIterator,
    T: Sum<I::Item>,
{
    T::sum(items.into_iter())
}

Where my Sum trait is defined as...

trait Sum<A = Self> {
    fn sum<I>(iter: I) -> Self
    where
        I: Iterator<Item = A>;
}

impl Sum for u32 {
    fn sum<I>(iter: I) -> Self
    where
        I: Iterator<Item = u32>,
    {
        iter.fold(1, |a, b| a * b)
    }
}

impl<'a> Sum<&'a u32> for u32 {
    fn sum<I>(iter: I) -> Self
    where
        I: Iterator<Item = &'a u32>,
    {
        iter.fold(1, |a, b| a * b)
    }
}

(playground)

3 Likes

I see, thanks a lot. So the secret is to make the item type and the return type separate generic parameters. The line

    T: Add<T, Output = S>,

puts an additional restriction on the return type S. Without it, the return type could be both i32 and &i32 (for a container of 32 bit ints).

Here is a variant of the sum function that uses Iterator::sum:

use std::iter::Sum;

fn sum<T, I, S>(iterable: I) -> S
where
    I: IntoIterator<Item = T>,
    S: Add<T, Output = S> + Sum<T>,
    I::Item: Add<T, Output = S>,
{
    iterable.into_iter().sum()
}

It uses the same trick to uniquely define the return type. However, I wonder, isn't there a more elegant way to do this? For example, in the above function one can replace Add by Sub and things continue working. The trait Add is actually not used, it's only there to restrict the return type.

@grothesque You can use either Sum or Add + Default, both is redundant.

1 Like

Thanks. I see that the simplified

fn sum<I, S>(iterable: I) -> S
where
    I: IntoIterator,
    S: Sum<I::Item>,
    I::Item: Add<I::Item, Output = S>,
{
    iterable.into_iter().sum()
}

works as well, but removing the line

    I::Item: Add<I::Item, Output = S>,

makes the compiler complain:

error[E0283]: type annotations needed
  --> generic_sum/src/main.rs:71:20
   |
71 |     println!("{}", sum(&something));
   |                    ^^^ cannot infer type of the type parameter `S` declared on the function `sum`
   |
   = note: cannot satisfy `_: Sum<&i32>`
   = help: the following types implement trait `Sum<A>`:
             <isize as Sum>
             <isize as Sum<&'a isize>>
             <i8 as Sum>
             <i8 as Sum<&'a i8>>
             <i16 as Sum>
             <i16 as Sum<&'a i16>>
             <i32 as Sum>
             <i32 as Sum<&'a i32>>
           and 72 others

Actually, without it it can still only be i32 (unless you import some user-defined type that also implements addition with &i32). But the type deduction isn't smart enough to deduce i32, so you would have to explicitly specify the type at the call site.

The same thing happens for the sum method in std if you remove the :i32 from the example at Iterator in std::iter - Rust

This is actually what got me started into this: I tried to understand why one has to write
iter.sum::<i32>() to compute the sum of an iterator.

Are the reddit guys right that this is a case of overgeneralization?


I find it a bit cumbersome that one has to write

collection.iter().sum::<i32>()

instead of

sum(&collection)

(where, as we have just seen, the simple sum function can be provided without any runtime overhead.)

Is this just a case of the stdlib missing some non-essential functionality? Or is something gained by not providing a simple sum algorithm?

Default is not quite the right trait for this. It often happens to be the zero-element which is the identity-element for addition operations but it's not required by its API to be that.

So to this correctly one must return an Option to handle the zero-element case or define an AddIdentity or Zero provider.

6 Likes

The stdlib seems to have some notion of zero element, since an empty iterators sums to zero: e.g. (0..0).sum::<i32>() evaluates to zero.

Not in a generic fashion in the type system. It's implemented separately for each of the numeric types via a macro:

2 Likes

Indeed, I just found this out as well...

So, my conclusion from this exercise is that while a lot of deep insight is encoded in the design of Rust and its standard library, there are warts and imperfections as well, even in seemingly central areas.

The first post in this discussion was motivated by the question why IntoIterator is not used more in the standard library. After all, if it's easy to for-loop over something, where something: IntoIterator, it could be equally easy to sum it up or get the max value. Part of this is addressed by the itertools crate, but I don't see why the stdlib algorithms should be restricted to iterators, when they could take anything that implements IntoIterator.

Seems like the reason is historical, since Iterator predates IntoIterator. This seems to have caused a much uglier wart, namely the various problems with ranges, but hey, I guess that's life :-).

1 Like

I think in this case it was a deliberate choice at some point in history, not an accidental wart.
I recall there were discussions of having all those number-theoretic traits for multiplicative identity, additive identity and what-not but that it was deemed too complex for the standard library and delegated to the ecosystem instead. You can use the num_traits crate to get Zero and One.

4 Likes

I was not referring to the absence of number-theoretic traits in stdlib as a wart, but rather that operations on iterables (i.e. types implementing IntoIterator, e.g. collections) require the iterator to be requested explicitly, e.g. collection.into_iter().max(), while the same explicitness would be considered cumbersome in for-loops.
But perhaps this is a fair price for being able to method-chain iterator operations. (One would not want to add lots of methods like sum or max to IntoIterator in order not to encumber all the collections with them.) So the alternative would have been to provide these as functions under std::iter (just like there is std::iter::zip).

Not sure, is method-chaining worth this extra verbosity? Perhaps the fact that in Rust method calls are "priviledged" (they feature automatic dereferencing) plays a role?


Independently from the above, the fact that summing an iterator requires one to specify the output type explicitly, while this freedom is unused and unusable for standard types, feels like a wart as well.

I don't get what method chaining has to do with this. If all Iterator methods were duplicated on IntoIterator, then ther would be a lot of redundancy and a lot of potential for introducing inconsistency, and a lot of superfluous maintenance work. In general, you don't want to duplicate APIs, especially not huge ones like Iterator.

This is not a wart, this is API design. Seriously, if you think calling .iter() is too high a price to pay for that, I've got some bad news for you.

3 Likes

Thanks for chiming in.

As far as I can see from history, the IntoIterator trait was introduced so that one does not have to write the .iter() in

for elem in collection.iter():
    ...

Apparently, this was seen as ugly and unnecessarily verbose.

I was asking myself why the same is not seen as a problem with other uses of iterators, like finding the maximum value.

After all, methods like max or sum could have been implemented on IntoIterator instead of being implemented on Iterator. (Iterator implements IntoIterator, so these methods would be available for iterators as well.) That would not duplicate anything. It would, however, mean adding a lot of methods to anything that implements IntoIterator, like collections.

If adding all these methods is seen as a problem, one alternative approach would have been to use functions instead. There could exist a std::iter::max that, like std::iter::zip would accept a IntoIterator argument. This solution would have provided a simplification analogue to the one seen for for-loops, i.e. obtaining the iterator would be implicit.

I try to understand what one gains by requiring explicit calls to .iter(). In Python, for example, one hardly ever has to explicitly request iterators, and this approach is perfectly viable in Rust as well, as demonstrated by the for-loop, std::iter::zip, the itertools crate, and the above generic sum function.

My own synthesis is that perhaps one wanted to equip iterators with nice method-chaining, since method calls benefit from simplified syntax in Rust. It could also be, that the reason is historical:
Iterator was defined the way it is before IntoIterator came up.

You seem to imply that there are other reasons. I would gladly hear about them.

No. IntoIterator is not a mere syntactic helper. It exists to decouple iteration from collections. If this weren't the case, then collections would have to maintain iteration state, which would have severe consequences (e.g., you couldn't iterate over the same collection twice at the same time, either truly in parallel or by just holding two independent immutable iterators to the same collection), and it would badly violate SRP.

for … in is a very minor sugar that expands to something containing IntoIterator::into_iter(collection). This is possible because for loops have their own special language construct so it's known that it is always IntoIterator::into_iter() that the code should expand to. This is impossible to do (sanely) for arbitrary method calls.

After all, how is the compiler supposed to know that by writing .max(), you actually meant .into_iter().max()? The code itself doesn't have any information about this. You'd have to do full-blown type checking just to desugar the construct, and it would not only interact badly with type inference and method resolution (and Deref and…), but it would also be baffling to readers.

1 Like

Well, just like itertools::max which accepts any type that implements IntoIterator.

Imagine a stdlib where the Iterator trait would not have a max method, but there was instead a std::iter::max function that takes a IntoIterator argument and performs the same work that the current max method does.

There could be also a std::iter::sum function like shown above, etc.

What would be the problem with that, other that one would loose method chaining?

Consistency. All iterator methods are methods on Iterator. Also method chaining.