Writing a truly generic sum function (and why is it missing from the stdlib?)

grothesque · October 24, 2023, 10:46am

Hello,

as an exercise, I am trying to write a generic sum function. Just like for can iterate over anything that implements IntoIterator, I would like my function to also accept anything that can be turned into an iterator.

I have the following, which works:

use std::ops::Add;

fn sum<T, I>(iterable: I) -> T
where
    I: IntoIterator<Item = T>,
    T: Add<Output = T> + Default
{
    let mut result = T::default();
    for elem in iterable {
        result = result + elem;
    }
    result
}

fn main() {
    let something = vec![0, 2, 1];
    println!("{}", sum(something));
}

But this doesn't work when the calling line is changed to

    println!("{}", sum(&something));

I managed to modify the sum function so that it works for a reference

fn sum_ref<'a, T, I>(iterable: I) -> T
where
    I: IntoIterator<Item = &'a T>,
    T: 'a + Add<Output = T> + Default + Copy,
{
    let mut result = T::default();
    for elem in iterable {
        result = result + *elem;
    }
    result
}

but now it no longer works for a non-reference...

Is there a way to have a single function which works in both cases?

tczajka · October 24, 2023, 10:59am

Here is one way:

fn sum<T, I, S>(iterable: I) -> S
where
    I: IntoIterator<Item = T>,
    T: Add<T, Output = S>,
    S: Add<T, Output = S>,
    S: Default,
{
    let mut result = S::default();
    for elem in iterable {
        result = result + elem;
    }
    result
}

Michael-F-Bryan · October 24, 2023, 1:02pm

I suspect this is why the Iterator::sum() method introduces a helper trait (std::iter::Sum).

If you look at the implementations of std::iter::Sum, you'll see it has both impl Sum<f32> for f32 and impl<'a> Sum<&'a f32> for f32 so you can sum either an iterator of f32's or &f32's.

Here's a trivial implementation

fn main() {
    let values = [1_u32, 2, 3, 4];
    let total: u32 = sum(values);
    println!("sum({values:?}) = {total}");

    let references = [&1_u32, &2, &3, &4, &5, &6];
    let total: u32 = sum(references);
    println!("sum({references:?}) = {total}");
}

fn sum<T, I>(items: I) -> T
where
    I: IntoIterator,
    T: Sum<I::Item>,
{
    T::sum(items.into_iter())
}

Where my Sum trait is defined as...

trait Sum<A = Self> {
    fn sum<I>(iter: I) -> Self
    where
        I: Iterator<Item = A>;
}

impl Sum for u32 {
    fn sum<I>(iter: I) -> Self
    where
        I: Iterator<Item = u32>,
    {
        iter.fold(1, |a, b| a * b)
    }
}

impl<'a> Sum<&'a u32> for u32 {
    fn sum<I>(iter: I) -> Self
    where
        I: Iterator<Item = &'a u32>,
    {
        iter.fold(1, |a, b| a * b)
    }
}

(playground)

grothesque · October 24, 2023, 3:53pm

I see, thanks a lot. So the secret is to make the item type and the return type separate generic parameters. The line

    T: Add<T, Output = S>,

puts an additional restriction on the return type S. Without it, the return type could be both i32 and &i32 (for a container of 32 bit ints).

Here is a variant of the sum function that uses Iterator::sum:

use std::iter::Sum;

fn sum<T, I, S>(iterable: I) -> S
where
    I: IntoIterator<Item = T>,
    S: Add<T, Output = S> + Sum<T>,
    I::Item: Add<T, Output = S>,
{
    iterable.into_iter().sum()
}

It uses the same trick to uniquely define the return type. However, I wonder, isn't there a more elegant way to do this? For example, in the above function one can replace Add by Sub and things continue working. The trait Add is actually not used, it's only there to restrict the return type.

jendrikw · October 24, 2023, 3:57pm

@grothesque You can use either Sum or Add + Default, both is redundant.

grothesque · October 24, 2023, 4:06pm

Thanks. I see that the simplified

fn sum<I, S>(iterable: I) -> S
where
    I: IntoIterator,
    S: Sum<I::Item>,
    I::Item: Add<I::Item, Output = S>,
{
    iterable.into_iter().sum()
}

works as well, but removing the line

    I::Item: Add<I::Item, Output = S>,

makes the compiler complain:

error[E0283]: type annotations needed
  --> generic_sum/src/main.rs:71:20
   |
71 |     println!("{}", sum(&something));
   |                    ^^^ cannot infer type of the type parameter `S` declared on the function `sum`
   |
   = note: cannot satisfy `_: Sum<&i32>`
   = help: the following types implement trait `Sum<A>`:
             <isize as Sum>
             <isize as Sum<&'a isize>>
             <i8 as Sum>
             <i8 as Sum<&'a i8>>
             <i16 as Sum>
             <i16 as Sum<&'a i16>>
             <i32 as Sum>
             <i32 as Sum<&'a i32>>
           and 72 others

tczajka · October 24, 2023, 4:08pm

Actually, without it it can still only be i32 (unless you import some user-defined type that also implements addition with &i32). But the type deduction isn't smart enough to deduce i32, so you would have to explicitly specify the type at the call site.

jendrikw · October 24, 2023, 4:10pm

The same thing happens for the sum method in std if you remove the :i32 from the example at Iterator in std::iter - Rust

grothesque · October 24, 2023, 4:19pm

This is actually what got me started into this: I tried to understand why one has to write
iter.sum::<i32>() to compute the sum of an iterator.

Are the reddit guys right that this is a case of overgeneralization?

I find it a bit cumbersome that one has to write

collection.iter().sum::<i32>()

instead of

sum(&collection)

(where, as we have just seen, the simple sum function can be provided without any runtime overhead.)

Is this just a case of the stdlib missing some non-essential functionality? Or is something gained by not providing a simple sum algorithm?

the8472 · October 24, 2023, 5:23pm

Default is not quite the right trait for this. It often happens to be the zero-element which is the identity-element for addition operations but it's not required by its API to be that.

So to this correctly one must return an Option to handle the zero-element case or define an AddIdentity or Zero provider.

grothesque · October 24, 2023, 7:34pm

The stdlib seems to have some notion of zero element, since an empty iterators sums to zero: e.g. (0..0).sum::<i32>() evaluates to zero.

the8472 · October 24, 2023, 7:49pm

Not in a generic fashion in the type system. It's implemented separately for each of the numeric types via a macro:

github.com

rust-lang/rust/blob/98b4a64a164b2861f5d84aba3140761068f52231/library/core/src/iter/traits/accum.rs#L92-L97


      
          integer_sum_product!(@impls 0, 1,
                  #[stable(feature = "iter_arith_traits", since = "1.12.0")],
                  $($a)*);
          integer_sum_product!(@impls Wrapping(0), Wrapping(1),
                  #[stable(feature = "wrapping_iter_arith", since = "1.14.0")],
                  $(Wrapping<$a>)*);

grothesque · October 24, 2023, 8:13pm

Indeed, I just found this out as well...

So, my conclusion from this exercise is that while a lot of deep insight is encoded in the design of Rust and its standard library, there are warts and imperfections as well, even in seemingly central areas.

The first post in this discussion was motivated by the question why IntoIterator is not used more in the standard library. After all, if it's easy to for-loop over something, where something: IntoIterator, it could be equally easy to sum it up or get the max value. Part of this is addressed by the itertools crate, but I don't see why the stdlib algorithms should be restricted to iterators, when they could take anything that implements IntoIterator.

Seems like the reason is historical, since Iterator predates IntoIterator. This seems to have caused a much uglier wart, namely the various problems with ranges, but hey, I guess that's life :-).

the8472 · October 24, 2023, 8:28pm

I think in this case it was a deliberate choice at some point in history, not an accidental wart.
I recall there were discussions of having all those number-theoretic traits for multiplicative identity, additive identity and what-not but that it was deemed too complex for the standard library and delegated to the ecosystem instead. You can use the num_traits crate to get Zero and One.

grothesque · October 25, 2023, 10:02am

I was not referring to the absence of number-theoretic traits in stdlib as a wart, but rather that operations on iterables (i.e. types implementing IntoIterator, e.g. collections) require the iterator to be requested explicitly, e.g. collection.into_iter().max(), while the same explicitness would be considered cumbersome in for-loops.
But perhaps this is a fair price for being able to method-chain iterator operations. (One would not want to add lots of methods like sum or max to IntoIterator in order not to encumber all the collections with them.) So the alternative would have been to provide these as functions under std::iter (just like there is std::iter::zip).

Not sure, is method-chaining worth this extra verbosity? Perhaps the fact that in Rust method calls are "priviledged" (they feature automatic dereferencing) plays a role?

Independently from the above, the fact that summing an iterator requires one to specify the output type explicitly, while this freedom is unused and unusable for standard types, feels like a wart as well.

H2CO3 · October 25, 2023, 11:18am

I don't get what method chaining has to do with this. If all Iterator methods were duplicated on IntoIterator, then ther would be a lot of redundancy and a lot of potential for introducing inconsistency, and a lot of superfluous maintenance work. In general, you don't want to duplicate APIs, especially not huge ones like Iterator.

This is not a wart, this is API design. Seriously, if you think calling .iter() is too high a price to pay for that, I've got some bad news for you.

grothesque · October 25, 2023, 12:29pm

Thanks for chiming in.

As far as I can see from history, the IntoIterator trait was introduced so that one does not have to write the .iter() in

for elem in collection.iter():
    ...

Apparently, this was seen as ugly and unnecessarily verbose.

I was asking myself why the same is not seen as a problem with other uses of iterators, like finding the maximum value.

After all, methods like max or sum could have been implemented on IntoIterator instead of being implemented on Iterator. (Iterator implements IntoIterator, so these methods would be available for iterators as well.) That would not duplicate anything. It would, however, mean adding a lot of methods to anything that implements IntoIterator, like collections.

If adding all these methods is seen as a problem, one alternative approach would have been to use functions instead. There could exist a std::iter::max that, like std::iter::zip would accept a IntoIterator argument. This solution would have provided a simplification analogue to the one seen for for-loops, i.e. obtaining the iterator would be implicit.

I try to understand what one gains by requiring explicit calls to .iter(). In Python, for example, one hardly ever has to explicitly request iterators, and this approach is perfectly viable in Rust as well, as demonstrated by the for-loop, std::iter::zip, the itertools crate, and the above generic sum function.

My own synthesis is that perhaps one wanted to equip iterators with nice method-chaining, since method calls benefit from simplified syntax in Rust. It could also be, that the reason is historical:
Iterator was defined the way it is before IntoIterator came up.

You seem to imply that there are other reasons. I would gladly hear about them.

H2CO3 · October 25, 2023, 12:37pm

No. IntoIterator is not a mere syntactic helper. It exists to decouple iteration from collections. If this weren't the case, then collections would have to maintain iteration state, which would have severe consequences (e.g., you couldn't iterate over the same collection twice at the same time, either truly in parallel or by just holding two independent immutable iterators to the same collection), and it would badly violate SRP.

for … in is a very minor sugar that expands to something containing IntoIterator::into_iter(collection). This is possible because for loops have their own special language construct so it's known that it is always IntoIterator::into_iter() that the code should expand to. This is impossible to do (sanely) for arbitrary method calls.

After all, how is the compiler supposed to know that by writing .max(), you actually meant .into_iter().max()? The code itself doesn't have any information about this. You'd have to do full-blown type checking just to desugar the construct, and it would not only interact badly with type inference and method resolution (and Deref and…), but it would also be baffling to readers.

grothesque · October 25, 2023, 12:44pm

Well, just like itertools::max which accepts any type that implements IntoIterator.

Imagine a stdlib where the Iterator trait would not have a max method, but there was instead a std::iter::max function that takes a IntoIterator argument and performs the same work that the current max method does.

There could be also a std::iter::sum function like shown above, etc.

What would be the problem with that, other that one would loose method chaining?

jendrikw · October 25, 2023, 12:48pm

Consistency. All iterator methods are methods on Iterator. Also method chaining.

Topic		Replies	Views
Using Iterator::sum in an expression	8	13004	September 7, 2016
Why does the `iterator::Sum` exist (Compare to just call `Add::add` repetitively)?	6	1046	July 15, 2020
Implementing the Sum Trait help	5	7262	December 18, 2018
Generic Sum Impl	7	1980	August 22, 2017
Why does `Option<T>` not implement the `Add` trait? help	8	557	September 28, 2024

Writing a truly generic sum function (and why is it missing from the stdlib?)

Related topics