In a where clause like `where T: Iterator<Item=char>`, what is the `<Item=char>` after `Iterator` called?

I understand that `<Item=char>` tells the compiler what type of items the `Iterator` should yield, so it is providing extra information about the trait. How does the `<...>` notation after a trait generalize? Where can I read documentation or notes on it? (That's why I'm asking for its name, so that I can google it.)

2 Likes

I think this is all under the "generic bounds" or "trait bounds" umbrella. This example has a trait with an associated type that you’re putting a bound on, so it’s a bound on the associated type of a trait. You can put similar bounds on traits with generic parameters as well.

Note also that you can write the same bound without a where clause: fn foo<I: Iterator<Item=char>>(...).
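Here is a minimal sketch of those forms side by side. The function names (`count_inline`, `count_where`, `add_one`) are made up for illustration. The third function shows a trait that has both a generic parameter and an associated type, with a bound on each.

```rust
use std::ops::Add;

// Bound written inline on the type parameter.
fn count_inline<I: Iterator<Item = char>>(iter: I) -> usize {
    iter.filter(|c| c.is_alphabetic()).count()
}

// Exactly the same bound, moved into a where clause.
fn count_where<I>(iter: I) -> usize
where
    I: Iterator<Item = char>,
{
    iter.filter(|c| c.is_alphabetic()).count()
}

// `Add` has a generic parameter (the right-hand side) and an associated
// type (`Output`). `Add<u32>` fills in the parameter; `Output = u32`
// constrains the associated type.
fn add_one<T>(x: T) -> u32
where
    T: Add<u32, Output = u32>,
{
    x + 1
}
```

Both counting functions accept anything that yields `char`, for example `"a1b2".chars()`.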

Unfortunately I don’t have an exact phrase to google for. But, I can try to answer some questions about it if you have any.

3 Likes

Thank you! I think that's good enough for now.

I'm not sure what they're called either, but when I need a term I tend to use the phrase equality bounds. One could perhaps be more specific and say "equality bounds on associated types..." but currently those are the only types of equality bounds that exist.

Some various notes and tricks related to them:

1. They're required for trait objects (a.k.a. dynamic dispatch). While I can only theorize (because I am too lazy to go on an archaeology hunt!), I imagine this is the reason why it was deemed so important to support these constraints on associated types, even if they are not supported anywhere else.

2. They can be used in impls to "name" intermediate types.

Consider the following function body

|a, b| { a + b + b + b }

Let's say that for some dumb reason we needed a function with the most general signature possible for this implementation. What would that look like? Well, I guess something like this:


use std::ops::Add;

fn add_b_three_times<A, B>(a: A, b: B)
    -> <<<A as Add<B>>::Output as Add<B>>::Output as Add<B>>::Output
where
    B: Copy,
    A: Add<B>,
    <A as Add<B>>::Output: Add<B>,
    <<A as Add<B>>::Output as Add<B>>::Output: Add<B>,
{ a + b + b + b }

But woah nelly is that ugly! I sure wouldn't wanna deal with that!

However, if you were to put this function into a trait, then you can add as many extra type parameters as you want to your impl to represent all the Output types, constraining them via equality:

trait Add3Times<B> {
    type Output;
    fn add_three_times(self, b: B) -> Self::Output;
}

impl<A, B, AB, ABB, ABBB> Add3Times<B> for A
where
    B: Copy,
    A: Add<B, Output=AB>,
    AB: Add<B, Output=ABB>,
    ABB: Add<B, Output=ABBB>,
{
    type Output = ABBB;
    
    fn add_three_times(self, b: B) -> Self::Output
    { self + b + b + b }
}

As you can see, not only has our awfully useless function nearly doubled in code size for no apparent increase in utility, but if you squint really hard, you might find that it is also now possibly a smidge cleaner.
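For anyone who wants to poke at it, here is a self-contained sketch of the same trait and impl plus a small caller. The `demo` function and the choice of `i32` and `String` are my own additions for illustration; `String + &str` is just a convenient std example where the right-hand side is a different type from the left.

```rust
use std::ops::Add;

trait Add3Times<B> {
    type Output;
    fn add_three_times(self, b: B) -> Self::Output;
}

// AB, ABB and ABBB exist only to give names to the intermediate types.
impl<A, B, AB, ABB, ABBB> Add3Times<B> for A
where
    B: Copy,
    A: Add<B, Output = AB>,
    AB: Add<B, Output = ABB>,
    ABB: Add<B, Output = ABBB>,
{
    type Output = ABBB;

    fn add_three_times(self, b: B) -> Self::Output {
        self + b + b + b
    }
}

// Hypothetical caller: once with plain integers, once with String + &str.
fn demo() -> (i32, String) {
    let n = 5_i32.add_three_times(2_i32);
    let s = String::from("a").add_three_times("b");
    (n, s)
}
```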

2 Likes

Can you explain this example a little more? It is really interesting to me. I just cannot follow the abstraction/generalization all the way. Thank you!

With pleasure! Although for full disclosure, I was mostly joking. (Only mostly, because I know I have done this at least once in my own code.)


The problem

So, I guess I should first give a real-ish example of where stuff like this might show up. It might seem weird for me to be talking about "the type of a + b + b + b." One might think, isn't it just the type of a? The answer to this is, no, not always.

Lately, I've been trying to adopt dimensioned into my codebase so that I can have typechecked units. dimensioned makes for a very good example of types where a, a * b, a * b * b and a * b * b * b could all potentially be different types. So let me recast my example in terms of multiplication.

First, we'll start out with a non-generic function:

extern crate dimensioned as dim;  // Cargo.toml:  dimensioned = "0.6"

use ::dim::si::{Unitless, Meter, Meter2, Meter3};

fn mul_three_times(a: Unitless<f64>, b: Meter<f64>) -> Meter3<f64>
{
    // (In the end this is still just `a * b * b * b`, but I've expanded
    //   it out with type annotations to show how, indeed, all of the
    //   intermediate quantities are actually different types)
    let a:    Unitless<f64> = a;
    let ab:   Meter<f64>    = a * b;
    let abb:  Meter2<f64>   = ab * b;
    let abbb: Meter3<f64>   = abb * b;
    abbb
}
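If you don't have dimensioned handy, the same effect can be reproduced with std alone. This is only a sketch: the `Meter`, `Meter2` and `Meter3` structs below are hand-rolled stand-ins that I made up for illustration, not dimensioned's types, and a bare `f64` plays the role of the unitless value.

```rust
use std::ops::Mul;

// Hand-rolled stand-ins (NOT the dimensioned types of the same name).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Meter(f64);
#[derive(Debug, Clone, Copy, PartialEq)]
struct Meter2(f64);
#[derive(Debug, Clone, Copy, PartialEq)]
struct Meter3(f64);

// Each multiplication by a length moves to a different output type.
impl Mul<Meter> for f64 {
    type Output = Meter;
    fn mul(self, rhs: Meter) -> Meter { Meter(self * rhs.0) }
}
impl Mul<Meter> for Meter {
    type Output = Meter2;
    fn mul(self, rhs: Meter) -> Meter2 { Meter2(self.0 * rhs.0) }
}
impl Mul<Meter> for Meter2 {
    type Output = Meter3;
    fn mul(self, rhs: Meter) -> Meter3 { Meter3(self.0 * rhs.0) }
}

fn mul_three_times(a: f64, b: Meter) -> Meter3 {
    let ab: Meter = a * b;
    let abb: Meter2 = ab * b;
    let abbb: Meter3 = abb * b;
    abbb
}
```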

When you know in advance exactly what units the function will be used on, like in the above, writing the function is easy. The problem is when you want to be able to reuse this code on values with different units, say, cgs::Centimeter and cgs::Gram. For this to work, you have to write your function as being completely generic over A and B:

fn mul_three_times<A, B>(a: A, b: B) -> ?????
where ???????
{ a * b * b * b }

How can we fill in the ????? ? Usually, when you want to talk about the output type of the expression a * b in a totally generic context, you need to refer to it using associated type syntax, which in this case looks like <Lhs as Mul<Rhs>>::Output. Thus, these are what the general types of the intermediate expressions look like:

fn mul_three_times<A, B>(a: A, b: B) -> ????
where ??????
{
    let a    = a;       // type is    A
    let ab   = a * b;   // type is   <A as Mul<B>>::Output
    let abb  = ab * b;  // type is  <<A as Mul<B>>::Output as Mul<B>>::Output
    let abbb = abb * b; // type is <<<A as Mul<B>>::Output as Mul<B>>::Output as Mul<B>>::Output
    abbb
}

If you fill in the output type and where bounds using those types indicated in the comments, you'll end up with something that looks just like the add_b_three_times function from my previous post.

fn mul_three_times<A, B>(a: A, b: B)
    -> <<<A as Mul<B>>::Output as Mul<B>>::Output as Mul<B>>::Output
where
    B: Copy,
    A: Mul<B>,
    <A as Mul<B>>::Output: Mul<B>,
    <<A as Mul<B>>::Output as Mul<B>>::Output: Mul<B>,
{ a * b * b * b }

But as I said before, that signature really sucks. How can anybody possibly read it?! Something must be done about this.

What I think you really should do here

Before we continue...

This signature is simple enough that you can get away with just using type aliases to tidy it up. typenum provides a useful one (and you can just as easily define it yourself):

type Prod<A, B> = <A as Mul<B>>::Output;

Basically, this just gives us a shorthand for writing all those ugly associated types.

extern crate dimensioned as dim;

use ::std::ops::Mul;
use ::dim::typenum::Prod;

fn mul_three_times<A, B>(a: A, b: B) -> Prod<Prod<Prod<A, B>, B>, B>
where
    B: Copy,
    A: Mul<B>,
    Prod<A, B>: Mul<B>,
    Prod<Prod<A, B>, B>: Mul<B>,
{ a * b * b * b }

Looks good to me. But for cases where this isn't enough, we can try using the trick I was discussing with equality bounds.
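To try the alias version without pulling in any crates, you can define `Prod` locally, as mentioned. A minimal sketch using only std (the alias is written out by hand here rather than imported):

```rust
use std::ops::Mul;

// Hand-written equivalent of the `Prod` alias quoted above.
type Prod<A, B> = <A as Mul<B>>::Output;

fn mul_three_times<A, B>(a: A, b: B) -> Prod<Prod<Prod<A, B>, B>, B>
where
    B: Copy,
    A: Mul<B>,
    Prod<A, B>: Mul<B>,
    Prod<Prod<A, B>, B>: Mul<B>,
{
    a * b * b * b
}
```

With plain numbers every intermediate type is the same, so `mul_three_times(2_i32, 3_i32)` is just an `i32`.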

Naming types by adding equality bounds...

(a) ...to the function (don't do this)

Just to make the transformation more obvious, I'm going to perform the trick directly on the function (without introducing a trait). You shouldn't do this, for reasons I will soon explain.

As a reminder, the full signature:

fn mul_three_times<A, B>(a: A, b: B)
    -> <<<A as Mul<B>>::Output as Mul<B>>::Output as Mul<B>>::Output
where
    B: Copy,
    A: Mul<B>,
    <A as Mul<B>>::Output: Mul<B>,
    <<A as Mul<B>>::Output as Mul<B>>::Output: Mul<B>,
{ a * b * b * b }

As our first step towards simplification, let's introduce a type parameter AB, and use an equality constraint to set <A as Mul<B>>::Output equal to this type.

fn mul_three_times<A, B, AB>(a: A, b: B)
    -> <<AB as Mul<B>>::Output as Mul<B>>::Output
where
    B: Copy,
    A: Mul<B, Output=AB>,
    AB: Mul<B>,
    <AB as Mul<B>>::Output: Mul<B>,
{ a * b * b * b }

What we accomplished by doing this is that we introduced a name for the type of a * b. We can do this again to introduce a name for the type of a * b * b:

fn mul_three_times<A, B, AB, ABB>(a: A, b: B)
    -> <ABB as Mul<B>>::Output
where
    B: Copy,
    A: Mul<B, Output=AB>,
    AB: Mul<B, Output=ABB>,
    ABB: Mul<B>,
{ a * b * b * b }

and one last time to name the type of a * b * b * b

fn mul_three_times<A, B, AB, ABB, ABBB>(a: A, b: B) -> ABBB
where
    B: Copy,
    A: Mul<B, Output=AB>,
    AB: Mul<B, Output=ABB>,
    ABB: Mul<B, Output=ABBB>,
{ a * b * b * b }

Neat!

Just one problem: By adding these type parameters to the function, they're now part of the signature. If you find yourself some day needing to specify the type parameters for some reason, you'll no longer be able to write mul_three_times::<A, B>(a, b), and instead you'll be forced to write mul_three_times::<A, B, _, _, _>(a, b). This is why you should introduce a trait, because the extra type parameters will then only show up in the impl where they can't affect signatures.
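A small sketch of what that looks like at the call site, using `i32` so that it runs with std alone (the `call_with_turbofish` helper is made up for illustration):

```rust
use std::ops::Mul;

fn mul_three_times<A, B, AB, ABB, ABBB>(a: A, b: B) -> ABBB
where
    B: Copy,
    A: Mul<B, Output = AB>,
    AB: Mul<B, Output = ABB>,
    ABB: Mul<B, Output = ABBB>,
{
    a * b * b * b
}

fn call_with_turbofish() -> i32 {
    // `mul_three_times::<i32, i32>(2, 3)` is rejected here, because the
    // function now takes five generic arguments. The three helper
    // parameters have to be acknowledged with placeholders:
    mul_three_times::<i32, i32, _, _, _>(2, 3)
}
```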

(b) ...to a trait impl.

So, this is basically what I showed before. Usually, though, if I originally intended for the function to be a free function, I would also write a free function wrapper for it, as I have done here, and label the trait as a "public implementation detail".

/// Implementation detail of `mul_three_times()`.
///
/// Please pretend this doesn't exist.
pub trait MulThreeTimes<B> {
    type Output;

    fn mul_three_times(self, b: B) -> Self::Output;
}

impl<A, B, AB, ABB, ABBB> MulThreeTimes<B> for A
where
    B: Copy,
    A: Mul<B, Output=AB>,
    AB: Mul<B, Output=ABB>,
    ABB: Mul<B, Output=ABBB>,
{
    type Output = ABBB;
    
    fn mul_three_times(self, b: B) -> Self::Output
    { self * b * b * b }
}

// Free function wrapper

/// Multiply `a` by `b` three times.
///
/// The output type is whatever the type of `a * b * b * b` is.
pub fn mul_three_times<A, B>(a: A, b: B) -> <A as MulThreeTimes<B>>::Output
where A: MulThreeTimes<B>
{ a.mul_three_times(b) }

Notice how, by doing this, we have not really made the signature nicer to consumers (the output type is "this weird thing we made up just now"), and have in fact obscured the output type somewhat by moving it into a trait impl. The only conceivable benefit really is that we, as a maintainer of the code, can more easily read and modify the impl.

:confetti_ball: Hooray?
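For completeness, a self-contained sketch of approach (b) with a caller, to show that the two-parameter turbofish works again. It uses `f64` instead of dimensioned types so it needs nothing beyond std, and the `demo` function is my own addition.

```rust
use std::ops::Mul;

pub trait MulThreeTimes<B> {
    type Output;
    fn mul_three_times(self, b: B) -> Self::Output;
}

// The helper parameters live on the impl, not on the function signature.
impl<A, B, AB, ABB, ABBB> MulThreeTimes<B> for A
where
    B: Copy,
    A: Mul<B, Output = AB>,
    AB: Mul<B, Output = ABB>,
    ABB: Mul<B, Output = ABBB>,
{
    type Output = ABBB;

    fn mul_three_times(self, b: B) -> Self::Output {
        self * b * b * b
    }
}

pub fn mul_three_times<A, B>(a: A, b: B) -> <A as MulThreeTimes<B>>::Output
where
    A: MulThreeTimes<B>,
{
    a.mul_three_times(b)
}

// Hypothetical caller: only A and B need to be spelled out.
fn demo() -> f64 {
    mul_three_times::<f64, f64>(0.5, 2.0)
}
```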

So... when should I do this?

Uhhhhh... considering all of the above, if it's going to impact your public API then probably never. (At the very least, you ought to have other motivation for creating a trait first; in that case, you get this power for free). It's alright for internal functions and utilities, which is where I've used it.

It's really just a dumb trick to keep in your toolbox because, well, you never know.


Dammit, I need a blog.

4 Likes

I agree. I love these posts :heart:

1 Like

Oh! Oh! Hi, I think I can contribute to this. :grinning:

For context: a few months ago, I also used Dimensioned in a home assignment. The assignment was to write a really simple Monte-Carlo simulation of gamma radiation being absorbed by a block of lead. There was also a part where we had to take statistics of the simulation results and calculate means, standard deviations, etc.

As is reasonable for run-once-touch-never-again code, I wanted it to be as generic as humanly possible.

So the module that calculated mean and standard deviation was generic over the variable that it calculated these values for. And because the standard deviation is the square root of a sum of squares divided by an integer, I quickly ran into the same trouble as you, @ExpHP, where my struct Statistics grew some horrendously long trait bounds.

I experimented a lot with how to improve these bounds and eventually settled for your trait impl approach (b). I created a not-too-complicated hierarchy of traits with blanket impls for all of them. However, contrary to your experience, I found that the result actually looked rather pleasant.

For reference, these are my traits:

/// A trait alias that specifies all bounds required to store a
/// variable in a `Statistics` variable.
pub trait Primitive: Copy + Default + Debug {}

/// The trait of all types that can be accumulated.
pub trait Cumulable
where
    Self: Sized + AddAssign + Sub<Output = Self> + Div<f64, Output = Self>
{
}

/// Trait of all requirements for a type to be fed to `Statistics`.
pub trait Stat: Primitive + Cumulable {
    type Variance: Primitive + Cumulable + Sqrt<Output = Self::StdDev>;
    type StdDev;

    /// Connects `Self::Variance` with `Self`.
    fn mul(d1: Self, d2: Self) -> Self::Variance;

    /// Connects `Self::StdDev` with `Self::Variance`.
    fn sqrt(v: Self::Variance) -> Self::StdDev;
}

And then, I was able to use them like this:

#[derive(Clone, Debug, Default)]
pub struct Statistics<X: Stat> {
    count: u32,
    mean: X,
    sum_of_squares: X::Variance,
}

impl<X: Stat> Statistics<X> {
    pub fn mean(&self) -> X { self.mean }

    pub fn variance(&self) -> Option<X::Variance> { /* ... */ }

    pub fn standard_deviation(&self) -> Option<X::StdDev> {
        self.variance().map(X::sqrt)
    }

}

impl<X> Display for Statistics<X>
where
    X: Stat + Display,
    X::Variance: Display,
    X::StdDev: Display,
{
    // ...
}
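The blanket impls themselves aren't shown above. As a rough guess at their shape (not the original code), they could look something like the sketch below. Everything here is an assumption for illustration: the local `Sqrt` trait is a stand-in for whichever `Sqrt` the real code imported, and the blanket impl of `Stat` reuses the equality-bound trick from earlier in the thread to name the variance and standard deviation types.

```rust
use std::fmt::Debug;
use std::ops::{AddAssign, Div, Mul, Sub};

// Stand-in for the `Sqrt` trait used by the traits above (an assumption).
pub trait Sqrt {
    type Output;
    fn sqrt(self) -> Self::Output;
}
impl Sqrt for f64 {
    type Output = f64;
    // `f64::sqrt` resolves to the inherent float method, not this trait.
    fn sqrt(self) -> f64 { f64::sqrt(self) }
}

pub trait Primitive: Copy + Default + Debug {}
impl<T: Copy + Default + Debug> Primitive for T {}

pub trait Cumulable
where
    Self: Sized + AddAssign + Sub<Output = Self> + Div<f64, Output = Self>,
{
}
impl<T> Cumulable for T where
    T: Sized + AddAssign + Sub<Output = T> + Div<f64, Output = T>
{
}

pub trait Stat: Primitive + Cumulable {
    type Variance: Primitive + Cumulable + Sqrt<Output = Self::StdDev>;
    type StdDev;

    fn mul(d1: Self, d2: Self) -> Self::Variance;
    fn sqrt(v: Self::Variance) -> Self::StdDev;
}

// Hypothetical blanket impl: V names the type of `x * x`,
// S names the type of its square root.
impl<X, V, S> Stat for X
where
    X: Primitive + Cumulable + Mul<Output = V>,
    V: Primitive + Cumulable + Sqrt<Output = S>,
{
    type Variance = V;
    type StdDev = S;

    fn mul(d1: X, d2: X) -> V { d1 * d2 }
    fn sqrt(v: V) -> S { Sqrt::sqrt(v) }
}
```

With this in place `f64` satisfies `Stat` automatically, with both `Variance` and `StdDev` equal to `f64`.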

I think the trait impl method can work, but you have to think really hard about what the quantities you're using mean. For example, what is the meaning of b*b*b? Is it a volume? If yes, your code might benefit from a pub trait Volume<L>. This way, consumers of your API can look at the signature and immediately have a rough idea of what they're supposed to pass for a and b.

Final disclaimer: This method worked really well for this one Statistics type. When I wrote another module, however, I had to start from scratch and define a new trait Primitive, and a bunch of traits based on that. It's really easy to end up with a bunch of duplicated traits. Maybe with enough time, I would've been able to put all traits into one common module, but even then, it's a lot of what people might consider boilerplate.

1 Like