Design patterns for composability with traits (i.e. typeclasses)?

Some discussion in another thread anticipated a thread I was planning to create:

I want to ask for ideas on the best patterns for accomplishing each of the following:

  1. My function inputs a collection with elements of a known data type and I want to call a function that takes a collection with generic elements of type bound by some trait so that second function can call methods of that trait on these generic elements. I don't want to rebuild the collection to convert data types.

  2. My function inputs a collection with elements of a known data type and I want to call a function that takes a collection with generic elements of type bound by some conjunction of traits so that second function can call methods of any of these traits on these generic elements. I don't want to rebuild the collection to convert data types.

  3. My function inputs a collection with elements of a generic data type and I want to call a function that takes a collection with generic elements of type bound by some trait so that second function can call methods of that trait on these generic elements. I don't want to rebuild the collection to convert data types.

  4. My function inputs a collection with elements of a generic data type and I want to call a function that takes a collection with generic elements of type bound by some conjunction of traits so that second function can call methods of any of these traits on these generic elements. I don't want to rebuild the collection to convert data types.

  5. My function inputs a collection with elements of a conjunction of known data types and I want to call a function that takes a collection with generic elements of type bound by some trait so that second function can call methods of that trait on these generic elements. I don't want to rebuild the collection to convert data types.

  6. My function inputs a collection with elements of a conjunction of known data types and I want to call a function that takes a collection with generic elements of type bound by some conjunction of traits so that second function can call methods of any of these traits on these generic elements. I don't want to rebuild the collection to convert data types.

  7. My function inputs a collection with elements of a generic data type that can be a conjunction of data types and I want to call a function that takes a collection with generic elements of type bound by some conjunction of traits so that second function can call methods of any of these traits on these generic elements. I don't want to rebuild the collection to convert data types.

Please feel free to ask for clarifications. Please feel free to offer ideas and example code on any of the numbered items.

It would be helpful if you could factor out the repeated parts of each statement, I can't really read it like that.

Good point. It is burdensome.

The following relate to a function which inputs a collection of a certain element type E1 and wants to call a function which inputs a collection of a certain element type E2, where the second function can apply all the methods of E2 to the elements of the collection. In no case should the solution require rebuilding the collection to convert types.

Following have E1 = specific data type:

  1. E2 = generic type bounded by a single trait
  2. E2 = generic type bounded by a conjunction of traits

Following have E1 = unbounded generic data type:

  1. E2 = generic type bounded by a single trait
  2. E2 = generic type bounded by a conjunction of traits

Following have E1 = disjunction of specific data types:

  1. E2 = generic type bounded by a single trait
  2. E2 = generic type bounded by a conjunction of traits

Following have E1 = unbounded generic type that can be a disjunction of specific data types:

  1. E2 = generic type bounded by a single trait
  2. E2 = generic type bounded by a conjunction of traits
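As a starting point, here is a minimal sketch of the simplest case (E1 = specific data type, E2 = generic type bounded by a single trait) in current Rust. The names (Circle, HasArea, total_area) are illustrative, not from any library; the point is that a borrow of the concrete collection is passed straight through, with no rebuilding:

```rust
// Case 1: E1 is a concrete type, E2 is a generic bounded by one trait.
trait HasArea {
    fn get_area(&self) -> f64;
}

struct Circle { radius: f64 }

impl HasArea for Circle {
    fn get_area(&self) -> f64 {
        std::f64::consts::PI * self.radius * self.radius
    }
}

// The second function only requires the trait bound; it never names Circle.
fn total_area<T: HasArea>(xs: &[T]) -> f64 {
    xs.iter().map(|x| x.get_area()).sum()
}

fn main() {
    // The first function holds a collection of the known type and passes a
    // borrow straight through -- no rebuilding or conversion of the collection.
    let circles = vec![Circle { radius: 1.0 }, Circle { radius: 2.0 }];
    let total = total_area(&circles);
    assert!((total - std::f64::consts::PI * 5.0).abs() < 1e-9);
}
```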

Any ideas are appreciated. I want to learn more about what is possible and where the pain points are.

I doubt I have asked the question in the best way. I need to learn more about what is possible so I can better frame it. If you think my requirements would be better structured a different way, please suggest.

Well, the first design pattern I can think of comes from Haskell, and that is: don't put constraints on type definitions, as doing so requires that trait for every function you pass that type to, and does not provide that trait automatically.

The rest of this comes from Stepanov's "Elements of Programming". Note that the book does not explain the design patterns directly; rather, like a mathematics text, it presents the story of a developing sequence of algorithms, and the design patterns become apparent to the reader through their own deductions from the code presented. The book does not hand you the answers, so the reader has to work, but the benefit is that a much greater level of understanding can be acquired.

The general rule is to require the traits on the function that needs them, so a generic in-place sort function over any collection (using Stepanov's iterators) would have a type signature:

fn sort<I, R>(mut f : I, n : I::DistanceType, mut r : R)
where I : ForwardIterator + Mutable, R : FnMut(&I::ValueType, &I::ValueType) -> bool

This function will process any collection that provides an iterator that implements ForwardIterator and Mutable; however all the elements in the collection must be the same type (which makes sense for sort, because the elements must form a weak ordering).
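Stepanov's iterator traits are not part of standard Rust, but the same pattern can be sketched with a slice and a generic relation parameter: the bound lives on the function that needs it. This is a simple insertion sort for illustration; the names are my own, not from any library:

```rust
// The relation is a generic FnMut parameter, as in the sort signature above.
fn sort_by_rel<T, R>(xs: &mut [T], mut less: R)
where
    R: FnMut(&T, &T) -> bool,
{
    // Insertion sort: bubble each element left until the relation holds.
    for i in 1..xs.len() {
        let mut j = i;
        while j > 0 && less(&xs[j], &xs[j - 1]) {
            xs.swap(j, j - 1);
            j -= 1;
        }
    }
}

fn main() {
    let mut v = vec![3, 1, 2];
    sort_by_rel(&mut v, |a, b| a < b);
    assert_eq!(v, vec![1, 2, 3]);
}
```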

Considering the elements to be generic too, you might want to do something like calculating the average area of a collection of shapes:

fn average_n<I, Proj, J>(mut f : I, mut n : I::DistanceType, mut proj : Proj) -> J
where I : Iterator + Readable, Proj : FnMut(&I::ValueType) -> J, J : Integer {
    let mut sum = J::zero();
    let mut count = J::zero();
    while n != I::DistanceType::zero() {
        sum += proj(f.source());
        f = f.successor();
        n = n.predecessor();
        count = count.successor();
    }
    sum / count
}

The thing to note here is that the projection function 'proj' is applied to each member before summing, so this is generic over any container, which can contain a heterogeneous collection of any object types; we don't even need to reference the trait we expect these objects to implement. That gets provided in the projection function:

fn project_area(x: &Box<dyn HasArea>) -> u32 {
    x.get_area()
}

So that is the projection function for a collection of objects that all implement the HasArea trait, which requires implementing the get_area function:

trait HasArea {
    fn get_area(&self) -> u32;
}

If you wanted to combine multiple traits, then you would do something like:

trait Shape : HasArea + OtherTraits {}

fn project_shape_area(x: &Box<dyn Shape>) -> u32 {
    x.get_area()
}

fn average_shape_area_n<I>(f : I, n : I::DistanceType) -> u32
where I : Iterator + Readable {
    average_n(f, n, project_shape_area)
}

The point to note is that the "average_n" algorithm is generic over all collection types (that provide an iterator that implements ForwardIterator) and over object types that implement multiple traits which could be a conjunction of traits, or traits that require other traits that are conjunctions of traits etc.
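A rough standard-Rust analogue of average_n may make the shape of the pattern clearer. This is a sketch under my own assumptions: Rust's std Iterator stands in for Stepanov's iterator traits, and f64 for the generic Integer result, but the key property is the same — the algorithm never names the trait the elements implement; the projection function carries that knowledge:

```rust
// Generic average via projection: the algorithm knows nothing about the
// element type beyond what the projection function extracts from it.
fn average_by<I, T, F>(items: I, mut proj: F) -> f64
where
    I: IntoIterator<Item = T>,
    F: FnMut(&T) -> f64,
{
    let mut sum = 0.0;
    let mut count = 0u32;
    for x in items {
        sum += proj(&x);
        count += 1;
    }
    sum / count as f64
}

fn main() {
    let widths = vec![2.0_f64, 4.0, 6.0];
    // Project each width to the area of a square of that width.
    let avg = average_by(widths, |w| w * w);
    assert!((avg - (4.0 + 16.0 + 36.0) / 3.0).abs() < 1e-9);
}
```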

Afaics, multiple passes of Enumerable::fold can also implement a bubble sort and, I suppose, other sort algorithms. But that is not particularly germane to the issue of this thread.

I can't visualize unambiguously what you mean. Perhaps a code example or explaining what in the current example you are referring to specifically?

Oh so you mean for example the type of R? But I propose a Comparable trait instead below.

So if the ValueType is a tagged union enum, does Rust know how to dynamic dispatch to the trait implementation for each possible data type in union? Edit: I see the answer is yes.

I assume yes, but afaics the problem remains that the enum is not first-class (so not extensible), so I can't change the ValueType when adding new values to the collection (unless I rebuild a copy of the collection on every add which is intolerable).

ValueType is a type parameter on the parameterized type I so afaics it is a HKT?

I'd prefer to see a Comparable trait here instead, but I don't know how to specify that the associated type ValueType is bounded on the trait Comparable in Rust.

Afaics, the problem again is this is nominal and not first-class composability, thus afaics requires additional boilerplate.

I can't visualize unambiguously what you mean. Perhaps a code example or explaining what in the current example you are referring to specifically?

Let's leave that then, to avoid confusing the issue. I haven't put any constraints on the types themselves. That would look like this:

struct T<X> where X : Readable {val : X}

But I think this is an anti-pattern, as it requires you to supply the Readable trait everywhere the type is passed, even if that function does not use the functions in the Readable trait.
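The contrast can be sketched concretely. Assuming an ordinary std trait (Display) in place of Readable for illustration: leave the struct unconstrained, and put the bound only on the function that actually uses the trait's methods:

```rust
use std::fmt::Display;

// No bound on the type definition itself.
struct T<X> { val: X }

// The bound appears only where the trait's methods are used.
fn show<X: Display>(t: &T<X>) -> String {
    format!("{}", t.val)
}

// Functions that don't use Display need no bound at all.
fn identity<X>(t: T<X>) -> T<X> { t }

fn main() {
    let t = T { val: 42 };
    assert_eq!(show(&t), "42");
    // Vec<i32> does not implement Display -- still fine here.
    let _ = identity(T { val: vec![1, 2] });
}
```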

So if the ValueType is a tagged union enum, does Rust know how to dynamic dispatch to the trait implementation for each possible data type in union? Edit: I see the answer is yes.

No, the value type is a runtime polymorphic trait object. Edit: This is not quite right; neither algorithm specifies what the ValueType is, they are generic over any ValueType. However, in the average example I go on to specify a projection function that, when used as a parameter to the average_n function, requires ValueType to be a boxed trait object and the result type of the average to be u32.

I assume yes, but afaics the problem remains that the enum is not first-class (so not extensible), so I can't change the ValueType when adding new values to the collection (unless I rebuild a copy of the collection on every add which is intolerable).

This is wrong: because it's a runtime polymorphic trait object, you can add a new object type without recompiling existing code, and you can add any types that implement the trait without copying anything.

ValueType is a type parameter on the parameterized type I so afaics it is a HKT?

No it is an associated type of the Readable trait.

I'd prefer to see a Comparable trait here instead, but I don't know how to specify that the associated type ValueType is bounded on the trait Comparable in Rust.

The problem with a Comparable trait is that there can be only one order per type. Passing a relation function is more generic (remember, the point of generic programming is to write the most generic algorithm with no loss of efficiency).
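The point shows up directly with the standard library's sort_by, which takes a relation rather than relying solely on a single Ord instance: one type, many orders, no extra trait machinery.

```rust
fn main() {
    let mut v = vec![3, 1, 2];

    // Ascending order.
    v.sort_by(|a, b| a.cmp(b));
    assert_eq!(v, vec![1, 2, 3]);

    // Descending order -- same element type, a second ordering,
    // impossible to express with a single Comparable/Ord instance alone.
    v.sort_by(|a, b| b.cmp(a));
    assert_eq!(v, vec![3, 2, 1]);
}
```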

Afaics, the problem again is this is nominal and not first-class composability, thus afaics requires additional boilerplate

I haven't seen any evidence of this additional boilerplate. Look at that beautiful average algorithm: every letter, every operation is optimal (this is more a statement of intent than of fact, as I believe any algorithm can be improved). If you can improve it by even one machine instruction without losing generality, I would value seeing the code. And remember, it will operate on any collection that provides a ForwardIterator, even ones not written yet, and can operate over runtime polymorphic collections and average any projection function of the objects (so extract any property of the objects), even types not yet created, as long as they implement the required trait — which is not specified in the average algorithm itself, so the algorithm will work over unknown future collections with unknown future trait bounds.

While eating I realized the answer is "No". A constructed instance type of I does not appear; rather, only the implementations of the traits Iterator + Readable do. I::ValueType is the type parameter on Iterator, not on the type I that implements the trait Iterator (or, per @keean's reply, the trait Readable).

Well, you can't see that from the limited source code, but ValueType is the associated type of Readable (which provides the dereference operator for read-only access to a contained value), and DistanceType is the associated type of Iterator (which provides the successor function) and is a type for measuring distances in the space the iterator is in (which may not be a simple linear space).

Oh yes very much agreed.

I meant it can be (called with a ValueType that is an enum), not to imply that it always is. The context was the bolded word "heterogeneous" in what you had claimed was possible. How else would you implement a heterogeneous collection?

A collection of trait-objects is a heterogeneous collection as well, but yes, it could be an enum :slight_smile: A homogeneous collection is one like a 'C' array where every element has to be of the same type. A heterogeneous one is where each element can be of a different type (but it is fine for this to be a restricted set of types, such as those implementing a certain trait), as it only needs more than one type to be heterogeneous. For example, with the following projection function from above:

fn project_area(x: &Box<dyn HasArea>) -> u32 {
    x.get_area()
}

average_n can be used with this function to average any heterogeneous collection (array, list, whatever), provided the elements in the collection implement "HasArea". The types of the elements of the collection are not statically known at compile time — they could be derived from keyboard input at runtime; all we know about them statically is that they "HaveArea".
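Such a heterogeneous collection can be sketched in current Rust: each element is a different concrete type behind a Box<dyn Trait>, and the averaging function knows only the trait. The types and numbers here are illustrative:

```rust
trait HasArea {
    fn get_area(&self) -> f64;
}

struct Square { side: f64 }
struct Rect { w: f64, h: f64 }

impl HasArea for Square {
    fn get_area(&self) -> f64 { self.side * self.side }
}
impl HasArea for Rect {
    fn get_area(&self) -> f64 { self.w * self.h }
}

// Only the trait is known here; the concrete element types are erased.
fn average_area(shapes: &[Box<dyn HasArea>]) -> f64 {
    let sum: f64 = shapes.iter().map(|s| s.get_area()).sum();
    sum / shapes.len() as f64
}

fn main() {
    // Different concrete types in one collection.
    let shapes: Vec<Box<dyn HasArea>> = vec![
        Box::new(Square { side: 2.0 }),    // area 4
        Box::new(Rect { w: 3.0, h: 4.0 }), // area 12
    ];
    assert!((average_area(&shapes) - 8.0).abs() < 1e-9);
}
```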

This gets to the crux of the problem I am trying to point out.

Afaics a collection of trait objects is not composable, because it doesn't know how to interact with a function that needs a different set of trait interfaces on the underlying data types. Once we erase the data types at compile time, we can't ever get them back. That is why the design I will propose is about keeping the data types around and only asking for trait interfaces in the signatures of functions.

As I had alluded in the long-winded other thread, I see that trait objects are the wrong construct.

I know the definition of heterogeneous of course.

Not fine. As explained above.

What makes trait objects safe is precisely that you cannot recover the runtime type information. Reflection (the reading of type metadata at runtime) is a whole other thing.

If you are concerned about statically proving safety at compile time, you can only use information that is available at compile time.

I am obviously not arguing that trait objects are unsafe; rather, they limit composability because they erase the data types, so the compiler can no longer compose new traits on those data types. In the parlance of Wadler's Expression Problem, trait objects are not open to extension in the direction of new functions (i.e. new traits). They are only open to extension in the direction of new data types (i.e. implementations of the traits of the trait object). My proposal will be, afaics (if I am not mistaken), the first complete solution to the Expression Problem. This has been a goal of mine, on and off (sporadically), for the past few years.

Afaics, you are conflating orthogonal issues. I will have to explain holistically when I explain more on first-class intersections and unions, since I've already tried to explain it at that other thread but apparently my explanation is not understood.

I am not referring to RTTI that is not checked by the compiler. In that discussion at the linked thread, the data types Natural /\ Positive were not erased, and the hypothetical compiler is checking them. Perhaps I have some mistake on that, but I don't see it, and I explained already in that other thread.

Please instead in the interim address the inability to express composability with trait objects:

A collection of trait objects is extensible because it can deal with datatypes that are not yet known (a new type that wants to join the collection has to implement all the required trait functions).

This allows functions to be written in a true generic way, because when the function requires an Iterator, I know I can call the 'successor' function.

What is the alternative? If it doesn't provide successor, how can I iterate over it?

Let's look at an enum as an alternative. Here we cannot extend with a new datatype, but we can match on each type in the enum and do something different with it. It is a good thing that we cannot extend; let's look at some code:

match thing {
    Type1(a) => ..
    Type2(a) => ..
}

If the type of "thing" is extended to include a new type, the above becomes unsafe, as it has no branch to handle the new type. When you add a new type, all functions that match on the type need rewriting.
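The exhaustiveness check can be seen in a complete sketch (the enum and names are illustrative): every variant must be handled, so adding a variant turns every non-exhaustive match into a compile error rather than a runtime crash.

```rust
enum Thing {
    Type1(i32),
    Type2(String),
}

fn describe(thing: &Thing) -> String {
    // The compiler verifies this match covers every variant of Thing.
    match thing {
        Thing::Type1(a) => format!("number {}", a),
        Thing::Type2(a) => format!("text {}", a),
        // Adding a Thing::Type3 variant to the enum would make this match
        // fail to compile until a branch is added here.
    }
}

fn main() {
    assert_eq!(describe(&Thing::Type1(7)), "number 7");
    assert_eq!(describe(&Thing::Type2("hi".into())), "text hi");
}
```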

That is only one of the two dimensions in the Expression Problem. I edited my prior post, please re-read.

Afaics you are conflating orthogonal issues. A function can ask for the traits it wants, but the collection can retain knowledge of the first-class union of data types it is populated with. Thus the Expression Problem is open in both dimensions. Afaics, the only downside is we will need to move the dictionaries ("vtable") to a global hashtable.

Ah afaics (if I am not mistaken) I have a solution to that with first-class unions and a global hashtable for the dynamic dispatch. All statically type checked.

I literally wrote "I can't change the ValueType", and you write "wrong", but then you provide no argument about changing the ValueType. The type of the trait objects doesn't change. Yes, I can create new data types and implement the trait object's traits for them, but I can't change the ValueType of the collection without building a new copy of the collection.

You are right, the ValueType must be a single type, but that type can be an existential container like Box<dyn Shape>, which can contain any shape (or an enum).

Now this is the really interesting bit. The rest I think we more or less agree on, but I don't think you can solve this, because you don't know what the function is going to do. For example, let's consider shapes: I have circle, square, etc. Each has different properties, and we want to calculate the area of any shape:

fn area(x : Shape) -> f64 {
    match x {
        Circle(y) => PI * y.radius * y.radius,
        Square(y) => y.size * y.size,
        RightTriangle(y) => y.base * y.height / 2.0
    }
}

Now we extend Shape with a Rectangle: you will need to find every part of the code that matches on a Shape and add a case for Rectangle, or risk a crash. By forcing you to add "Rectangle" to the enum definition, the compiler can check that every match statement covers every case, which is much better. Note how the locality changes with a trait:

trait Area {
    fn get_area(&self) -> f64;
}

impl Area for Circle {
    fn get_area(&self) -> f64 {
        PI * self.radius * self.radius
    }
}

Here we can add the definition for each type, but the set of functions you are expected to implement is fixed in the interface. If you add a new function to the interface, the compiler can check all the implementations and make sure they implement all the required functions.

Notice how in both cases the compiler is able to ensure the code is correct by making sure you have all the types covered in a match, or all the functions implemented in an impl.
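The trait-direction check can be sketched concretely (names are illustrative): a new type joins by supplying a complete impl, and omitting a required method is a compile error caught at the impl site rather than at every call site.

```rust
use std::f64::consts::PI;

trait Area {
    fn get_area(&self) -> f64;
}

struct Circle { radius: f64 }

impl Area for Circle {
    fn get_area(&self) -> f64 { PI * self.radius * self.radius }
}

// Extending with a new type touches no existing code; the compiler
// rejects this impl if any method of Area is missing.
struct Square { side: f64 }

impl Area for Square {
    fn get_area(&self) -> f64 { self.side * self.side }
}

fn main() {
    assert!((Circle { radius: 1.0 }.get_area() - PI).abs() < 1e-9);
    assert_eq!(Square { side: 3.0 }.get_area(), 9.0);
}
```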

If you offer extensibility in both directions, the compiler will not be able to help you make sure anything is fully implemented, and you will start getting lots of problems. You can fix the interface (the functions you are expected to implement) and allow new types to be added, or you can fix the types and allow new functions to be added, but to do both would be unsafe.