Does Rust really need higher-kinded types?

On "Traits for Collections" I had a deep link to a comment by @BurntSushi. I had said:

The trait Contains from the alternatives section is a good idea. Range should implement a Contains trait and maybe get containers to implement it too. This would be part of work to get more traits into std so programs can rely on those without worrying about where the implementation is coming from.

And he replied:

@ehiggs Ever since collections reform, there's been a sort-of moratorium on traits for collections. The primary reasoning, if I remember correctly, was to hold off until HKT. If this RFC were to move forward, I would expect contains to be an inherent method just like it is on the various containers.

Regarding try/catch: here is a link. Much of the conversation argues that it shouldn't be implemented because there may be a do notation in the future which would elegantly handle the semantics of catch.

So I don't think algorithms (which contains is) belong in collections, as you end up repeating yourself. I think algorithms are best implemented as free-standing generic functions. I might be wrong, as it's not clear what contains is supposed to do on a collection, but it sounds like this generic algorithm from Elements of Programming:

    // Note: `Iterator` and `Readable` here are the Elements of Programming
    // traits (with `successor` and `source`), not std::iter::Iterator.
    pub fn find<I>(mut f: I, l: &I, x: I::value_type) -> I
    where I: Iterator + Readable {
        // Precondition: readable_bounded_range(f, l)
        while (f != *l) && (*f.source() != x) {
            f = f.successor();
        }
        f
    }

This single definition is all that is necessary, as it is generic over all containers (by virtue of being implemented on iterators) and all content types.

That algorithm might 'work', but it would defeat the whole point of any tree-like structure. The Contains trait would be useful for ranges, polygons, higher-dimensional spaces, and points in space plus distances (hyperballs?).

Agreed, that's why it is defined on 'Iterators'. Note that you would need a different algorithm for each of those other things. A binary tree, for example, should provide a BifurcatingIterator, and the algorithm for finding whether a value is 'below' a node would be different. A higher-dimensional space would provide a coordinate structure, which is essentially a higher-dimensional iterator.
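For example, a rough sketch (the trait and function names below are invented for illustration, not taken from std or any crate) of how the binary-tree case differs from the linear find:

    use std::cmp::Ordering;

    // A bifurcating coordinate: a position in a binary tree.
    trait BifurcateCoordinate: Sized {
        type Value;
        fn value(&self) -> &Self::Value;
        fn has_left(&self) -> bool;
        fn has_right(&self) -> bool;
        fn left(&self) -> Self;
        fn right(&self) -> Self;
    }

    // Finding whether a value is 'below' a node of a binary search tree:
    // a different algorithm from the single-pass linear find, selected by
    // the kind of coordinate the container provides.
    fn bst_find<C>(mut c: C, x: &C::Value) -> Option<C>
    where
        C: BifurcateCoordinate,
        C::Value: Ord,
    {
        loop {
            match x.cmp(c.value()) {
                Ordering::Equal => return Some(c),
                Ordering::Less if c.has_left() => c = c.left(),
                Ordering::Greater if c.has_right() => c = c.right(),
                _ => return None, // ran off the tree: not present
            }
        }
    }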

The whole idea of generic programming is to start with the algorithms, and then classify them by their data access patterns (iterators). Then containers can provide the correct iterators.

You can then use trait overloading (where the overload depends on the traits implemented) to get the compiler to choose the correct algorithm for the type of iterators supported by your container.

Edit: this is how generic programming solves the inheritance problem. If you implement find/contains on the collection itself you will repeat yourself a lot, implementing very similar functions on similar containers. We know that some containers should share the definition and some should not.

By defining the algorithm on iterators, we can share it between all containers that provide the right kind of iterator (in the above case, a single-pass forward iterator). We can then provide other implementations of find/contains that operate on other kinds of iterator, like the BifurcatingIterator I mentioned or N-dimensional coordinate iterators. If you do this you will be able to call the same function on any sub-range or sub-space of the container for free, and pass the results of one find into the next; for example, find element "A" in a vector, then find element "B", and then find element "C" in between them.

The correct algorithm gets selected by the type of the iterator, and multiple different containers can share the same iterator. Even better, different algorithms can abstract over different iterators, so implementations can be shared between different containers: one algorithm shared between collection types A and B, and another shared between collection types B and C. Try doing that with inheritance (you would need multiple inheritance, with each algorithm in a separate superclass).
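To make that concrete, here is a minimal sketch using std's Iterator for brevity (the free function is illustrative, not a std API):

    // One definition of `contains`, shared by anything that can produce
    // a single-pass forward iterator over its values.
    pub fn contains<I, T>(items: I, x: &T) -> bool
    where
        I: IntoIterator<Item = T>,
        T: PartialEq,
    {
        items.into_iter().any(|item| item == *x)
    }

    fn main() {
        use std::collections::BTreeSet;
        let v = vec![1, 2, 3];
        let s: BTreeSet<i32> = v.iter().cloned().collect();
        // Two different containers, one algorithm:
        assert!(contains(v.iter().cloned(), &2));
        assert!(contains(s.iter().cloned(), &2));
    }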

The rebuttal is here.

I am still arguing that iterators are an anti-pattern.

I realize most here probably don't agree with me (yet). But I am confident I will eventually be proven correct on that. Or I will learn I am not, but my abstract intuition usually leads me to the correct answer, especially on an abstraction issue. The devil is in the details, of course.

A Contains trait doesn't bind the collection type to the Contains methodology. The choice of which traits a collection implements is open to extension, as in the Expression Problem. And this is why we need HKT in general (not just for my proposal): we need to not throw away data types and subsume them into trait types, as I explained in the thread for my proposal.

HKT are fundamental. We must have them.

An iterator is a point sampler. An enumeration operates on a range, is more high-level, and thus has more information to optimize with; apparently you can then invert the control, so that the resource lifetime is encapsulated in the enumerable's methods, without needing the low-level tsuris of Rust lifetime typing. I may be mistaken, but that is the intuition I have thus far and the direction I will initially be pursuing.
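Roughly what I mean, as a sketch (the Enumerable trait here is invented for illustration): the collection drives the loop, so whatever resource backs it is acquired and released inside each, invisible to the consumer.

    // Internal iteration: the collection owns the loop and the resource.
    trait Enumerable {
        type Item;
        // The consumer returns `false` to stop early.
        fn each(&self, f: &mut dyn FnMut(&Self::Item) -> bool);
    }

    struct List<T>(Vec<T>);

    impl<T> Enumerable for List<T> {
        type Item = T;
        fn each(&self, f: &mut dyn FnMut(&T) -> bool) {
            // A remote-database version would open its connection here...
            for item in &self.0 {
                if !f(item) {
                    break;
                }
            }
            // ...and close it here, with the consumer none the wiser.
        }
    }

    fn main() {
        let l = List(vec![1, 2, 3]);
        let mut sum = 0;
        l.each(&mut |x| { sum += x; true });
        assert_eq!(sum, 6);
    }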

It is the old "we just need a smarter compiler" argument that functional enthusiasts have been making for years, and yet where is the mythical magical compiler that can optimise bad code to be fast? Where are all the functional languages beating 'C' on the benchmarks game? The only reason Rust is competitive with C/C++ is that it abandons that argument and gives you the low-level control to write efficient code. On top of that it gives you traits and monomorphisation, which give you zero-cost abstraction. Great! So looking at actual languages and implementations, the enumerator approach is going in the wrong direction, towards the fantasy land of the "smarter compiler". Deforestation does not make Haskell faster than 'C'. Look at the empirical evidence, not the theoretical speculation.

I read your post. I'm not smart enough to talk about this problem in the abstract. I need real Rust code.


That is a category error in your conceptualization of what I wrote.

I am saying that an iterator doesn't provide enough information for the programmer to fully optimize the algorithms written on top of it. It has nothing to do with the compiler's optimization.

Well, that's a good thing :) In that case you can find my implementation (so far) of the algorithms from Elements of Programming here: https://github.com/keean/elements_in_rust If you can improve on the performance of any of them I would be very interested to know about it (and so, I would think, would Alexander Stepanov).

My advice would be you need to look at a lot of different algorithms before you start trying to abstract patterns from them.

A lot of this may be true, but it's getting away from my point. Or, I suppose, it's supporting my point. The point was that I think there should be more agreed-upon traits that various types can implement, but HKT is mysteriously blocking that.

Using these common traits as a protocol, we can actually use the genericity that people have worked so hard to develop. For example, I have made a shonky Generalised Search Tree. The GeoRust team has made rust-geo. If the rust-geo package implemented std::traits::Contains on their various polygon types, as well as various intersection and union functions, then they could use the GST to store them with little effort.

If we don't have these agreed-upon traits (in an agreed-upon place, i.e. the std namespace) then we end up with a Tower of Babel of types, where rust-geo implements its own intersections and unions and there needs to be a bridge type that converts them to GST intersections and unions. It's more of a social problem than a technical one.
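To make the protocol concrete, a hedged sketch (std::traits::Contains does not exist today, so every name below is illustrative):

    // The hypothetical agreed-upon trait.
    pub trait Contains<Rhs> {
        fn contains(&self, item: &Rhs) -> bool;
    }

    // A range-like type can implement it...
    struct Span { lo: f64, hi: f64 }

    impl Contains<f64> for Span {
        fn contains(&self, x: &f64) -> bool {
            self.lo <= *x && *x <= self.hi
        }
    }

    // ...and a polygon crate could implement it for its own types, so a
    // generalised search tree written against `Contains` alone could
    // store either without bridge types.
    fn main() {
        let s = Span { lo: 0.0, hi: 1.0 };
        assert!(s.contains(&0.5));
    }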

The starting point should be algorithms, not collections. So what might a good algorithm be to look at? Perhaps determining whether a point is inside or outside an arbitrary polygon. We might start with an analysis like this: Point in Polygon Strategies

A point, for the purposes of generality, can probably be just a tuple. Looking at these algorithms, some of them access the points in the polygon sequentially, so these should be implemented using a single-pass forward iterator. Let's focus on those algorithms, maybe the cross-product-sign algorithm. This algorithm can then be implemented independently of the collection type used.
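For example, a hedged sketch using the crossing-number test (a cousin of the cross-product-sign approach) over std's Iterator for brevity; note that the collection type never appears:

    type Point = (f64, f64);

    // Point-in-polygon over any single-pass iterator of vertices.
    pub fn polygon_contains_point<I>(vertices: I, p: Point) -> bool
    where
        I: IntoIterator<Item = Point>,
    {
        let mut iter = vertices.into_iter();
        let first = match iter.next() {
            Some(v) => v,
            None => return false, // an empty polygon contains nothing
        };
        let mut inside = false;
        let mut prev = first;
        // Walk each edge (prev -> cur), closing the loop back to `first`.
        for cur in iter.chain(std::iter::once(first)) {
            if (cur.1 > p.1) != (prev.1 > p.1) {
                let x = prev.0 + (p.1 - prev.1) / (cur.1 - prev.1) * (cur.0 - prev.0);
                if p.0 < x {
                    inside = !inside;
                }
            }
            prev = cur;
        }
        inside
    }

    fn main() {
        let square = vec![(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)];
        assert!(polygon_contains_point(square.iter().cloned(), (0.5, 0.5)));
    }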

Anyway, my point is that the categorisation should start from the algorithms, not from the collections. polygon_contains_point is a very different algorithm from vector_contains_value, and is one layer of abstraction higher, as it involves knowing what the values in a container represent, as opposed to a simple search for a value.

Stepanov puts it much better than I can:

It is as if mathematicians would start with axioms. You do not start with axioms - you start with proofs. Only when you have found a bunch of related proofs, can you come up with axioms. You end with axioms. The same thing is true in programming: you have to start with interesting algorithms. Only when you understand them well, can you come up with an interface that will let them work.

I was rushing out the door on the prior reply, so I wasn't able to fully explain. I will do so now.

My upthread post linked three times to the pertinent post in the other thread. In that post, I noted Oleg's point that iterators (afaics he calls them 'cursors') are unable to model some general invariants, for example when the collection is a remote database. Additionally, Oleg points out that iterators can't encapsulate the resource lifetimes that the iterator allocates. And although one could argue that Rust's lifetimes can be applied, I view them as a pita that will make libraries so complex that, just like C++, it may require 2 years for a programmer to become proficient, delaying the diffusion of adoption well beyond a decade or more, as compared to managed/functional languages that require only 3 - 6 months.

However, there is something that inverting the control with iterators can do which an enumerable cannot:

There may be other ways to solve that case without an iterator.

The general abstraction we are trying to model is two distinct algorithms: 1) a data resource; and 2) the consumer of the former. But these two algorithms are not always optimally resonant through any one interface design choice.

So while the iterator may not be strictly an anti-pattern in all cases, it is not the be-all generalized interface for joining two distinct algorithms. Alexander Stepanov seems to have overstated his claims for full generality or you are misinterpreting his claims.

Stepanov says to start with the algorithms. If you look at the algorithms implemented here they are categorised by their access patterns (a.k.a. iterators).

Each algorithm is optimal for the purpose for which it is intended. Notice how many algorithms share the same access pattern. We abstract this access pattern and then require that collections that can provide it efficiently do so, and instantly all those collections can use those algorithms.

Could any of those algorithms be implemented more efficiently with enumerators? Would the code be easier to read and understand, would it allow more general algorithms, would the machine code output be any better?

Oleg argued there are cases that can be more efficient with enumerables.

And again, it seems you are ignoring the lack of encapsulation of the resource lifetimes that iterators carry with them?

Edit: it appears you were addressing the resource lifetime issue in the other thread:

Iterators do not own the data, they only borrow it, so they don't have any concern with lifetimes. They should not encapsulate lifetime (ownership) that is the responsibility of the collection.

I was perhaps misunderstanding what you mean by an iterator. I thought an iterator could leak mutable references to the items it enumerates, and thus I assumed an iterator can't borrow.

But I still don't think you are correct in any case, because the iterator can't entirely own its own data structure: if the data resource is mutated through another interface, then the iterator must be modified by the data resource.

I can't see how you can have a cursor into the data resource without allocating a resource that the iterator can't completely own. And we also need to remember to deallocate the iterator. There appear to be resource lifetime issues.

Try/catch is a control-flow feature and it is not waiting on HKTs. There were people who were against it on the grounds that they just wanted to have Monad instead.

Traits for collections want HKTs of the same sort as StreamingIterator, so as to define their iter(&'a self) -> Self::Iter<'a>, iter_mut(&'a mut self) -> Self::IterMut<'a>, and drain(&'a mut self) -> Self::Drain<'a> methods.
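A hedged sketch of what such a trait wants to say, in generic-associated-types syntax (trait and method names illustrative, not from std):

    // The `Iter<'a>` associated type is parameterised by the lifetime of
    // the borrow -- exactly the kind of type constructor that plain
    // associated types cannot express.
    pub trait Collection {
        type Item;
        type Iter<'a>: Iterator<Item = &'a Self::Item>
        where
            Self: 'a;

        fn iter<'a>(&'a self) -> Self::Iter<'a>;
    }

    impl<T> Collection for Vec<T> {
        type Item = T;
        type Iter<'a> = std::slice::Iter<'a, T> where Self: 'a;

        fn iter<'a>(&'a self) -> Self::Iter<'a> {
            self.as_slice().iter()
        }
    }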

Here is the definition of the Iterator trait in Rust, just including the one method that must be implemented (the other methods have default implementations, though they can be overridden if there's a more efficient way to do them; I've elided many irrelevant details below):

pub trait Iterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
}

Item can be any type; in particular, for many collections you can choose to get an iterator over owned items, immutably borrowed items, or mutably borrowed items. Here's the IntoIterator implementation for Vec that yields owned items:

impl<T> IntoIterator for Vec<T> {
    type Item = T;
    type IntoIter = IntoIter<T>;
    fn into_iter(mut self) -> IntoIter<T> { ... }
}

impl<T> Iterator for IntoIter<T> {
    type Item = T;
    fn next(&mut self) -> Option<T> { ... }
}

There are also the iter and iter_mut methods on slices (which you can call on a Vec because Vec derefs to a slice), which give you iterators over immutable or mutable references:

fn iter(&self) -> Iter<T> { ... }
fn iter_mut(&mut self) -> IterMut<T> { ... }

Where those types implement the Iterator trait like so:

impl<'a, T> Iterator for Iter<'a, T> {
    type Item = &'a T;
    fn next(&mut self) -> Option<&'a T> { ... }
}
impl<'a, T> Iterator for IterMut<'a, T> {
    type Item = &'a mut T;
    fn next(&mut self) -> Option<&'a mut T> { ... }
}

So, if by "leak a mutable reference", you mean that you can get mutable references during iteration, and save them somewhere else, then yes, you can do that:

fn main() {
    let mut v = vec![1, 2, 3];
    let mut n: Option<&mut i32> = None;

    // Stash a mutable borrow obtained during iteration; `count()` just
    // forces the lazy `map` to run for its side effect.
    v.iter_mut().map(|x| n = Some(x)).count();

    let x = n.unwrap();
    *x += 1;
    println!("{}", x);
}

So as you can see, you can indeed borrow in an iterator.
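And as for the worry upthread about the collection being mutated out from under a live iterator: that is exactly the situation the borrow checker rules out statically. A small sketch:

    fn main() {
        let mut v = vec![1, 2, 3];
        let it = v.iter(); // the iterator immutably borrows `v`
        // v.push(4);      // rejected at compile time: cannot mutate `v`
        //                 // while the iterator's borrow is alive
        let _ = it.count(); // consuming the iterator ends the borrow
        v.push(4);          // now mutation is fine again
        assert_eq!(v.len(), 4);
    }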

Actually, Rust used to have what Oleg calls enumerables as its primary form of iteration. In the Rust lexicon, the terminology has generally been "internal" vs. "external" iteration, where "internal" iteration is what Oleg refers to as enumerables, and "external" is what he refers to as cursors. (In Rust there's also another concept called Cursors, which has been discussed but never implemented: a somewhat more flexible iterator that can go forwards and backwards, but can only act by reference rather than consuming the collection like iterators can.) You should probably read this thread about the reasons for the switch from internal to external iteration, and in particular this message, which explains the reasoning.

It turns out that in Rust, external iteration is a lot easier to make work with the borrow checker, works better with code that does things like early return from loops, and is more efficient and easier to optimize.

The downsides are discussed there too, in particular the greater difficulty of implementing external iteration, as you have to write a struct by hand that encapsulates the iteration state. Python- or C#-style generators with a yield keyword would make this easier; the idea has been tossed around, and some people have experimented with compiler plugins for doing so.
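To illustrate the difficulty, here is the hand-written state struct for something a yield-style generator would express in a couple of lines (a generic sketch, not from any proposal):

    // External iteration: all the loop state must live in a struct.
    struct Countdown {
        n: u32,
    }

    impl Iterator for Countdown {
        type Item = u32;
        fn next(&mut self) -> Option<u32> {
            if self.n == 0 {
                None
            } else {
                let current = self.n;
                self.n -= 1;
                Some(current)
            }
        }
    }

    fn main() {
        // Generator pseudocode: `while n > 0 { yield n; n -= 1; }`
        let v: Vec<u32> = Countdown { n: 3 }.collect();
        assert_eq!(v, vec![3, 2, 1]);
    }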


One of the good things about Rust is the ability to compile without the standard library, as it enables people to experiment with other styles while keeping the base language. In Stepanov's "Elements of Programming" he goes through what are properly called coordinate structures, but which got renamed iterators by the C++ standard library people, and that kind of stuck: starting with the single-pass forward iterator 'Iterator', moving on to multi-pass forward iterators 'ForwardIterator', BidirectionalIterator, IndexedIterator, RandomIterator, and then more complex ones like the BifurcateCoordinate (the iterator for binary trees) and multi-dimensional iterators. One of the interesting things Stepanov has done is separate the dereference traits (Readable, Writable, Mutable) from the iterator itself.

You can see how far I have got porting this to Rust here. My plan is to first translate the code as it is, and then look at how it might be improved to take advantage of Rust's specific features later. I am interested in general comments and style tips etc. I haven't needed HKT yet.