Most coveted Rust features

There is some work on ways to iterate collections that might be relevant to the discussion:
http://okmij.org/ftp/papers/LL3-collections-enumerators.txt


The goal is to attain the "reuse" that Augustss prefers about lazy evaluation, without incurring lazy evaluation's negative side-effects (i.e. resource and timing indeterminism), and without forsaking the encapsulation of resource-lifetime management that imperative spaghetti, such as unstructured use of iterators, ostensibly entails.

In Haskell we can write:

    map :: (a -> b) -> [a] -> [b]
    some :: (a -> Maybe b) -> [a] -> [b]

    f :: a -> b
    g :: a -> Maybe b

    let x = some g (map f [0, 1])

(Here some is essentially Data.Maybe's mapMaybe, except it terminates the enumeration at the first Nothing.)

I suppose we can write that in Rust roughly as follows (this is pseudocode, since C<A> with a generic C is not expressible without HKT):

    fn map<A, B, C: Collection>(fn(A) -> B, C<A>) -> C<B>;
    fn some<A, B, C: Collection>(fn(A) -> Option<B>, C<A>) -> C<B>;
    fn f<A, B>(A) -> B;
    fn g<A, B>(A) -> Option<B>;
    let x = some(g, map(f, [0, 1]));

Since Rust employs an eager evaluation strategy, map calls f for every element before some is called; thus if some terminates early (when g returns None) before all the elements have been enumerated, Rust has wasted some computation. Whereas, since Haskell employs a lazy evaluation strategy, f is only called for those elements which some enumerates.

We need to employ inversion-of-control in an eager language so that the called functions can call and enumerate their callers. But this isn't straightforward because the generalized structure and composability of Collection semantics must be respected.

The most general forward enumeration function is a fold, where B is the current value of the fold's result and the Option allows the enumeration to terminate early:

    fn fold<A, B>(Enumerable<A>, B, fn(B, A) -> Option<B>) -> B;

The functions map and some can be built employing fold.
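As a sketch of that claim in runnable Rust, here is an early-terminating fold with map and some built on it. The names fold, map, and some are from this discussion, not std APIs; I use Vec in place of the abstract Enumerable, and B: Clone so the previous accumulator survives when the step signals termination (a real design would avoid that clone):

```rust
// An early-terminating fold in the spirit of fn(B, A) -> Option<B>:
// a None from the step stops the enumeration, keeping the prior accumulator.
fn fold<A, B: Clone>(
    items: impl IntoIterator<Item = A>,
    init: B,
    mut step: impl FnMut(B, A) -> Option<B>,
) -> B {
    let mut acc = init;
    for a in items {
        let prev = acc.clone(); // saved in case step signals termination
        match step(acc, a) {
            Some(next) => acc = next,
            None => return prev,
        }
    }
    acc
}

// map: never terminates early, just accumulates f(a).
fn map<A, B: Clone>(items: Vec<A>, f: impl Fn(A) -> B) -> Vec<B> {
    fold(items, Vec::new(), |mut acc, a| {
        acc.push(f(a));
        Some(acc)
    })
}

// some: stops the enumeration at the first None from g.
fn some<A, B: Clone>(items: Vec<A>, g: impl Fn(A) -> Option<B>) -> Vec<B> {
    fold(items, Vec::new(), |mut acc, a| match g(a) {
        Some(b) => {
            acc.push(b);
            Some(acc)
        }
        None => None,
    })
}

fn main() {
    assert_eq!(map(vec![1, 2, 3], |x| x * 2), vec![2, 4, 6]);
    // some stops at the first odd number, where g returns None.
    let evens = some(vec![2, 4, 5, 6], |x| if x % 2 == 0 { Some(x) } else { None });
    assert_eq!(evens, vec![2, 4]);
    println!("ok");
}
```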

Afaics, it would be impossible to build our design on the general fold without coroutines or generators, because there is no guarantee that the incremental partial orders of its interim result type can be enumerated incrementally, i.e. there are no assumptions that can be made about the semantics of the unbounded generic result type B.

Afaics, if we were to instead build our design for inversion-of-control on

    fn map<A, B, M: Monoid + Enumerable>(M<A>, fn(A) -> Option<B>) -> M<B>

where incremental consistency is honored and M is consistent across the functional composition, then an optimization arises: ignore the structure of the result type, compose the callers' chain of fn(A) -> Option<B> operations, enumerate the input Enumerable (aka Iterable) over those composed functions, and employ Monoid::append to construct the result. In other words, simply call map with the composed function.
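A minimal sketch of that optimization, under my own assumed names (the Monoid trait, compose, and run below are illustrations, not std APIs): the chain of fn(A) -> Option<B> operations is fused into one function, the input is enumerated once, and the result is built with Monoid::append.

```rust
// A minimal Monoid trait for this sketch; not a std trait.
trait Monoid {
    fn empty() -> Self;
    fn append(self, other: Self) -> Self;
}

impl<T> Monoid for Vec<T> {
    fn empty() -> Self {
        Vec::new()
    }
    fn append(mut self, other: Self) -> Self {
        self.extend(other);
        self
    }
}

// Compose two fn(A) -> Option<B> style operations into one, so the
// input only needs to be enumerated once over the composed function.
fn compose<A, B, C>(
    f: impl Fn(A) -> Option<B>,
    g: impl Fn(B) -> Option<C>,
) -> impl Fn(A) -> Option<C> {
    move |a| f(a).and_then(|b| g(b))
}

// Enumerate the input once, building the result with Monoid::append;
// a None from the composed operation terminates the enumeration early.
fn run<A, B, M: Monoid>(
    input: Vec<A>,
    op: impl Fn(A) -> Option<B>,
    inject: impl Fn(B) -> M,
) -> M {
    let mut out = M::empty();
    for a in input {
        match op(a) {
            Some(b) => out = out.append(inject(b)),
            None => break,
        }
    }
    out
}

fn main() {
    // parse then double, fused into a single pass over the input
    let op = compose(|s: &str| s.parse::<i32>().ok(), |n: i32| Some(n * 2));
    let result: Vec<i32> = run(vec!["1", "2", "x", "3"], op, |n| vec![n]);
    assert_eq!(result, vec![2, 4]); // enumeration stopped at "x"
    println!("ok");
}
```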

For the more general case where M is not consistent but the enumeration can be modeled as an incremental cursor (see the aforementioned reference), we only need a control function with a variadic input taking an Enumerable + Monoid paired with each fn(A) -> Option<B> operation in the composition. This control function can encapsulate all the resource-lifetime management orthogonally to instances and compositions. I don't see any reason we need coroutines or generators in this case; the instance of the Enumerator (aka Iterator) maintains the necessary state.

For the more general case where the enumeration can't be modeled as an incremental cursor (see the aforementioned reference), we need to let the Enumerable::fold method decide when and how many times it will invoke the fn(B, A) -> Option<B> operation, so it can then yield to its caller, who can continue its Enumerable::fold. So the input to fold requires a Yieldable interface:

    fn fold<A, B: Yieldable>(Enumerable<A>, B, fn(B, A) -> Option<B>) -> B;

This Yieldable interface should be implemented with stackless coroutines, which in this special case can be implemented as a function that inputs and outputs its internal state, so that each yield (a return from the function) can be resumed (by calling the function again with the last output state). These are otherwise known as generators.
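The "restartable function" formulation can be sketched concretely. Below, counter_resume is a hypothetical generator whose entire internal state is one integer: each call takes the saved state and returns the yielded value plus the state to resume from, exactly the state-in/state-out shape described above.

```rust
// A generator as a plain function that threads its own state: each call
// takes the saved state and returns (yielded value, state to resume from).
fn counter_resume(state: u32) -> (Option<u32>, u32) {
    if state < 3 {
        (Some(state), state + 1) // "yield" state, resume at state + 1
    } else {
        (None, state) // enumeration finished
    }
}

fn main() {
    let mut yielded = Vec::new();
    let mut state = 0;
    loop {
        // The caller drives the coroutine by feeding back the last state.
        let (value, next) = counter_resume(state);
        state = next;
        match value {
            Some(v) => yielded.push(v),
            None => break,
        }
    }
    assert_eq!(yielded, vec![0, 1, 2]);
    println!("ok");
}
```

Because the state is explicit data rather than a stack frame, no call-stack machinery is needed to suspend and resume, which is what makes the coroutine "stackless".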

The fn(B, A) -> Option<B> operation must be a pure function so that it can't leak resources to subvert the encapsulation of the resource lifetimes.

I believe this is more or less the rough outline, at least to my current understanding of this issue.

Edit: the fn(B, A) -> Option<B> operation does not have complete control over its efficiency w.r.t. how many successive times it will be invoked relative to the yield. It can't cache and queue expensive operations in B because it can't be sure it will be invoked again. Some improvement in this design may be needed.

Edit#2: I don't think higher-kinded types are required by any of the functions mentioned above. They could possibly be required for iterators.

I don't see how they are required for iterators at all (they are not). The one use case for lazy streams arises because of Rust lifetimes and is not a normal issue with parametric types. It turns out you would like to have an associated type with a parametric lifetime, which means the associated type itself is higher-kinded (lifetime -> type), but no higher-kinded types are needed in function signatures. As you are not interested in lifetimes, you should just assume HKT are not needed for iterators (I don't think C++ has HKT, so iterators don't require them).

Agreed. I was having a similar realization (as I was sleeping) that the only reason we need HKT is for compiler checked lifetimes.

And I'll add the thought that we don't need the compiler to check lifetimes where we are using proper high-level reasoning to encapsulate them, e.g. the inversion-of-control in my prior post using stackless coroutines, so that the lifetime is encapsulated in the Enumerable::fold instance, together with enforcement that fn(B, A) -> Option<B> is a pure function so it can't leak the resource lifetime.

I think you mean that the reason we need to have lifetimes is because there is no garbage collector. It was realised early on that Lisp needed a GC to implement functions with closures properly, and every functional language since has required a GC. Rust is a bit of a breakthrough as it brings features normally associated with functional languages to a language with no GC and zero runtime requirements (there is no runtime library that must be linked with the produced code). Haskell's pure functions rely on GC to not leak memory.

The only way to avoid GC in a functional language without lifetimes is to limit yourself to stack-only data, with no heap data that can live on outside its lexical scope.

GC alone is not sufficient. It is still possible to leak resources that can prevent GC.

And encapsulating lifetimes can perhaps enable manual resource deallocation in some scenarios (but I need to think about that more deeply). So perhaps the GC choice is orthogonal to my point.

Edit: I am really building on my intuition that compiler-checked lifetimes are an unnecessary low-level PITA that infects the type system deleteriously, or at least should not be the default in the language. Apologies to Rust fans; I am not trying to disparage Rust, just expressing my opinion, and I remain open-minded to learning whether I am correct or not.

Oleg implemented the enumerator in Scheme which is garbage collected. Without linear types there is not really much chance to avoid GC.

The enumerator is very limited and only deals with one access pattern, the basic Iterator, and is less generic. It does not even cope with a copyable multi-pass iterator like ForwardIterator, which you can see being used here to find the partition point by bisection. Try that with an enumerator:

    pub fn partition_point_n<I, P>(mut f : I, mut n : I::distance_type, mut p : P) -> I
    where I : ForwardIterator + Readable, P : FnMut(&I::value_type) -> bool {
        // Precondition: readable_counted_range(f, n) && partitioned_n(f, n, p)
        while !n.is_zero() {
            let h : I::distance_type = n.clone().half_nonnegative();
            let m : I = f.clone() + h.clone();
            if p(m.source()) {
                n = h;
            } else {
                n = n - h.successor();
                f = m.successor();
            }
        }
        f
    }

Note: the "clones" are only necessary because I have deliberately not derived "Copy" for these types, as I want to see the cost of copying in the code. Because these types are not copyable, the algorithm performs no copying at all, not even when passing arguments to functions, except where there is an explicit clone; but that is tangential to this discussion.

Note that Rust's standard library offers an Iterator trait that is implemented by lots of structs and default-implements a lot of lazily-evaluating functions such as map, fold, etc.

Just to clarify, Iterator::map creates a new map adaptor around the original iterator, which is lazy, while fold eagerly consumes the whole iterator.

You're right, of course. I should read the documentation I link more carefully before linking it. :slight_smile:

...

When you say Generics over Integers, do you mean

struct Blob<S: u32> {
    p: [u8; S]
}

Because this is something I would like.
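For what it's worth, Rust did later stabilize exactly this feature as const generics (a minimal version shipped in Rust 1.51); the actual syntax uses const and, for array lengths, usize:

```rust
// Const generics: a struct parameterized by an integer value.
struct Blob<const S: usize> {
    p: [u8; S],
}

fn main() {
    let b: Blob<16> = Blob { p: [0u8; 16] };
    assert_eq!(b.p.len(), 16);
    println!("ok");
}
```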

I have decided we need higher-kinded types:

I'll have to come back to this later, because I need to sleep, but real quick I will posit that if the callback function is pure, then the enumerable doesn't need linear typing to know it can deallocate the resources.

Back to the original topic, I think a great feature would be faster/incremental compilation, which it looks like is going to happen thanks to the development of MIR. I would also appreciate smarter borrow checking, and I'm looking forward to when that becomes available! Finally, after reading the debate around HKT, I would just like to ask: please keep Rust simple; don't turn it into Haskell :blush:


I think this will be THE hardest problem to solve. Abstractions allow you to write more generic code (e.g. to use streaming iterators interchangeably you need more abstractions), but more abstractions lead to code that is difficult to understand.

Without the necessary abstractions, people will reach for the macro hammer (or code generation), which is also not simple.

Rust code may often be written by complicated, non-KISS programmers, so the code may be complicated and abstraction-heavy in any case, regardless of which features the language supports.

If there were some special, complexity-limiting, "rusty-coloured" HKTs, that might be a win.

Async, await, and yield.

I would very much want to have async and await, for asynchronous programming. I would also like yield, to make iterators easier to write.

@keean, @matt2xu, @Ygg01, @vi0, I was thinking about responding point-by-point, but then I thought that creating a balance between expression and simplicity is really an artform, so we could go back and forth covering all the nuances and details, but it probably still wouldn't be conveyed well. A written forum just isn't the efficient way to hammer out such complexity of design. Really the only way to do it is a small very focused and highly experienced set of designers/developers who have learned about all the pitfalls and benefits of other languages and are capable of doing the holistic design. Holistic artform design is one case where IMO (and in some others' opinion) open source development doesn't work well.

I will just note that if we wanted Rust to be really simple like Python or JavaScript, we'd probably not have low-level borrow checking and its syntax growing tentacles into userland code, we'd have a GC, and we'd forsake iterators for more encapsulated enumeration (at the cost of losing the ability to abstract some algorithms). This was being discussed in detail in the other thread. But I understand, based on feedback I've received in this forum and on Rust's current design, that marrying the simplicity of JavaScript or Python with static typing is not the design goal of Rust, and so I defer to @dobenour 's post in the "Rust as a Higher Level Language" thread, which I will respond to over there.

Rust and Haskell both have typeclasses and afaics they both need HKTs. Arguably what makes Haskell so obtuse are:

  1. It uses spaces instead of commas and parenthesized grouping for function application.
  2. It has global type inference, so the function, data type, and typeclass (aka trait) signatures are rarely displayed explicitly.
  3. It is 100% immutable, thus requires category-theory monads to emulate mutability.
  4. It has lazy evaluation and memoization, which complicate memory-space and timing design and analysis.
  5. Its algebraic data types (ADTs) are more powerful (in some ways, less in others) and involve less boilerplate than accomplishing the same with struct, trait, mod, and enum. This conciseness and lack of verbiage, along with being an unfamiliar style of declaration to C/C++/Java programmers, makes it obtuse to some.
  6. Etc.

My point is that Rust has many of the same roughly equivalent abstractions that Haskell has, and will probably need HKTs as well. But it doesn't necessarily have to be as obtuse to C/C++/Java programmers as Haskell is.