Does Rust really need higher-kinded types?


#63

One of the good things about Rust is the ability to compile without the standard library, as it enables people to experiment with other styles while keeping the base language. In Stepanov’s “Elements of Programming” he goes through what are properly called coordinate structures, but which got renamed “iterators” by the C++ standard library people, and that kind of stuck: starting with the single-pass forward iterator ‘Iterator’ and moving on to multi-pass forward iterators ‘ForwardIterator’, BidirectionalIterator, IndexedIterator, RandomIterator, and then more complex ones like the BifurcateCoordinate (an iterator for binary trees) and multi-dimensional iterators. One of the interesting things Stepanov has done is separate the dereference traits (Readable, Writable, Mutable) from the iterator itself. You can see how far I have got porting this to Rust here. My plan is to first translate the code as it is, and then look at how it might be improved to take advantage of Rust’s specific features later. I am interested in general comments, style tips, etc. I haven’t needed HKT yet.
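For a flavor of what that separation looks like in Rust, here is a minimal sketch (trait and method names loosely follow “Elements of Programming”; this is illustrative, not the actual port):

```rust
// Dereference traits, separated from traversal as in Stepanov.
trait Readable {
    type ValueType;
    fn source(&self) -> &Self::ValueType;
}

trait Writable {
    type ValueType;
    fn sink(&mut self) -> &mut Self::ValueType;
}

// Single-pass forward traversal: `successor` consumes the position,
// so a position cannot be revisited.
trait SinglePassIterator: Sized {
    fn successor(self) -> Option<Self>;
}

// Multi-pass forward traversal: Clone lets an algorithm save a
// position and return to it, the extra guarantee a ForwardIterator adds.
trait ForwardIterator: SinglePassIterator + Clone {}
```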


#64

Robert Harper seems to think Haskell’s monad approach to exceptions is “unsafe”. Reactions here and here. I haven’t studied the issue yet.

So HKTs for iterators are only necessary because of Rust’s default resource lifetime tracking. I think @keean had also pointed that out.


#65

I should unpack my mental model that I implied upthread:

  1. That when borrowing a mutable reference, iterators do not encapsulate resource lifetimes. An enumerable could also leak resource lifetimes if the callback function isn’t pure (i.e. if it takes a mutable reference). But they are not equivalent categories: an iterator can be used in many more varied imperative coding scenarios where it can leak, and the compiler can’t make a simple check on whether the callback function is pure, because with iterators there is no callback function (i.e. no constraint requiring functional programming), just imperative spaghetti. (See the sketch contrasting the two styles after this list.)

  2. Afaik, your trait Iterator does not constrain the implementation from borrowing a mutable reference from the collection. There is nothing in the trait’s signature that lets the consumer of the iterator enforce at compile time whether the iterator is leaking resource lifetimes, other than to assume it is employing Rust’s resource lifetime checking. For example, if we have an iterator instance and then we also add, delete, and rearrange the order of items in the collection, that iterator has to be updated by the collection, thus the collection has to allocate some resource such as storing a callback function.
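To illustrate the distinction drawn in 1 and 2, here is a sketch (the `Enumerable` trait is hypothetical, not a Rust std API): an enumerable confines the borrow of the collection to the traversal itself, whereas an external iterator hands traversal state to the caller:

```rust
// Internal iteration: the collection drives the loop and only exposes
// each element to the callback; no traversal state escapes.
trait Enumerable {
    type Item;
    fn each(&self, f: impl FnMut(&Self::Item));
}

impl<T> Enumerable for Vec<T> {
    type Item = T;
    fn each(&self, mut f: impl FnMut(&T)) {
        for x in self {
            f(x);
        }
    }
}

// External iteration: the caller holds the iterator (and with it a
// borrow of the collection) for as long as it likes.
fn external_sum(v: &[i32]) -> i32 {
    let mut it = v.iter();
    let mut total = 0;
    while let Some(x) = it.next() {
        total += *x;
    }
    total
}
```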

Well, yeah, why should it be surprising that imperative spaghetti requires low-level tsuris for resource lifetime checking?

Edit: Afaics, the functional programming alternative to the imperative spaghetti of iterators, one that can still implement all the algorithms @keean refers to from Elements of Programming, is to have a separate trait for each such algorithm. While one might implement these traits with iterators, collections can also be optimized to implement some of them natively, with encapsulated resource lifetimes. So then @keean gets what he wants and I get what I want. Win-win. And we then discourage most users from dealing with the low-level tsuris of iterators and resource lifetimes.


#66

Sounds like that requires multiple mutable aliases to the same location in memory, which, of course, Rust prevents.


#67

In general, there are a lot of differences in efficiencies and ergonomics when comparing analogous interfaces between Rust and Scheme - given that the former has affine types, mutability, and (importantly) guaranteed eager invocation of destructors, but lacks garbage collection or even tail call elimination. Certainly Kiselyov’s proposed construction could not work without the last of those.

But it’s not just that - some of the statements in that paper are either highly Scheme/dynamic-type specific without mentioning it (despite the fact that the paper also brings up Haskell), or just inaccurate:

> Another problem of cursors is telling the user that there are no more values to access. When a collection is exhausted, cursor operations typically return an “out-of-band” value such as EOF (cf. getc() in C) or raise an exception (cf. iterator.next() in Python). None of these approaches appear entirely satisfactory. Unlike cursors, an enumerator does not need to do anything special to tell the iteratee that the traversal is finished: the enumerator will simply return. There is no longer a problem with out-of-band values and user’s disregarding such values.

This is completely solved by Option (aka Maybe), which Rust is hardly the first language to feature.
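For reference, Rust’s Iterator::next returns Option<Self::Item>, so exhaustion is signalled in-band and the type system forces the caller to handle it:

```rust
// Summing with an explicit `next()` loop: `None` marks exhaustion,
// so there is no sentinel value (EOF) or exception to mishandle.
fn sum_all(mut it: impl Iterator<Item = i32>) -> i32 {
    let mut total = 0;
    while let Some(x) = it.next() {
        total += x;
    }
    total
}

fn main() {
    assert_eq!(sum_all([1, 2, 3].into_iter()), 6);
}
```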

> Cursors are simply less efficient. A cursor must maintain the state of the traversal. That state may be invalid. Each operation on a cursor, therefore, must first verify that the cursor is in the valid state. The cost of the checks quickly adds up.

With encapsulation there is no need to check for invalid states: the iterator owns its traversal state, and the borrow checker guarantees the collection cannot be mutated out from under it.

> This issue of the inefficiency of cursors is well known. For example, it is highly advisable to use “native” operations to move large sections of an array (ArrayCopy in Java) rather than copy element-by-element. The cost of an array bound check on each access to an element is one of the considerations.

This conflates two issues: the cost of bounds checks, and the advantage of using something that lowers to memcpy (typically implemented in assembly with vectorized code) over manual copying. While it says that bounds checks are only “one of the considerations”, the latter has nothing to do with the choice of iteration mechanism, except insofar as a sufficiently powerful optimizer (like LLVM’s) can automatically transform suitable manual copy loops to memcpy, which is possible for both styles.
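A quick illustration in Rust (my sketch, not from the thread): both forms below typically compile down to a memcpy, so the bulk-copy question is orthogonal to the iteration interface:

```rust
// Explicit bulk copy: lowers directly to memcpy.
fn copy_explicit(dst: &mut [u8], src: &[u8]) {
    dst.copy_from_slice(src); // panics if the lengths differ
}

// Manual element-by-element copy: LLVM commonly recognizes this
// pattern, hoists the bounds checks, and emits memcpy as well.
fn copy_manual(dst: &mut [u8], src: &[u8]) {
    for i in 0..src.len() {
        dst[i] = src[i];
    }
}
```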

> It is often said that the key to the best performance is to do more on the server. To find the maximum value of a database table column, it’s far faster to evaluate
>
>    select MAX(col) from collection
>
> than to open a cursor on the collection and retrieve all values from ’col’ searching for the maximum. Stored procedure languages (some of which are quite sophisticated) were introduced to make the server-side processing easier and powerful. We should stress that
>
>    select MAX(col) from collection
>
> is precisely equivalent to
>
>    foldl max lower_bound collection

It is certainly not equivalent: if there is a suitable index on col, select MAX(col) from collection is a quick O(log n) tree traversal, if not O(1), rendering the comparison nonsensical. If there is no index, you end up with a complex comparison between a dynamically optimized (but typically interpreted rather than compiled) query engine within the SQL server, and an IPC mechanism (cursors) with relatively high latency, where in both cases the process will be I/O bound most of the time.
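For concreteness (my sketch, not from the paper), the client-side fold being compared is, in Rust terms:

```rust
// Direct analogue of `foldl max lower_bound collection`: a client-side
// O(n) scan over every row, I/O-bound in practice regardless of how
// efficient the iteration interface itself is.
fn max_col(rows: impl Iterator<Item = i64>) -> i64 {
    rows.fold(i64::MIN, i64::max)
}
```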

This has nothing whatsoever to do with micro-optimization of an in-process, typically memory- or compute-bound iteration interface, possibly subject to a generic optimizer but not to algorithmic special-casing based on the callback function. The factors affecting performance are completely unrelated.

The only connection I can imagine is if Kiselyov is implying (very vaguely) that a hypothetical database system could, in fact, sort of merge the client language with stored procedure languages, and do things like introspect on the callback, identify common functions, even automatically send snippets of the user’s code to the server to execute there. If so, that’s a fairly special case, which could justify use of enumerators, but is unlikely to ever apply to statically compiled languages like Rust that tend to value explicitness over magic. I haven’t heard of an actual system doing this in any language.


#68

Remember, afaik Rust’s lifetimes aren’t a magic bullet: the programmer has to remember to enforce them. I am positing that the signature of the trait Iterator obfuscates whether lifetimes are being enforced by the implementation, given that (as I posited) the collection has to maintain a mutable reference to the iterator, and when the iterator holds a mutable reference to the collection, mutation of the collection can’t occur via another code path. So either we have imperative spaghetti gridlock (if the lifetimes are strictly enforced) or we don’t have resource checking. Either way it appears to be a mess that is best not to subject naive programmers to, and should be cordoned off for the expert (e.g. library) programmers.

P.S. readers please note my edit at the bottom of my prior post pontificating on a “win-win” suggestion.


#69

You may be misunderstanding that post: only unsafe code has to remember to enforce lifetimes. Safe code will not compile if it violates lifetime rules. (More precisely, references in unsafe code are subject to the same rules as in safe code, but there are other tools available to unsafe code such as raw pointers, transmute, etc. that can violate the rules.)

The collection doesn’t have to maintain a reference to the iterator; usually it’s the other way around.


#70

Are you sure? Please re-read my reasoning. How can the collection be changed without telling the iterator that items were inserted before its current position, and without violating the iterator’s invariants?

Mutability combined with iteration is a can of worms.


#71

I don’t think lifetimes are a magic bullet and didn’t imply they were. Nevertheless, Rust disallows aliasing mutable memory in safe code, so your scenario can’t happen in Rust today, plain and simple.


#72

In general, the collection can’t be changed, as that requires an &mut reference to it, which cannot be taken while the iterator already has a (mutable or immutable) reference. This is statically verified by the compiler. A collection specifically designed to allow concurrent mutation could have a different API (based on RefCell etc.) that might have those complications, but typical Rust containers just forbid it.
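A minimal sketch of that static verification (my illustration, not from the thread):

```rust
// The borrow checker rejects mutation while an iterator borrows the
// collection; uncommenting the inner push yields error E0502.
fn main() {
    let mut v = vec![1, 2, 3];
    for x in v.iter() {   // `iter()` holds a shared borrow of `v`
        // v.push(4);     // error[E0502]: cannot borrow `v` as mutable
        println!("{x}");
    }
    v.push(4);            // fine: the borrow ended with the loop
}
```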


#73

Preventing iterator invalidation through mutation is one of the first things that any Rust programmer will claim as a benefit over C++.


#74

And that was my point: if it is strictly enforced, then we have imperative gridlock.

For example, we can’t implement certain imperative algorithms that require us to have more than one cursor simultaneously. We can’t support simulated asynchronous or actual multithreaded access, unless we model some sort of queue for mutable reference requests. The low-level imperative tsuris infects everything until we end up with a 1500-page library manual like C++’s. An example of insanity is repeatedly banging our head on the wall and not realizing why it hurts.

This continues until your library manual, covering all the complexity and corner cases, is the same size as C++’s. I’d rather argue to go around the imperative spaghetti wall, at least for the naive programmers who want to be up and running without 2 years of study.


#75

It is strictly enforced, today (though it is difficult to compare manual sizes). ‘Gridlock’ is indeed a common complaint about Rust, and quite possibly its biggest weakness as a language design, but there are mechanisms to work around it. Immutability is not generally a workable solution because it almost always sacrifices performance, whereas Rust’s raison d’être is to guarantee safety without compromising on performance compared to C (like almost every other language does).
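One such mechanism, for instance (my illustration, not from the thread): `split_at_mut` safely yields two disjoint mutable views, so two mutable cursors can coexist without unsafe code:

```rust
// Two simultaneous mutable regions of the same Vec, statically
// guaranteed to be disjoint.
fn main() {
    let mut v = vec![1, 2, 3, 4];
    let (left, right) = v.split_at_mut(2);
    left[0] += right[1];             // both &mut halves live at once
    assert_eq!(v, vec![5, 2, 3, 4]); // original Vec usable again here
}
```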

This is another case where you would be well served actually trying Rust before complaining about it.


#76

Even if Rust’s manual is twice the size of C++'s, my statement is still true.


#77

There is no such thing as not compromising. Something is always compromised, and we need to compare the relative value of what is compromised. I proposed a “win-win” where we get the same performance but shield perhaps 80% of the programmers from the low-level gridlock and tsuris. If achieving “win-win” is not our focus here, then I have no counter-argument other than to repeat “C is manly and Ruby is for noobs” and the statistics I linked upthread showing that C++ takes 2 years to learn and Ruby 3 months.

IMO, I would not be well served to dive into imperative spaghetti. Been. There. Done. That. 100,000+ lines of code ago.

Opinions are opinions. Experience is experience. Factual elucidation is factual. I prioritize the latter two since they are more objective.

That is the point of “C is manly and Ruby is for noobs”.


#78

Do you have any benchmarks? Without benchmarks, especially with a claim of this magnitude, it’s hard to see whether it will work or not.


#79

No it’s not.


#80

Yes it is. They are making the point that whatever culture is preached becomes a self-fulfilling prophecy, instead of an objective process of finding what is optimal. Groupthink inertia is a known attribute of design-by-group. I am not trying to change Rust. I am merely trying to analyze what is, and possibly offer “win-win” suggestions that can work from where Rust is now, not change what is already water under the bridge.


#81

There’s zero groupthink in claiming that preventing iterator invalidation at compile time is a benefit over C++.

I said nothing about whether such a benefit is worth all costs.


#82

Actually the performance would be greater with my proposal.

We should still have enumerables for the immutable case of algorithms that fit the enumerable model, as I already explained: they offer the collection implementer more information to optimize with.

You do not understand what I proposed. I proposed that we expose traits for the algorithms programmers need that would normally require iterators, because they can’t be modeled with an enumerable.

These traits could be implemented by experts employing iterators or even in some cases natively implemented by the collection implementer, so then we get the best of both worlds. And greater performance all around. :wink:

No need for benchmarks, since the worst-case implementation in my proposal is still iterators. So we are sure to get at least the same performance, and in some cases better performance, when there is a native implementation or the algorithm fits the enumerable model.
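To make the proposal concrete, a minimal sketch (all names hypothetical; `MaxElement` is not an existing Rust API): one trait per algorithm, with an iterator-based implementation as the worst case and a native implementation where the collection can do better:

```rust
use std::collections::BTreeSet;

// Hypothetical per-algorithm trait.
trait MaxElement {
    type Item;
    fn max_element(&self) -> Option<&Self::Item>;
}

// Worst case: implemented via iterators, an O(n) scan.
impl<T: Ord> MaxElement for Vec<T> {
    type Item = T;
    fn max_element(&self) -> Option<&T> {
        self.iter().max()
    }
}

// Native case: an ordered collection answers from its structure,
// without exposing any iterator or cursor state to the caller.
impl<T: Ord> MaxElement for BTreeSet<T> {
    type Item = T;
    fn max_element(&self) -> Option<&T> {
        self.last()
    }
}
```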


The groupthink is extrapolating that correct sentence into arguing that the guy who doesn’t want imperative spaghetti has nothing important to say, and that if he doesn’t show code and a benchmark, then he isn’t worthy.

And also extrapolating that correct sentence into a myopia that blinds us to “win-win” designs that weren’t considered, because of the obsessive focus on that correct sentence.