Sorted-iter: provides set and relational operations for sorted iterators

sorted-iter provides set and relational operations for all iterators in the standard library that are known at compile time to be sorted.

Set operations example

For iterators that are known to be sorted by item, it provides set operations that itself return iterators:

use sorted_iter::SortedIterator;

// the iterator returned by BTreeSet::into_iter is known at compile time to be sorted by item
let primes = btreeset! { 2, 3, 5, 7, 11, 13u64 }.into_iter();
let fibs = btreeset! { 1, 2, 3, 5, 8, 13u64 }.into_iter();
// compute the intersection of the two sorted iterators, as another iterator
let fib_primes = primes.intersection(fibs);

Relational operations example

For iterators of pairs that are known to be sorted by key, it provides relational operations that itself return iterators:

use sorted_iter::SortedPairIterator;

// the iterator returned by BTreeMap::into_iter is known at compile time to be sorted by key
let cities = btreemap! {
  1 => "New York",
  2 => "Tokyo",
  3u8 => "Berlin"
}.into_iter();
let countries = btreemap! {
  1 => "USA",
  2 => "Japan",
  3u8 => "Germany"
}.into_iter();
// compute an inner join of cities and countries, returning an iterator
let cities_and_countries = cities.join(countries);

Combinators that preserve sort order are properly considered. E.g. filtering a sorted iterator is guaranteed to again yield a sorted iterator.

I hope somebody will find this useful, either for the functionality it provides or as a complex example for defining extension traits.

Cheers,

RĂĽdiger

2 Likes

It looks like Join doesn’t require the source records to be clonable. How does it handle multiple records with the same key coming out of the iterators? That should produce a cross-product.

I should probably clarify that in the docs. With sorted I mean strictly sorted. So an iterator that is marked with SortedByItem is supposed to be strictly monotonously increasing when looking at the item Ord. And an iterator of pairs that is marked with SortedByKey should have strictly monotonously increasing keys.

That is what you get from most ordered collections, ranges, etc.

It could be done multi-set style, where [1, 2, 2, 2, 3, 3] join [1, 1, 2, 2, 3] would give [(1, 1), (2, 2), (2, 2), (3, 3)]

That’s not the join from relational algebra, though. The problem is more relevant if there’s a value associated with the key:

[(1, 'a'), (2, 'b'), (2, 'c')] join [(2, 'x'), (2, 'y'), (2, 'z')] should produce 6 results:

[(2,'b','x'),
 (2,'b','y'),
 (2,'b','z'),
 (2,'c','x'),
 (2,'c','y'),
 (2,'c','z')]

Or similar, depending on how the individual results are constructed. It sounds like these lists wouldn’t be eligible to implement SortedByKey, which is certainly a correct solution.

On the other hand, relational algebra also says the join key should be the set of all columns that appear in both source relations, so there’s going to be some impedance mismatch in any case— those concepts don’t really exist in Rust’s type system.

1 Like

Yes, I mainly wanted to do something useful for the various sorted iterators in the std lib, and most of them are strictly sorted by key or item. Basically for SortedByKey, the key has to be sorted and unique, like a primary key.

1 Like

That makes a lot of sense, but it’s rare to want two join two tables by matching their primary keys together: usually it’s the primary key in one table and a foreign key column in the other.

I was mostly curious because I’m in the middle of building full-fledged relations out of nested standard collections, and was curious how you’d dealt with some of the problems.

1 Like