Request for review of cleanup & removals in itertools


#1

I’m preparing for the next release series (0.5.x) of itertools.

This is on the road towards an eventual 1.0 version of the library.

See the progress here: https://github.com/bluss/rust-itertools/pull/148

What do you think? We want to increase consistency, keep all the very useful stuff, remove the things that are never used, move the things that are better in other crates.


#2

If we’d collect “utils for slices and string slices”, is there a crate that already aims to do something like that?


#3

I’d like to suggest something “radical”: to write down a quite short list of itertools things that should be moved into the standard library.

I suggest to put something like the V.0.5 format() in that list. I need it often.


#4

I’d like to do that too. Indeed anyone can do it. .join(separator) -> String is the first one that comes to mind, but the fact is that it’s not possible to add it on Iterator (since the trait is in core and String is in libcollections). The libstd could have an iterator extension trait however, that has it.

.format() is great, and that’s why it’s getting a short name. I’d love to add it, the contentious part will be how it’s applying the formatting options multiple times (once per element).


#5

We could add join on the Iterator trait itself by having a FromIterator style helper trait that would be implemented for String in liballoc or wherever it’s defined.


#6

I don’t know if there are such a crate, but it would be great. I have a SliceExt trait implemented in my project with methods such rotate, partition (a la quicksort partition but receives a predicate), partial_ord_sort, etc. Maybe utils for Vec should be include. In my VecExt I have some methods to work in chained way: produce_vec().sorted().deduped().truncate(10).reversed().shrinked_to_fit()


#7

Finding a list everyone agrees on isn’t going to be easy. But I think we can choose the most commonly used things, that have a simple to use API, and that can be added to the std lib without too many difficulties. I have experience regarding what lazy methods are more commonly useful.

This is a first draft of a list. (the “=>” is my suggestion to rename the function for the std library).

The ones I suggest are: cartesian, chunks, combinations, combinations_n, copy_from, dedup, fill, fold1, group_by, Index, join, merge, multi_merge, multizip, permutations, show, sorted, sorted_by, sorted_by_key, step_by, to_vec.

See also:


Macros:

  1. iproduct => cartesian
  2. izip => multizip

Zipping and cartesian product are two common operations to do. Such things also show that there’s some need for variadic generic functions in Rust…


Lazy iterator methods:

  1. step => step_by

step_by is a sufficiently common operation, and it’s easy to use, having it only for numeric intervals is bad.


  1. format_default => show

It’s very handy during debugging, etc.

But show() should take no arguments:

let s = (1 … 5).map(|i| i * i);
println!("{:?}", s.show());

And print:

[1, 4, 9, 16]

Can we add a new formatting syntax to print/println? (Idea stolen from D language):

let s = (1 … 5).map(|i| i * i);
println!("{:{:?; }}", s.show());

Could print:

{1; 4; 9; 16}

So just the .show() adaptor suffices.


  1. join

[“a”, “b”, “c”].iter().join(", ")

It’s a sufficiently common operation, and it’s easy to use. Creating an intermediate vector is wasteful and noisy.

If and when Rust gets default arguments, we can add the join() with a default argument.


  1. fold1

Sufficiently common operation, and quite handy.


  1. collect_vec => to_vec

Common operation. Very easy to use. Quite handy. But I prefer a shorter name.


  1. set_from => copy_from

Sufficiently common operation, handy and easy to use.

An alternative alternative (inverted) API (from D language):

let mut xs = [0; 4];
(1 …).copy_to(&mut xs.iter_mut());
assert_eq!(xs, [1, 2, 3, 4]);


  1. fill

I’d also like something to fill an array:

arr[] = 10; // D language code

Arrays.fill(arr, 10); // Java code

arr[2 … $] = 10; // D code

In Rust you write something like this, that is longer and more bug-prone:

for i in 0 … arr.len() { arr[i] = 10; }
for i in 2 … arr.len() { arr[i] = 10; }

In Rust:

arr[0 …].fill(10);
arr[2 …].fill(10);

That’s similar to:

std::iter::repeat(10).copy_to(&arr[0 …]);
std::iter::repeat(10).copy_to(&arr[2 …]);


  1. Index<Range> for all lazy iterables

It’s a sufficiently common need (similar to islice of Python itertools), but instead of:

let it = iproduct!(0…3, 0…2, 0…2).slice(3 … 5);

I’d like to write:

let it = iproduct!(0…3, 0…2, 0…2)[3 … 5];


  1. dedup

This lazy operation is handy, sufficiently common, and easy to use and remember.

Currently in Itertools this is defined for all iterables, but I’d like it to be defined only for something that is statically known to be sorted. A sorted iterable comes from iterating the result of a sorting, or from a function that verifies (or assumes!) that an iterable is sorted.


  1. sorted
  2. sorted_by
  3. sorted_by_key

Easy to use and understand, handy, and allow to not break the chains of iterators.


  1. combinations
  2. combinations_n
  3. permutations

Handy in lot of situations. Able to shorten your code.


  1. merge
  2. kmerge => multi_merge

As dedup it should only be defined on iterables that are statically known to be sorted…


  1. group_by_lazy => group_by
  2. chunks_lazy => chunks

They are handy and commonly useful, they are present in D language and other languages. But I guess they aren’t easy to add to the stdandard library.


Probably it’s a good idea to add few more function, not currently present in itertools, but the ones listed here cover lot of the missing ground already.


#8

Thank you for the list. It’s longer than I thought a short list would be, but valuable feedback for me as well.

I would like to make the list even more concrete. The things we can add to Rust today, without new features or advanced type system use (sorted things) that has not been proven in itertools or somewhere else.

I must protest:

You’d use slice’s iterator and not a range here in Rust! Like this: for elt in &mut arr[2..] { *elt = 10; }


#9

I think all the items in that list of mine require no changes in the Nightly-version language, right? Most of them come from Itertools after all.

I think this is not too much hard to do/prove :slight_smile:

It means having two functions:
assume_sorted
is_sorted

And a Sorted iterable subtrait. You have sort() and sorted() that return a Sorted iterable. And then you can define binary searches, merge, dedup and few more things only for such sorted iterables.

Right, my experience of Rust is still lacking… Thank you.


#10

Do you mean the changes in println!() for the show()? To design things you need to look a bit far away, otherwise you end taking some very sub-optimal design decisions. Being a bit bold is necessary.


#11

The changes are now in a prerelease itertools 0.5.0-alpha.0. Breaking changes still welcome until the final release.

Current summary:

  • Renamed:

    • combinations to pair_combinations
    • combinations_n to combinations
    • group_by_lazy, chunks_lazy to group_by, chunks
    • Unfold::new to unfold()
    • RepeatCall::new to repeat_call()
    • Zip::new to multizip
    • PutBack::new, PutBackN::new to put_back, put_back_n
    • PutBack::with_value is now a builder setter, not a constructor
    • format to format_with and format_default to format
    • .into_rc() to rciter
    • Partition enum is now Either
  • Module reorganization:

    • All iterator structs are under itertools::structs but also
      reexported to the top level, for backwards compatibility
    • All free functions are reexported at the root, itertools::free will
      be removed in the next version
  • Removed:

    • ZipSlices, use .zip() instead
    • .enumerate_from(), ZipTrusted, due to being unstable
    • .mend_slices(), moved to crate odds
    • Stride, StrideMut, moved to crate odds
    • linspace(), moved to crate itertools-num
    • .sort_by(), use .sorted_by()
    • .is_empty_hint(), use .size_hint()
    • .dropn(), use .dropping()
    • .map_fn(), use .map()
    • .slice(), use .take() / .skip()
    • helper traits in misc
    • new constructors on iterator structs, use Itertools trait or free
      functions instead
    • itertools::size_hint is now private
  • Behaviour changes:

    • format and format_with helpers now panic if you try to format them more
      than once.
    • repeat_call is not double ended anymore
  • New features:

    • tuple flattening iterator is constructible with cons_tuples
    • itertools reexports Either from the either crate. Either<L, R>
      is an iterator when L, R are.
    • MinMaxResult now implements Copy and Clone

#12

Yes, but it’s vastly more productive to be bold outside of libstd, than have to wait for approval of the decision making apparatus. Thus it’s best to prove it outside first IMO (and you get to use it sooner too).