Upcast to impl Iterator

Well, I tried the benchmarking using this:

fn iter_flatmap_call(bench:&mut Bencher){
    let hs = HashSet::new();
    let ohs:Option<&HashSet<i64>> = Some(&hs);
    bench.iter(|| {
        get_iter_flat_map(&ohs);
    })
}

fn iter_flatmap_call_and_use(bench:&mut Bencher){
    let big_hs = big_hs();
    let ohs = Some(&big_hs);
    bench.iter(|| {
        for i in get_iter_flat_map(&ohs) {
            let _ = i+1;
        }
    })
}

And the results:

test iter_dyn_call             ... bench:          18 ns/iter (+/- 2)
test iter_dyn_call_and_use     ... bench:       3,747 ns/iter (+/- 516)
test iter_either_call          ... bench:           8 ns/iter (+/- 1)
test iter_either_call_and_use  ... bench:       1,990 ns/iter (+/- 210)
test iter_enum_call            ... bench:           8 ns/iter (+/- 2)
test iter_enum_call_and_use    ... bench:       2,026 ns/iter (+/- 287)
test iter_flatmap_call         ... bench:           1 ns/iter (+/- 0)
test iter_flatmap_call_and_use ... bench:       1,684 ns/iter (+/- 218)

And full code.

So, there is a difference, and Boxing is the slowest both in terms of
running the method and subsequent access of the iterater. It’s a
2-fold difference though. The flatmap solution is fastest,
especially for creating the Iterator.

@dodomorandi Yes, you are correct, consuming the Option solves the problem. Thanks!

Also worth noting that Option<&T> is Copy, so the caller can still use the original Option, and it’s the same size as &Option<&T>, but requires less indirection, so passing by-value is actually cheaper. Win-win-win.

(Edit: Actually, the performance difference appears to be optimized away. Point stands.)

1 Like

The Either crate solution works, but inserts a match for every Deref call which means before every next as far as I can see. The flat_map solution does the same thing, as part of the internal calls it makes to the contained Iterator .

Now…

This all depends on how you use the iterator. If you call any method that does internal iteration, the Iterator implementation can override the method to make the overhead negligible.

All of the following methods can be optimized in such a manner:

  any
  all
  count
  find
  {,try_}fold
  {,try_}for_each
  last
  {min,max}{,_by,_by_key}
  nth
  partition
  {,r}position
  product
  sum
  unzip

Overriding just fold (or I guess you should override try_fold as of 1.27) will automatically optimize the vast majority of these, which use it in their default implementation.

2 Likes

Note that this problem – and solution – with Either is the same as the problem with Chain, which is why for_each is faster than for for Chain.

This is sadly not yet possible on stable, as doing so requires mentioning an unstable trait.

But yes, overriding try_fold will automatically improve all* of those. fold doesn’t help for short-circuiting things like any/all/find.

*well, except nth, right now, because it doesn’t require Sized like the others and thus needs specialization to do in terms of try_fold

1 Like

eh? Can’t it just do by_ref?

Edit: Forget that, even. It takes &mut self. It should be able to call try_fold!

Edit 2: brb, gonna try implementing it since the only correct answer to this kind of question is “try it and see” :stuck_out_tongue:

Okay, okay. Results are in.

Of course, @scottmcm is right. nth can’t use try_fold.

  • A small hiccup: try_fold takes &mut self with Self: Sized. So at best we need &mut &mut Self.
  • Another hiccup: by_ref takes Self: Sized, so you can’t use it. (however, you can just mutably borrow self)
  • After clearing those hiccups, it compiles, but does not forward to the optimized try_fold of the underlying iterator. This is because impl Iterator for &'a mut T does not forward the try_fold method, because it can’t, because the impl is for T: ?Sized. :stuck_out_tongue:

An obvious corollary to this is that by_ref (or taking &mut iter) destroys virtually all of the fold optimizations!

(…why does try_fold take Self: Sized, anyways?)

It doesn’t need it, but without it then Iterator becomes not dyn-capable (formerly object-safe), which would be a breaking change.

Oh, right, I was only thinking of self args and forgot that generics aren’t object-safe either.