So, there is a difference, and boxing is the slowest, both for running the method and for subsequent access of the iterator, though only by about a factor of two. The flat_map solution is the fastest, especially at creating the iterator.
Also worth noting that Option<&T> is Copy, so the caller can still use the original Option. It is the same size as &Option<&T> but requires less indirection, so passing it by value is actually cheaper. Win-win-win.
(Edit: Actually, the performance difference appears to be optimized away. Point stands.)
The Either crate solution works, but it inserts a match for every Deref call, which as far as I can see means a branch before every next. The flat_map solution does the same thing as part of the internal calls it makes to the contained Iterator.
All of the following methods can be optimized in such a manner:
any
all
count
find
{,try_}fold
{,try_}for_each
last
{min,max}{,_by,_by_key}
nth
partition
{,r}position
product
sum
unzip
Overriding just fold (or, I suppose, try_fold as of 1.27) will automatically optimize the vast majority of these, which use it in their default implementations.
Of course, @scottmcm is right. nth can't use try_fold.
A small hiccup: try_fold takes &mut self but is bounded by Self: Sized, so at best we need &mut &mut Self.
Another hiccup: by_ref also requires Self: Sized, so you can't use it (however, you can just mutably borrow self).
After clearing those hiccups, it compiles, but it does not forward to the optimized try_fold of the underlying iterator. This is because impl Iterator for &'a mut T does not forward the try_fold method; it can't, since the impl is for T: ?Sized while try_fold requires Self: Sized.
An obvious corollary to this is that by_ref (or taking &mut iter) destroys virtually all of the fold optimizations!