Upcast to impl Iterator

I am would like to write a method which returns an Iterator from a
HashSet that I am instantiating lazily. My idea was to return
an empty iterator if the HashSet has not been instantiated yet.

But I can't get this to work. Consider:

use std::collections::HashSet;

fn get_iter<'a>(hs:&'a Option<&'a HashSet<i64>>)
            -> impl Iterator<Item=&'a i64> {
    hs.map(|hs| hs.iter()).unwrap_or_else(|| ::std::iter::empty())
    //hs.map(|hs|hs.iter()).unwrap()
    //::std::iter::empty()
}

fn main(){
    let hs:HashSet<i64> = HashSet::new();
    let s = Some(&hs);
    get_iter(&None);
    get_iter(&s);
}

This should work but doesn't, failing with this error:

  = note: expected type `std::collections::hash_set::Iter<'_, i64>`
             found type `std::iter::Empty<_>`

I get the same thing using if let. The problem is, I guess, clear
enough. unwrap_or_else is generic but must return the same type
whether from unwrap as or_else. In this case, we have two
different concrete types, although they same trait type.

This seems like the canonical use of empty. Is there any way to
upcast the types of the two iterators?

The main problem here is that, even when using impl Trait, you have to return one concrete type.

In this specific case, I think that the best solution is just returning an Option<impl Iterator<Item=&'a i64>>. In this way you have to check the availability of the iterator on the caller, but using Option methods like then and and_then it is pretty easy.

You could use a custom Enum, but if you think about it, it is exactly what you would have done with Option.
Obviously, you could also use a Box<dyn Iterator>, but it has a big overhead due to allocation, and I discourage you doing that for this specific case.

There's an easy fix to your particular problem, because Option has an iter method of its own. Just get an iterator over the Option and use flat_map to flatten out the second "layer".

fn get_iter<'a>(hs:&'a Option<&'a HashSet<i64>>)
            -> impl Iterator<Item=&'a i64> {
    hs.iter().flat_map(|hs| hs.iter())
}

This is kind of a special case of writing a custom type and implementing Iterator for it. It just happens that in this case, the "custom type" is already written for you.

4 Likes

:heart:

I always forget about Option::iter and impl IntoIterator for Option.

In your particular case, I'd go with @trentj's solution, but in general case, when you want to return multiple different types with impl Trait, without using dyn Trait or rolling down your own enum, the either crate is perfect (playground):

fn get_iter<'a>(hs:&'a Option<&'a HashSet<i64>>)
            -> impl Iterator<Item=&'a i64> {
    hs.map(|hs| Either::Right(hs.iter()))
        .unwrap_or_else(|| Either::Left(::std::iter::empty()))
}

Thank you all. I will go with @trentj's solution in this case.

@dodomorandi I think returning Option is wrong in this case. The normal behaviour for an empty data structure is to return an empty Iterator ( and "0" for length for instance). Returning an Option uncovers my lazy instantiation which in my case is an implementation detail.

Still, it seems a weakness of the type system to me. I should be able to do this. The solutions given all require some allocation or conditional logic, and I can't see any underlying reason for this other than to stop the type checker complaining.

Without dynamic dispatch, how would you implement that? Rust needs to know what concrete type is returned by a function.

The most common suggestion is for the compiler to auto-derive an enum with a variant for all possible types, which is what Either does for this case of two possibilities. But of course it still requires a match inside the auto-derived next method.

Because of the fact that Option (the one in the function argument) impl IntoIterator, you are right. As I said before, it was my fault of forgetting this detail :wink:

As already said by @birkenfeld, you want something that cannot be done.

Maybe I am wrong, but it looks like you are comparing this situations to something in other languages like Python. Here you could write something like

def get_iter(d):
    if d:
        return iter(d)
    else:
        return iter({}) 

However, what you cannot see, it is that d and the two return values are all PyObjects allocated on the heap. This allows all the possible dynamic dispatch you want.

You always need a concrete (in Rust, Sized) type to be returned, no matter what you want to do or the language you are using. And the reason is simple: programs must use precise calling conventions.

@birkenfeld If my understanding is correct, then yes, we use dynamic dispatch. The function get_iter returns impl Iterator, so any calls on the return value will require a virtual method call.

@PhilLord that is not what's happening. -> impl Iterator is just the compiler allowing you to omit the actual type, especially in cases where you can't name it, and restricting you only use its Iterator interface.

Dynamic dispatch (with which what you'd like to do already works) is (for example) -> Box<dyn Iterator>.

3 Likes

Ok, maybe the misunderstanding is here: impl Iterator is not related to virtual methods. This is what dyn Trait is for.

With impl Iterator you are returning a concrete type that implements Iterator. Calling the function using different types leads to multiple monomorphizations and multiple compiled functions.

EDIT: @birkenfeld is a faster typer :grin:

2 Likes

Okay, well, that sort of makes sense. But given my function returns two different pieces of functionality (i.e. an Iterator over a set, or an Empty iterator), at some point, there must be a virtual call made? Or does rust the monomorphisation happen for return types as well, where a function can return more than one type? I guess it must, otherwise, the Box solution would be as efficient as we could get.

So, I checked the source. The Either crate solution works, but inserts a match for every Deref call which means before every next as far as I can see. The flat_map solution does the same thing, as part of the internal calls it makes to the contained Iterator. Returning Boxed versions has some overhead as well clearly. But all of the solutions imply some runtime cost. Without testing, I am unclear in the extreme which is most efficient. And, the Box solution seems the clearest (@trentj your solution is very neat, but i read it several times before I worked it out).

So, I am struggling a bit here with the right solution.

If you want to do something like

struct Foo;
struct Bar;

fn get_foo_bar(want_foo: bool) -> ??? {
    if want_foo {
        Foo
    } else {
        Bar 
    }
}

you have different options, and they are almost the one that have been written in the previous posts:

Enum

struct Foo;
struct Bar;

enum FooBar {
    Foo(Foo),
    Bar(Bar),
}

fn get_foo_bar(want_foo: bool) -> FooBar {
    if want_foo {
        FooBar::Foo(Foo)
    } else {
        FooBar::Bar(Bar)
    }
}

This have a very low overhead, and it is what you get whenever you use Option or Result. It does not allocate on the heap.

Trait

Here a not working example:

struct Foo;
struct Bar;

trait FooBar {}

impl FooBar for Foo {}
impl FooBar for Bar {}

fn get_foo_bar(want_foo: bool) -> impl FooBar {
    if want_foo {
        Foo
    } else {
        Bar
    }
}

The reason it is not working is because we are trying to return different concrete types. Even if the two structs are zero-sized, it is not possible to return one or the other (without an Enum).

What can be done is using virtual dispatching:

struct Foo;
struct Bar;

trait FooBar {}

impl FooBar for Foo {}
impl FooBar for Bar {}

fn get_foo_bar(want_foo: bool) -> Box<dyn FooBar> {
    if want_foo {
        Box::new(Foo)
    } else {
        Box::new(Bar)
    }
}

In this way you can return any structure that Implements FooBar, but you have to use the heap to do so.

Any

Sometimes you have only one last resort: Any

use std::any::Any;

struct Foo;
struct Bar;

fn get_foo_bar(want_foo: bool) -> Box<Any> {
    if want_foo {
        Box::new(Foo)
    } else {
        Box::new(Bar)
    }
}

You can return any possible structure inside Any, but you must check manually the type of the underlying object, and/or try to downcast to the appropriate type.

So?

When you use Any or Box<dyn Trait> (in Rust 2015 you can also write Box<Trait>, but it is the same) you are using virtual dispatching, otherwise not. No iterator in the standard lib use dynamic dispatching. In the solution posted by @trentj the return value is only quite complex, but is could be written manually (except for the FnMut type) and it is a concrete, stack allocated type.
You could have written something like

use std::hash::Hash;

fn get_iter<'a, RefIterable, T: 'a>(hs: &'a Option<&'a RefIterable>) -> impl Iterator<Item = &'a T>
where
    for<'t> &'t RefIterable: IntoIterator<Item = &'t T>,
    T: Eq + Hash,
{
    hs.iter().flat_map(|hs| hs.into_iter())
}

In this case you have a generic function, and the output type depends on the input type. However, even in this case in which the concrete type of impl Iterator is complex as hell, it is deduced at compile time and with zero overhead.

I hope I have been able to help you a bit. :smile:

2 Likes

When in doubt, benchmark it. However, you will be surprised about the ability of the optimizer to remove almost everything. I think that you cannot a better solution than the one suggested by @trentj.

Try to look at the generated assembly using the rust playground or the compiler explorer.

1 Like

There doesn't have to be a right solution. Box<dyn Iterator> trades the cost of a pointer indirection for the cost of a branch (or maybe 2), and they're not necessarily directly comparable.

That said, I expect it'll be hard for Box<dyn Iterator> to beat the statically dispatched version even with the extra branch. Boxing things taxes the allocator, it is less cache-friendly, and it may compile to more instructions. On the other side, the cost of that match may be a few cycles or (with branch prediction) nothing at all. It's almost certainly swamped by the cost of incrementing the iterator.

But I absolutely agree with benchmarking it. If a few cycles matter, you really can't afford not to.

Aside: Box<dyn Iterator> is, itself, a type that implements Iterator. So the function that returns impl Iterator can have its body rewritten to use dynamic dispatch without changing its signature. However, a function that returns Box<dyn Iterator> can never be rewritten to use static dispatch.

1 Like

Thank you all for your careful explanations. As you can see I am still learning rust; and I still find a significant amount of "magic" happening probably caused by my lack of understanding of the semantics. This discussion has helped my understanding further.

1 Like

That's a fundamental property of trying to use different types. There has to be something that figures out which to use.

Perhaps what you really want is

static EMPTY_HASHSET: HashSet<i64> = HashSet::new();
fn get_iter<'a>(hs:&'a Option<&'a HashSet<i64>>)
            -> impl Iterator<Item=&'a i64> {
    hs.map(|hs| hs.iter()).unwrap_or_else(|| EMPTY_HASHSET.iter())
}

But sadly that doesn't work today because HashSet::new() isn't const (nor is `HashMap::new(), which it calls internally). As a proof-of-concept, though, it does work today on nightly with Vec: Rust Playground

1 Like

@scottmcm Funnily enough, this is the solution that I came to in the end. It feels false, because I have to keep an entirely pointless HashSet around attached to the a struct I am using (in the end example code I gave, get_iter is a function, but in the full code it's a method attached to a struct).

I tried @trentj solution, but it fails in my use case because the Iterator it returns has the same lifetime as the Option; that solves the problem as I gave it, but in my case, the Option is retrieved from a HashMap. When the Option goes out of scope, so does the Iterator, and bang, borrowck hell.

I remain of the believe that this is a weakness in Rust. Since it is possible to do what I want with an Enum, it can be done safely and without dynamic dispatch. It would be nice if it were easier and without requiring programming intervention. Failing a general solution in the type system, something specific for Iterator would be nice -- for example a std::iter::or_empty function to supplement the existing empty.

I'm will have a go at benching @dodomorandi solutions anyway, because I am interested.

You are right, but in this case you can just change the function signature and make a small adjustment:

use std::collections::HashSet;

fn get_iter<'a>(hs: Option<&'a HashSet<i64>>)
            -> impl Iterator<Item=&'a i64> {
    hs.into_iter().flat_map(|hs| hs.iter())
}

fn main() {
    let mut hash = HashSet::new(); 
    hash.insert(3);
    
    assert_eq!(get_iter(Some(&hash)).next(), Some(&3));
    assert_eq!(get_iter(None).next(), None);
}

As you can see, Option is now consumed by the function, and the functions performs a into_iter instead of iter.