Generic code over iterators


#1

I’m still reading the Rust book and wanted to try writing a generic function over an iterator, something like max(). The book suggested doing it over an int array, I tried to go one further and do it with a generic IntoIterator.

However, I’m stuck, and have no idea how to actually write it.
After simplifying my code from max(), to a silly re-implementation of next(), sort of, it still doesn’t compile.

Doesn’t compile:

fn first1<T : IntoIterator>(col : &T) -> &T::Item {
	col.into_iter().next().as_ref().unwrap()
}

It fails with

error: borrowed value does not live long enough
  |
  |     col.into_iter().next().as_ref().unwrap()
  |     ^^^^^^^^^^^^^^^^^^^^^^ does not live long enough

But the following, which uses a slice, does compile:

fn first2<T>(col : &[T]) -> &T {
	col.into_iter().next().as_ref().unwrap()
}

After looking at the docs, it looks like the difference is that slice’s into_iter() returns an Iterator whose Item associated type is a reference, while the, uh, default(?) Iterator’s Item is not a reference.

Maybe the way to write first1() (if it’s possible at all) is to specify somehow that T’s Item is a reference, so calling next() on it wouldn’t move it? But I have no idea how to specify that.


#2

Right. Saying T: IntoIterator doesn’t necessarily mean that &T is an IntoIterator. So what you want is something like this:

fn first1<T, I>(col: &T) -> &I
where for<'a> &'a T: IntoIterator<Item = &'a I> {
...
}

#3

Thanks, I was able to complete my implementation of a generic max() function.

But I was quite surprised by your answer, it took me a few days to understand all the implications (like I said, I’ve just started learning Rust).

So - this implies that when writing implementations of traits, say some trait called MyTrait, we need to write separate implementations for MyTrait, &MyTrait and &mut MyTrait ? And that’s because they’re considered to be distinct types, right?

But isn’t the type we implement for used just for determining the type of the self parameter in the function implementations?
If so why not just define (and implement) multiple functions under MyTrait (not for &MyTrait and &mut MyTrait), one taking self, the other taking &self, the third taking &mut self ? (I’m probably missing something here).

For example:
Suppose we have the following trait :

trait Doable {
	fn doit(self);
}

and we want to implement it for some struct D:

struct D {
}

And we want to handle both these functions:

fn test_doable1<T>(t : T) where T : Doable {
	t.doit();
}

fn test_doable2<T>(t : &T) where for<'a> &'a T : Doable { // using the syntax from your answer
	t.doit();
}

Given this, we’d have to write:

impl Doable for D {
	fn doit(self)      { println!("Doable::doit"); }
}

impl<'a> Doable for &'a D {
	fn doit(self)      { println!("Doable for &D::doit"); }
}
impl<'a> Doable for &'a mut D {
	fn doit(self)      { println!("Doable for &D::doit"); }
}

Which is the same as what was done for Vec, I suppose - it implements IntoIterator 3 times (as can be seen from its source).

But! If Doable was defined as:

trait Doable {
	fn doit(self);
	fn doit_ref(&self);
	fn doit_ref(&mut self);
}

Then we could only implement Doable once, and users of our interface could choose to call the correct function:

impl Doable for D {
	fn doit(self)      { println!("Doable::doit"); }
	fn doit_ref(&self) { println!("Doable::doit_ref"); }
}
fn test_doable1<T>(t : T) where T : Doable {
	t.doit();
}
fn test_doable2<T>(t : &T) where T : Doable {
	t.doit_ref();
}

This means that when we want to write generic functions which accept a Doable, we only have to call the right function, instead of specifying those complex-looking generic constraints in each function.
Is there a problem with this which I’m missing? If not, why isn’t this method used instead of implementing a trait 3 times?


Now, about that syntax:

fn first1<T, I>(col: &T) -> &I
where for<'a> &'a T: IntoIterator<Item = &'a I> {
...
}
  1. You specified the IntoIterator::Item as a separate type I just so you could connect their lifetimes, right?

  2. The need to connect their lifetimes is because that’s cannot be inferred by the compiler, and the reason is that it cannot know if how I’m ever going to use IntoIterator::Item inside the function, I have to declare it myself, and this declaration is part of the function’s signature, right?

  3. This syntax of the part after the where is what’s called higher order trait bounds (HRTB) ? Or is this something else?

  4. I don’t quite understand why the following version (without HRTB or whatever it’s called) doesn’t compile:

    fn first1_bad<'a, T, I>(col : &'a T) -> &'a I
    where T : IntoIterator<Item = I>
    {
    col.into_iter().next().as_ref().unwrap()
    }
    This fails with

    error: borrowed value does not live long enough
    |
    25 | col.into_iter().next().as_ref().unwrap()
    | ^^^^^^^^^^^^^^^^^^^^^^ does not live long enough

Is this because that 'a lifetime parameter, which I put in the function type list, is only relevant when I is directly used, but here next() returns an Option<I> ?
And if so, is that special syntax you used meant to cover this, that is, to make each reference to I anywhere (in this case inside Option<I>) to have the same lifetime as T ?

Sorry, that came out too long, but there are too many things in your short answer which I felt I had to understand before going on with learning Rust.
Thanks again for your time :slight_smile:


#4

I think @vitalyd’s code was actually a bit too restrictive. It requires that what you pass to first1 be a reference, not only that it yield a reference. This would also work, is more flexible, and I think is less complicated to understand (it doesn’t use an HRTB for example):

fn first1<'a, T: IntoIterator<Item = &'a I>, I>(iter: T) -> &'a I {
    iter.into_iter().next().unwrap()
}

The key insight I think is that &'a Vec<T> is a type all its own, just like Vec<T> is a type. References aren’t just ‘modes’ that types go into, they are types themselves, and so you can implement traits for them. In my impl, when you call first1(&vec), the type of T is &Vec<I>, not Vec<I>.


#5

@withoutboats, I think you’re right and thanks for pointing that out. I probably took @olvy’s initial code a bit too literally, where it was using references. The upside of requiring a reference type is you won’t move “by accident” a type that yields references, but is passed by value. Although that probably isn’t a requirement here, as you hint at.


#6

Thanks, that really helped me!

Since T’s type can also be a reference, we don’t have to specify this, only that IntoIterator::Item (I) is a reference. That’s enough to fail compilation when passing a Vec by value.

What confused me I think was that I expected this to work as in C++, where a parameter T to a generic function will only match passing by value, not by reference. I approached this as I would approach it in C++ and automatically specified passing T by reference, which made it more complicated than necessary.

Incidentally, the error messages weren’t very good. With C++ compilers with template code I usually get detailed errors including the instantiated types. Here I got a laconic “does not live long enough” on a temporary value returned from a function, but it was difficult and time consuming to track down what was its type exactly. I remember that the Rust book advises to try and assign the function’s return value to a temporary variable just ot see what its type is, but that creates other lifetime problems as well.


#7

So consider the following:

let v = vec!["hello", "there"];
first1(v);

As it stands, this doesn’t compile because the first1 signature requires that I: Sized, and str isn’t - this doesn’t compile. Now, going along with the flexibility aspect that @withoutboats mentioned, one could consider amending the signature of first1 to

With this, the above example compiles and v is consumed (moved). Now, maybe it’s intentional to insist that I: Sized and this is then probably not a concern.[quote=“olvy, post:6, topic:10907”]
Here I got a laconic “does not live long enough” on a temporary value returned from a function, but it was difficult and time consuming to track down what was its type exactly. I remember that the Rust book advises to try and assign the function’s return value to a temporary variable just ot see what its type is, but that creates other lifetime problems as well.
[/quote]

The typical trick is to assign to a unit type, e.g.:

let () = // the binding you're curious about

Using your example above:

fn first1_bad<'a, T, I>(col : &'a T) -> &'a I 
	where T : IntoIterator<Item = I>
{
    let () = col.into_iter().next();
    unimplemented!()
	//col.into_iter().next().as_ref().unwrap()
}

yields

error[E0308]: mismatched types
  --> <anon>:10:9
   |
10 |     let () = col.into_iter().next();
   |         ^^ expected enum `std::option::Option`, found ()
   |
   = note: expected type `std::option::Option<I>`
              found type `()`

If you try the same trick with the correct version:

error[E0308]: mismatched types
 --> <anon>:4:9
  |
4 |     let () = iter.into_iter().next();
  |         ^^ expected enum `std::option::Option`, found ()
  |
  = note: expected type `std::option::Option<&I>`
             found type `()`