Newbie: Building a Parent iterator upon Vec<Child>, &Vec<Child> or Vec<&Child>?

Hi!

When trying to port some existing code, and at the same time learn more about the Rust idioms, I've run into the question on how to best setup a Parent instance where i can iterate over its children, giving out a &Child in each iteration.

I've broken this down to three questions: the two first related to why my Vec<Child> approach doesnt work, and if it somehow can be made to. The third question is about advantages/disadvantages when using &Vec<Child> or Vec<&Child>.

Thanks in advance!

First try: children as Vec<Child>

My first naive approach was storing the children as Vec<Child>, but that doesn't seem to work - the compiler is complaining about missing explicit lifetime for the iterator type Item = &Child:

#[derive(Debug)]
pub struct Parent1 {
    children: Vec<Child>,
    idx: usize,
}

impl Parent1 {
    fn new(children: Vec<Child>) -> Self {
        Self { children, idx: 0 }
    }
}

impl Iterator for Parent1 {
    type Item = &Child; // <---- explicit lifetime needed here
    fn next(&mut self) -> Option<Self::Item> {
        if self.idx < self.children.len() {
            let item = &self.children[self.idx];
            self.idx += 1;
            return Some(item);
        }
        None
    }
}

Question 1: Why is that so? Couldn't the compiler figure out that the &child references should live as long as the Parent instance and it's children vec?
Question 2: Is there a way to actually notate the lifetime in this case? And so making the Parent have full ownership of the vec, and the vec of it's children..?

Trying annotate the lifetime like the following...

pub struct Parent1<'a> { // <----- unused parameter
    children: Vec<Child>,
    idx: usize,
}

...(as in the examples below) doesn't seem to work here as we have no field consuming the 'a in the struct. I also get the help message consider removing 'a, referring to it in a field, or using a marker such as PhantomData....

Second try: children as &Vec<Child> or Vec<&Child>

When using either of &Vec<Child> or Vec<&Child>, the lifetime annotation works as expected, giving the expected result:

let vec:&Vec<Child> = &vec![Child {}, Child {}];
for child:&Child in Parent2::new(vec) {
     println!("- &child:{:?}", child);
}

Here's the implementation for &Vec<Child>:

#[derive(Debug)]
pub struct Parent2<'a> {
    children: &'a Vec<Child>,
    idx: usize,
}

impl<'a> Parent2<'a> {
    fn new(children: &'a Vec<Child>) -> Self {
        Self { children, idx: 0 }
    }
}

impl<'a> Iterator for Parent2<'a> {
    type Item = &'a Child;
    fn next(&mut self) -> Option<Self::Item> {
        if self.idx < self.children.len() {
            let item = &self.children[self.idx];
            self.idx += 1;
            return Some(&item);
        }
        None
    }
}

(The Vec<&Child> is identical except for the swapped reference annotations.)

Question 3: As both variants seem to work fine, are there any obvious reasons why to choose one over the other (&Vec<Child> or Vec<&Child>)?

Thanks!

You can't use Rust references, which are temporary scope-bound loans, for any kind of parent-child relationships. There's no amount of syntax and lifetime annotations that could make it work in any useful way.

This is for two reasons:

  • References are not simply pointers, they're not just for "referencing" an object. They have quite strict semantics that put restrictions on scopes where they can be used, and "lock" the borrowed objects for duration of the loan.

  • Borrow checking doesn't support cyclic or self-referential types

You will have to use Arc (or Rc in single-threaded code) and Weak instead, or redesign your code not to use parent-child relationships and e.g. use a flatter data structure, use integer indexes into an object pool, or pass parent object as an argument to functions that use it.

The same problem appears when you try to make a doubly-linked list, so this may be helpful:

Assuming Parent is intended to be a persistent data structure and not just an iterator, the general pattern in Rust is to define IntoIterator impls on references to the struct that provides the data. This gives you the lifetime you need to correctly define the item type.

Since you're using Vec as the backing storage, you don't actually need to implement Iterator at all, you can just use the iterator Vec provides.

Playground

#[derive(Debug)]
pub struct Child(usize);

#[derive(Debug)]
pub struct Parent {
    children: Vec<Child>,
}

impl<'a> IntoIterator for &'a Parent {
    type Item = &'a Child;

    type IntoIter = std::slice::Iter<'a, Child>;

    fn into_iter(self) -> Self::IntoIter {
        self.children.iter()
    }
}

impl<'a> IntoIterator for &'a mut Parent {
    type Item = &'a mut Child;

    type IntoIter = std::slice::IterMut<'a, Child>;

    fn into_iter(self) -> Self::IntoIter {
        self.children.iter_mut()
    }
}

impl Parent {
    pub fn iter(&self) -> std::slice::Iter<'_, Child> {
        self.into_iter()
    }

    pub fn iter_mut(&mut self) -> std::slice::IterMut<'_, Child> {
        self.into_iter()
    }
}
4 Likes

Here you seem to be thinking that &Child is a type. Sadly, it is not. It may look like one, but only because in "ordinary", function-local code, the lifetime parameter can be left out and it will be provided by the compiler implicitly, mostly based on the lifetime annotations in the function signature and the scopes of local variables.

However, &Child is not in itself a proper, complete type. A reference always has an associated lifetime. So, only &'a Child is a real type for some lifetime 'a. When you are implementing an associated type requirement, you have to supply a real, actual type, not just a type constructor that lacks the lifetime annotation. This is because there is no surrounding code that the compiler could use to infer the missing lifetime.

Rust typechecks functions one-by-one, separately, and in general the compositional nature of the type system means that the designers of the language intentionally try very hard not to introduce distal dependencies between different parts of the code. You might say that "oh but Item is obviously just the item type, so it must live as long as the backing vector". The problem with that is that this is not at all "obvious" to the compiler for many reasons. (Two of them are: 1. the compiler is not a human and doesn't understand the intent of the code, it can only apply mechanistic checks and transformations to the code, and 2. the reasoning is plainly incorrect, since any other "subtrait" or other constructs could use the Item associated type as well, in which case it's no longer merely a return type of one single method.)

It depends on what you want to achieve. &Vec<Child> means that you already have a vector of owned children, and you give out a reference to the whole vector. Vec<&Child> means that you build a vector over some temporary references to each child, separately. If you built your tree structure using owned Children (probably the right choice here), then you will likely find it easier to just obtain a reference to the whole thing, hence option #1. But it really depends.

2 Likes

Largely a reiteration (see what I did there) of the other answers, but...

If references always lasted as long as the owning structure, you wouldn't be able to make much use of the owning structure -- you couldn't move it or mutate it anymore.

Moreover, the way the Iterator trait is designed, the items it returns must be able to outlast the iterator. The typical example of this is collect, which consumes the iterator but leaves you with all the items.

(There's another pattern called "lending iterators", but that's not yet part of the standard library.)

To emphasize a specific part of some other answers, iterators over collections are almost always distinct data structures over the collections themselves. That way you don't have to carry iterator state in your collection, you can have more than one shared iterator, your iterators can be reference based, you avoid a lot of iterator invalidation problems...

I do find making iterators (instead of reusing std's iterators) to be a good exercise; many people seem intimidated by it, but I'm not sure why. For a shared / borrowing iterator like you seem to be asking about, a struct containing &Vec<Child> would be one choice for making a lazy iterator. You can't go from &Parent (which contains Vec<Child>) to Vec<&Child> without eagerly allocating a collecting all the &Child in a new Vec.

4 Likes

Thank you guys for your elaborate answers!
Lots of food for thought...

I guess that much of the implications you all address - how one lifetime aspect affects another etc - won't become clear to me untiil I start to actually consuming my own api layout... :slight_smile:

@kornel: The Many linked lists introduction is a great source, thanks!

@semicoleon: Thanks. Will go for the IntoIterator approach!