Help with writing a custom tree iterator

I am trying to write a tree data structure where I can use the following methods:

let tree: ProcTree = get_proc_tree(); // a process tree
let re = args.re;
let iter = tree.filter(|p| re.is_match(p.cmdline())) // include processes that match cmdline
  .with_children() // recursively include children of matched processes
  .filter(|p| should_include(p)) // filter out some more processes
  .filter(|p| is_myself(p)); // filter out the current running process
  // etc.
for proc in iter {
  process_proc(proc);
}

For each of these I would also like to provide mutable versions which can mutate the inner process. Here's my progress so far: Rust Playground. The first compiler error is:

error[E0207]: the lifetime parameter `'a` is not constrained by the impl trait, self type, or predicates
  --> src/lib.rs:88:6
   |
88 | impl<'a, I> Iterator for WithChildrenMut<I>
   |      ^^ unconstrained lifetime parameter

caused by

pub struct WithChildrenMut<I>
{
    inner: I,
    visit: vec::IntoIter<usize>,
}

impl<'a, I> Iterator for WithChildrenMut<I>
where
    I: ProcTreeIterMutPrivate<'a>,
{
    type Item = &'a mut ProcNode;

    fn next(&mut self) -> Option<Self::Item> {
        let Some(idx) = self.next_idx() else {
            return None;
        };
        Some(self.tree_mut().tree.get_mut(idx).unwrap())
    }
}

What I think I'm trying to express is, "this struct as an Iterator returns a &'a mut ProcNode as long as its inner ProcTreeIterMutPrivate<'a> can give a &'a ProcTree." But I don't know how to express it correctly.

You have a sort of lending iterator pattern going on here, where every time you return a new item, you need to invalidate the previously returned items. Every time you do this:

tree.get_mut(idx)

It requires exclusive access to then entire Vec<_>. (Even if Rust had stronger APIs for partial borrowing, there's nothing about a Vec<usize> that the compiler could use to guarantee there is no aliasing &muts going on. So you would need something more complicated, and/or unsafe.)

So you can't implement Iterator with this design. Iterator requires that all items are valid simultaneously.


Incidentally:

impl<'a> ProcTreeIterMutPrivate<'a> for IterMut<'a> {
    fn tree_mut(&'a mut self) -> &'a mut ProcTree {

That's an anti-pattern. Maybe fixable with

impl<'iter: 'a, 'a> ProcTreeIterPrivate<'a> for IterMut<'iter> {

depending on how you move forward. (Or maybe you'd be better off without a specific parameterizing the trait; I didn't play with it enough to have a solid opinion.)

1 Like

My plan was to probably use unsafe here. In my burned out brain, I think it is okay to do, since there shouldn't be aliasing (if somehow there is, I can check it at runtime when the tree is constructed). I'll read your write-up.

I got it mostly working, but there's one error left. I'm almost certain it is exactly an instance of Mutable slice iterator - Learning Rust. Here's the playground code: Rust Playground

After some more reading through your notes, I'm less confident in my understanding of borrowing. I think this statement is pretty pertinent:

Moreover, Rust is practicality oriented, and the abilities of the compiler have developed organically to allow common patterns soundly and ergonomically. Which is to say that the borrow checker has a fractal surface; there's an exception to any mental model of the borrow checker.

One thing I'm uncertain about is whether the idea for my API is safe. For instance:

let tree: ProcTree = get_proc_tree(); // a process tree
let re = args.re;
let mut iter = tree.filter(|p| re.is_match(p.cmdline())) // include processes that match cmdline
  .with_children(); // recursively include children of matched processes
let p1: &mut Process = iter.next().unwrap();
let p2: &mut Process = iter.next().unwrap(); // get two mutable references out of the
// iterator, potentially children of processes which are still in the iterator.
let iter2 = iter.with_children(); // this should recursively iterate through the children of all
// the processes left in the iterator... meaning potentially the processes p1 and p2
let p3: &mut Process = iter.next().unwrap(); // this may be the same Process as p1
p1.mutate(); // Should fail to compile here and only here?

If I understand correctly, I need it to be such that I can create multiple exclusive references into each unique location in the tree, as in Rust Playground. But somehow, I should not then be able to borrow iter again for use in with_children() until I've dropped those references. Except this usage shows that I must be able to borrow iter again... for use in next(). So maybe this API is flawed.

As an aside, I've read your notes, this blog post (I had never thought about liveness scope and referential scope being separate things before), and some parts of the Nomicon, and I'm still barely able to understand what's going on. Do any of the books out there like "Rust for Rustaceans" or the O'Reilly book explain the borrow checker in-depth and with continuity? Or maybe I just need to re-read The Book carefully?

I've been in a similar spot. With a linear structure, it's possible to slice the vector to keep the borrow checker (and Miri) happy, but for a tree, that'd mean partitioning the vector containing all the nodes so that it could always be split to get the remaining children for a type of search, like pre- or post-order depth-first. I'm not even sure it's possible; perhaps with a recursive search function, but I'd rather avoid that unless the tree is guaranteed to be shallow.

I'm not sure what visit order you're using (is that pre-order?), but I see you also store the children indices in each node. Since I used post-order depth-first search, I knew that I always visited parents after children, so a mutable iterator was sound, and I used one line of unsafe code to get the node reference without making the borrow checker and Miri think I was borrowing the whole vector.

It looks like you also want to let the user explore the children from within the loop? That's more complicated, especially with mutable iterators, since the user could store the mutable reference to a children, say 3.2.4, get to its parent 3.2 after a few iterations, then ask for an immutable reference to the same children 3.2.4 while the mutable reference still lives. Again, it depends on what you allow and the visit order: a pre-order would get a similar problem, just the other way round—I think that's your example above. It can indeed get quite hairy. I managed that case dynamically by using UnsafeCell for interior mutability and forbidding to hand any immutable reference (e.g. get the node's children) while a mutable reference was still alive and the other way round, but that's not something I'd recommend unless you're 100% sure to handle all the cases safely.

Make sure to run thorough tests with Miri!

Regarding books, O'Reilly's Programming Rust explains lifetimes and references quite well, so does Effective Rust, but they're general tutorials and don't get into such deep analysis. You could have a look at Learning Rust With Entirely Too Many Linked Lists, even if it's more pointer-oriented.

Of course, one way to make it easier is to include the children nodes in each node, but it's not great for performances:

pub struct ProcNode {
    proc: Process,
    children: Vec<ProcNode>,
}

ProcTree isn't really necessary any more, except if you want an official "root", or if you store other data related to the whole tree.

I think pre-order - where the parent is visited first - would solve my problem, right? My idea for with_children() was that when you call it on an iterator, it will return another iterator through all remaining nodes of the original iterator and their children. With pre-order, if I have popped node A from the iterator, then I have already popped its parent and all of its grandparents, so a subsequent call to with_children() cannot include node A again.

Yeah, it pretty much is.

I'm not sure I understand your complete vision; your playground has iterators that return trees, but then WithChildrenMut::from just uses a single tree and discards the rest of the iterator (it could have just taken a &'a mut ProcTree. Back to that in a moment...

// get two mutable references out of the
// iterator, potentially children of processes which are still in the iterator.
let p1: &mut Process = iter.next().unwrap();
let p2: &mut Process = iter.next().unwrap();

This part is problematic, if you want to actually hold on to a &mut ThingThatOwnsProcess. The only way to hold on to that while handing out children is to reborrow.[1] Reborrowing means you need a lending iterator, not a std::Iterator. However you also want multiple processes at a time, so now we'd be looking at something like get_disjoint_mut... but even that can only get you so far:

// this should recursively iterate through the children of all
// the processes left in the iterator... meaning potentially the processes p1 and p2
let iter2 = iter.with_children(); 
// this may be the same Process as p1
let p3: &mut Process = iter.next().unwrap(); 

It won't really help with these, because...

// Should fail to compile here and only here?
p1.mutate();

...borrow checking needs to ensure that active &mut _ are exclusive references at compile time. (And it really does mean exclusive: an aliased active &mut _ is UB even if no actual mutation happens through it. Think of &mut _ as "mutually exclusive".)

And yeah, hidden behind that "active" qualifier is all that reborrowing stuff.


OK, back to your playground. If you only need the children of one tree, we can make the iterator portion of your playground work. At least if I'm understanding correctly.

I didn't change your recursive usize collector. But instead of holding onto the &mut ProcTree, I did this:

pub struct WithChildrenMut<'a> {
    // tree: &'a mut ProcTree,
    nodes: Vec<Option<&'a mut ProcNode>>,
    visit: hash_set::IntoIter<usize>
}

// Then in WithChildrenMut::from
        let tree = iter.into_tree_mut();
        let nodes: Vec<Option<&mut ProcNode>>
            = tree.tree.iter_mut().map(Some).collect();
            
        WithChildrenMut {
            nodes,
            visit: matched.into_iter(),
        }

Now you can implement Iterator like so:

    fn next<'b>(self: &'b mut WithChildrenMut<'a>) -> Option<&'a mut ProcNode> {
        let idx = self.next_idx()?;
        self.nodes[idx].take()
    }

(It could also work with multiple &mut ProcTree, but I didn't attempt such. If nothing else you could use a flat_map approach.)

The difference from before and the intermediate discussion above is that we're not trying to hold on to the parent &'a mut ProcTree while we hand out its children. Instead we've effectively converted it into a Vec of disjoint &'a mut ProcNode. (There are other, potentially better choices here; I arrived at this point by making minimal changes.)

However, giving up on holding the &mut ProcTree may clash with your larger design. I had to toss out most of the implementations of your other traits -- methods which more or less amount to "I am or am holding onto a &[mut] ProcTree".

impl<'a> ProcTreeIterPrivate<'a> for WithChildrenMut<'a> {
    fn tree(&self) -> &ProcTree {
        unimplemented!() // self.tree
    }

    fn into_tree(self) -> &'a ProcTree {
        unimplemented!() // self.tree
    }

    fn next_idx(&mut self) -> Option<usize> {
        self.visit.next()
    }
}


impl<'a> ProcTreeIterMutPrivate<'a> for WithChildrenMut<'a> {
    fn tree_mut(&mut self) -> &mut ProcTree {
        unimplemented!() // self.tree
    }
    fn into_tree_mut(self) -> &'a mut ProcTree {
        unimplemented!() // self.tree
    }
}

If you had some ProcTree identifier that could be used to recreate a &mut ProcTree later on, perhaps that's an area to explore. But recreating the parent &mut ProcTree would conflict with any of its &mut ProcNodes still being in use.

The standard book, sadly, will not help on this topic (and may hurt). The Brown version of chapter 4 gives a model closer to reality. It has its own problems,[2] but their permission model is much more accurate than the standard book.


  1. Or remove the children I guess, but I take it that's not your use case. ↩︎

  2. they miss some kinds of undefined behavior so some of their quiz answers are wrong; just ignore all their (dangerous, incorrect) implications that if you avoid dangling pointers and UAFs you definitely avoided UB ↩︎

1 Like

The problem is due to your iterator returning a mutable reference to ProcNode, which includes a field children: Vec<usize>. If you first visit parent node A, the user can store it, then proceed to the next iteration, which gives them child node A.a. The user can mess with A's children field and remove the index of A.a, either before or after the iteration to that child, which may or may not modify the outcome depending how your iterator is preparing the next iterations; for example, if it stores A's children indices before or after that point.

One idea would be to return only the payload, Process. I'm not sure the user can do much with children anyway.

One side note: if I'm not mistaken, you're iterating over a HashSet, though I'm not sure why. Bear in mind that the order is not guaranteed, and you may have different outcomes depending on the level of optimization, or even from one run to the next. Make sure to document that because it could catch the user unawares, especially in unit tests where the result is typically compared to a constant expected list of values.

1 Like

This was my intent, that users wouldn't be able to access the children field in any way except by getting children indirectly through with_children() on an iterator.

My idea was to compute the nodes I would visit ahead of time, and originally, I didn't care about iteration order, just that they were all included without duplication.

1 Like