Implementing an Iterator that returns a reference to an object stored in another object

Hello,

My ultimate goal is to create a function that walks a directory recursively and finds the files that match a predicate function. The filtering rules will be supplied by the user.

To write this, I first need a way to test my predicate matching engine. I figured the best way is to create a dummy directory walking function (a mock: TestWalker), feed a list of predefined paths to the predicate matching function, run the engine, and check that it returns the proper list of paths. Once the predicate matching function works correctly (i.e. it passes all tests), I will create a real directory walking function (RealWalker) and plug the already-tested predicate matching function into it. I also want the tests to stay in the code, in case I need to modify the predicate matching function in the future.

So I figured I need a TestWalker object that contains a vector of PathBufs. This will be my "virtual" filesystem, set up at the beginning of each test. I want TestWalker to produce an object that is an Iterator, so I can use it in a for loop. This Iterator object (WalkerIterator in my example) should be only a thin proxy. The TestWalker and RealWalker objects should handle the actual iteration logic, because the mock will iterate slightly differently than the real object. The Iterator object should encapsulate the underlying object, so that the predicate matching function has no way of knowing which object it is interacting with.

I've started implementing it as follows:

use std::path::PathBuf;

// This is a common interface that both TestWalker 
// and RealWalker will have to implement.
trait ListFiles {
    fn reset(&mut self);
    fn next(&mut self) -> Option<&PathBuf>;
}

// This is my iterator logic object. The predicate 
// matching function should use it instead of using 
// TestWalker or RealWalker directly.
struct WalkerIterator<'a> {
    inner: &'a mut dyn ListFiles,
}

// Main object of the mock function, used in tests.
struct TestWalker {
    files: Vec<PathBuf>,
    cur_idx: usize,
}

impl TestWalker {
    fn iter(&mut self) -> WalkerIterator {
        WalkerIterator { inner: self }
    }
}

// Iteration logic for the mock object.
impl ListFiles for TestWalker {
    fn reset(&mut self) {
        self.cur_idx = 0;
    }
    
    fn next(&mut self) -> Option<&PathBuf> {
        if self.cur_idx >= self.files.len() {
            None
        } else {
            let r = &self.files[self.cur_idx];
            self.cur_idx += 1;
            Some(r)
        }
    }
}

However, I have no idea how to write the actual Iterator. If I write it like this, the compiler fails with "error[E0495]: cannot infer an appropriate lifetime for autoref due to conflicting requirements":

impl <'a> Iterator for WalkerIterator<'a> {
    type Item = &'a PathBuf;
    
    fn next(&mut self) -> Option<&'a PathBuf> {
        self.inner.next()
    }
}

Please note that I would like to return a reference instead of a cloned PathBuf.

If I understand this correctly, the lifetimes of the WalkerIterator object above bind the lifetime of the return value of next() to the lifetime of the WalkerIterator itself. But I probably need to bind the return value's lifetime to the lifetime of the TestWalker instance, since that is the source of the PathBufs I want to return references to. Is that possible? Or am I misunderstanding lifetimes?

Is it possible to implement the Iterator object in this scenario? Is there a better way of doing what I want to do? I'm trying to make the following code work:

fn main() {
    let file_list = vec!["file1", "file2", "file3"].iter().map(PathBuf::from).collect();
    let mut walker = TestWalker { files: file_list, cur_idx: 0 };
    
    // I want this to work correctly:
    for e in walker.iter() {
        // `e` should be a &PathBuf (the for loop unwraps the Option),
        // with the same lifetime as `walker`
        println!("{:?}", e);
    }
}

Playground link -- https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=34f44d8a65ba0bae93db5a79085d5f3c

Your lifetime problem originates in the signature of ListFiles::next, fn next(&mut self) -> Option<&PathBuf>: the returned reference is tied to the exclusive &mut self borrow.

In particular, the compiler will ensure that there are no further accesses to self while the returned &PathBuf exists. That breaks your Iterator implementation: it promises an item that lives for 'a, but the exclusive borrow it takes on the inner walker has to be given up at the end of next, so the reference cannot outlive that call.

Indeed, you cannot create an iterator by wrapping the ListFiles trait without either cloning (and having the iterator return owned PathBufs) or changing the trait (and probably also the type that implements it). I have a question about this:

Does this even apply to the RealWalker? Would it retain owned PathBufs that you can borrow from, or would it read the paths from the file system, creating a new PathBuf for each entry anyway? In the latter case, the only place where references save extra cloning is the mock implementation, and IMO you should just have the iterator return owned PathBufs.
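To illustrate the point about exclusive access, here is a minimal sketch that reuses the ListFiles and TestWalker definitions from the question and drives the trait directly with while let instead of Iterator. The helper name collect_names is hypothetical. It should compile because each returned reference is dropped before next is called again, which is the guarantee that the Iterator trait has no way to express.

```rust
use std::path::PathBuf;

trait ListFiles {
    fn reset(&mut self);
    fn next(&mut self) -> Option<&PathBuf>;
}

struct TestWalker {
    files: Vec<PathBuf>,
    cur_idx: usize,
}

impl ListFiles for TestWalker {
    fn reset(&mut self) {
        self.cur_idx = 0;
    }

    fn next(&mut self) -> Option<&PathBuf> {
        if self.cur_idx >= self.files.len() {
            None
        } else {
            let r = &self.files[self.cur_idx];
            self.cur_idx += 1;
            Some(r)
        }
    }
}

// Hypothetical helper (not from the thread): walks any ListFiles and copies
// out the path names as Strings.
fn collect_names(walker: &mut dyn ListFiles) -> Vec<String> {
    let mut names = Vec::new();
    walker.reset();
    // `p` borrows `walker` exclusively, but only for one loop iteration,
    // so the next call to `walker.next()` is allowed.
    while let Some(p) = walker.next() {
        names.push(p.display().to_string());
    }
    names
}

fn main() {
    let files = vec![PathBuf::from("file1"), PathBuf::from("file2")];
    let mut walker = TestWalker { files, cur_idx: 0 };
    println!("{:?}", collect_names(&mut walker));
}
```

Trying to keep two of those references alive at once (for example by pushing `p` itself into a Vec) is what the borrow checker rejects, and an Iterator with Item = &'a PathBuf would allow exactly that.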
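As a rough sketch of the cloning option mentioned above, the original ListFiles trait can stay unchanged if the iterator yields owned PathBufs. The type name OwnedWalkerIterator is made up for this example; the rest mirrors the code from the question.

```rust
use std::path::PathBuf;

trait ListFiles {
    fn reset(&mut self);
    fn next(&mut self) -> Option<&PathBuf>;
}

struct TestWalker {
    files: Vec<PathBuf>,
    cur_idx: usize,
}

impl ListFiles for TestWalker {
    fn reset(&mut self) {
        self.cur_idx = 0;
    }

    fn next(&mut self) -> Option<&PathBuf> {
        if self.cur_idx >= self.files.len() {
            None
        } else {
            let r = &self.files[self.cur_idx];
            self.cur_idx += 1;
            Some(r)
        }
    }
}

// Hypothetical wrapper: same shape as WalkerIterator, but it hands out
// owned values instead of references.
struct OwnedWalkerIterator<'a> {
    inner: &'a mut dyn ListFiles,
}

impl<'a> Iterator for OwnedWalkerIterator<'a> {
    type Item = PathBuf;

    fn next(&mut self) -> Option<PathBuf> {
        // The short-lived borrow of `self.inner` ends inside this call;
        // cloning detaches the result from it.
        self.inner.next().cloned()
    }
}

fn main() {
    let files = vec![PathBuf::from("file1"), PathBuf::from("file2")];
    let mut walker = TestWalker { files, cur_idx: 0 };
    for p in (OwnedWalkerIterator { inner: &mut walker }) {
        println!("{:?}", p);
    }
}
```

The cost is one allocation per entry, which is likely negligible next to real file system access but may matter for large, expensive-to-clone items.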

Yes, that's true :), but with an owned PathBuf there is no issue. I wanted to increase the complexity a bit so I'll know what to do in some hypothetical future case where I want to return large, expensive-to-clone objects instead of small PathBufs.

I had considered the situation you describe, and I figured that RealWalker would just cache the PathBuf, with its iterator returning a reference into the cache.

In production code I would probably do as you suggest, but the code I'm working on is more of a learning tool for Rust than anything else, and I would really like WalkerIterator to return a reference, mostly because I don't know how to do it.

Can I resolve this somehow? Should I focus on adding additional lifetime markers?

I admit I don't fully understand the problem, so any additional explanations would be very helpful!

If you change &mut self to &self, then the issue of exclusive access goes away. The closest I could make work was this:

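// The walker itself is never mutated during iteration; the cursor lives in a
// separate IterState value that the iterator owns.
trait ListFiles {
    type IterState;

    // Creates a fresh iteration state positioned at the first entry.
    fn reset(&self) -> Self::IterState;

    // Advances `state`. The returned reference borrows from `self`, not from `state`.
    fn next(&self, state: &mut Self::IterState) -> Option<&PathBuf>;
}

struct WalkerIterator<'a, Inner: ListFiles> {
    inner: &'a Inner,
    state: Inner::IterState,
}

impl<'a, Inner: ListFiles> WalkerIterator<'a, Inner> {
    fn new(inner: &'a Inner) -> Self {
        WalkerIterator { inner, state: inner.reset() }
    }
}

impl<'a, Inner: ListFiles> Iterator for WalkerIterator<'a, Inner> {
    type Item = &'a PathBuf;

    fn next(&mut self) -> Option<&'a PathBuf> {
        // `self.inner` is a shared `&'a Inner`, so the result lives for 'a.
        self.inner.next(&mut self.state)
    }
}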


Thanks.

I need to study this, but it seems to work the way I wanted.

I had gotten as far as changing &mut self to &self in ListFiles, which let most of the code compile, but then I couldn't write ListFiles::next for TestWalker properly, because it needed to mutate self's state. I hadn't thought of splitting the logic from the state, making the state mutable while leaving the logic immutable.
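For reference, here is one way the split between immutable logic and mutable state might look for TestWalker. This is only a sketch: choosing usize as the IterState and adding an iter() convenience method are assumptions for illustration, not something posted in the thread.

```rust
use std::path::PathBuf;

trait ListFiles {
    type IterState;
    fn reset(&self) -> Self::IterState;
    fn next(&self, state: &mut Self::IterState) -> Option<&PathBuf>;
}

struct WalkerIterator<'a, Inner: ListFiles> {
    inner: &'a Inner,
    state: Inner::IterState,
}

impl<'a, Inner: ListFiles> Iterator for WalkerIterator<'a, Inner> {
    type Item = &'a PathBuf;

    fn next(&mut self) -> Option<&'a PathBuf> {
        self.inner.next(&mut self.state)
    }
}

// The mock no longer stores a cursor; it only owns the "virtual" filesystem.
struct TestWalker {
    files: Vec<PathBuf>,
}

impl ListFiles for TestWalker {
    // Assumption for this sketch: the state is just an index into `files`.
    type IterState = usize;

    fn reset(&self) -> usize {
        0
    }

    fn next(&self, state: &mut usize) -> Option<&PathBuf> {
        let r = self.files.get(*state)?;
        *state += 1;
        Some(r)
    }
}

impl TestWalker {
    // Hypothetical convenience method mirroring iter() from the question.
    fn iter(&self) -> WalkerIterator<'_, TestWalker> {
        WalkerIterator { inner: self, state: self.reset() }
    }
}

fn main() {
    let walker = TestWalker {
        files: vec![
            PathBuf::from("file1"),
            PathBuf::from("file2"),
            PathBuf::from("file3"),
        ],
    };

    // Items are `&PathBuf` borrowed from `walker`, so several of them can be
    // held at once, even after the iterator itself is gone.
    let all: Vec<&PathBuf> = walker.iter().collect();
    for e in &all {
        println!("{:?}", e);
    }
}
```

Because the walker is only borrowed immutably, several iterators over the same TestWalker can exist at the same time, each with its own index.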

Very helpful. Many thanks!