Should iterators borrow or own state?

I'm writing my first iterator for a custom struct which is essentially a basic tree data structure representing files on a filesystem. The exact structure isn't super relevant to my questions, but here's a Playground link for reference:

Rust Playground (rust-lang.org)

use std::path::PathBuf;


fn main() {
    // Example hierarchy:
    //   /a/file1.txt
    //   /a/b/file2.txt
    //   /a/b/file3.txt
    let tree = DirectoryFileTree {
        node: DirectoryFiles {
            directory: PathBuf::from("/a"),
            files: vec!["file1.txt".to_string()],
        },
        children: vec![
            DirectoryFileTree {
                node: DirectoryFiles {
                    directory: PathBuf::from("/a/b"),
                    files: vec![
                        "file2.txt".to_string(),
                        "file3.txt".to_string(),
                    ],
                },
                children: vec![],
            }
        ],
    };
    println!("{:#?}", tree);
}


#[derive(Clone, Debug)]
pub struct DirectoryFiles {
    pub directory: PathBuf,
    pub files: Vec<String>,
}

#[derive(Clone, Debug)]
pub struct DirectoryFileTree {
    pub node: DirectoryFiles,
    pub children: Vec<DirectoryFileTree>,
}

I get that I'm supposed to implement the std::iter::Iterator trait, but there are some considerations which don't seem to be covered by that page or many other guides:

  1. I believe the standard library collection iterators yield immutable references to the underlying collection data. Is this conventional behavior for custom iterators, as well?
  2. Should my iterator "snapshot" the underlying data by pre-populating a queue of values, or should it lazily compute values? For the trivial counting example in the docs, it obviously doesn't make sense to pre-populate values, but for my tree structure, I wonder about the implications of either approach. Realistically, for my basic use case, any references will be short-lived and there won't be concurrent access to the underlying data, but is there any best practice for creating iterators on "hot" data in concurrent programs?
  3. Does it make sense to implement the Iterator trait on the DirectoryFileTree type itself, or a separate DirectoryFileTreeIterator type? One issue I thought of was that implementing on the tree type directly requires internal state per iterator, so it doesn't make sense for having multiple iterators. But then if you make a separate iterator type, it gets back to my previous questions about how to reference the underlying data.

I guess I'm just asking what the common approach to iterators is. I'm kind of an old-school C programmer, so iterators in modern languages are a bit opaque to me.


Update: I certainly don't know all the best practices for iterators yet as some things appear to be dependent on use case, especially those with concurrent writes. That being said, I wanted to post a working solution in case it helps anyone else new to iterators:

Rust Playground (rust-lang.org)

use std::{
    ffi::OsString,
    path::PathBuf,
};


fn main() {
    let node1 = FileTreeNode {
        directory: PathBuf::from("/a/b"),
        files: vec![
            OsString::from("file3.txt"),
            OsString::from("file4.txt"),
        ],
        children: vec![],
    };

    let node2 = FileTreeNode {
        directory: PathBuf::from("/a/c"),
        files: vec![
            OsString::from("file5.txt"),
            OsString::from("file6.txt"),
        ],
        children: vec![],
    };

    let node3 = FileTreeNode {
        directory: PathBuf::from("/a"),
        files: vec![
            OsString::from("file1.txt"),
            OsString::from("file2.txt"),
        ],
        children: vec![
            node1,
            node2,
        ],
    };

    node3
        .iter()
        .for_each(|file| println!("{}", file.to_string_lossy()));
}


pub struct FileTreeNode {
    pub directory: PathBuf,
    pub files: Vec<OsString>,
    pub children: Vec<FileTreeNode>,
}

impl FileTreeNode {
    pub fn iter(&self) -> FileTreeNodeIterator {
        FileTreeNodeIterator {
            node: &self,
            files_index: 0,
            children_index: 0,
            child_iterator: None,
        }
    }
}


pub struct FileTreeNodeIterator<'a> {
    pub node: &'a FileTreeNode,
    pub files_index: usize,
    pub children_index: usize,
    pub child_iterator: Option<Box<FileTreeNodeIterator<'a>>>,
}

impl<'a> Iterator for FileTreeNodeIterator<'a> {
    type Item = &'a OsString;

    fn next(&mut self) -> Option<Self::Item> {
        // Iterate through files in node first.
        if let Some(file) = self
            .node
            .files
            .get(self.files_index) {
            self.files_index += 1;
            return Some(file);
        }

        // Once files are exhausted, iterate through children's files.
        while self.children_index < self.node.children.len() {
            // Recursively get child iterators.
            if self.child_iterator.is_none() {
                self.child_iterator = Some(Box::new(self
                    .node
                    .children
                    .get(self.children_index)
                    .unwrap()
                    .iter()
                ));
            }

            // If iterator has next item, return it.
            if let Some(next_item) = self
                .child_iterator
                .as_mut()
                .unwrap()
                .next() {
                return Some(next_item);
            }

            // Otherwise, move on to next child.
            self.child_iterator = None;
            self.children_index += 1;
        }

        // Children exhausted. No more items.
        None
    }
}

Output:

file1.txt
file2.txt
file3.txt
file4.txt
file5.txt
file6.txt
1 Like

that is not true. there are different types of iterators for the same collection type. for instance, Vec::iter(), Vec::iter_mut() and Vec::into_iter() will yield different iterator types, and they all behave differently. the first two (iter() and iter_mut()) are from the slice implementation (because Vec implements Deref and DerefMut), the other one is implementation of IntoIterator trait.

you can do it either way, depending the performance characteristics, but generally, iterators tends to be lazy.

there's no single "right" answer, but personally I would prefer to use a separate type, as it's more flexible. but if you only plan to support a single way to iterate through your data structure, it makes sense to implement Iterator (or IntoIterator for that matter) directly on the data type itself.

2 Likes

You always need to have a separate struct for the Iterator state. Never implement iterators directly on collections.

Most collections have both iter() that gives references to items in the collection, and into_iter() (or drain()) that takes ownership of the collection and destroys it in the process of returning its data as owned. These will require two separate implementations.

You can also implement iter_mut() that returns mutable references to items in the collection. In 90% of cases you will have to use unsafe to fudge lifetimes, because the borrow checker can't verify correctness of mutable iterators.

Keep in mind that the Iterator can't return a reference to itself, so if you're computing something lazily, you have to do it without exposing any temporary buffers of the iterator.

5 Likes

I was still confused about some things, but your answer was enough to get me on the right path searching for the right things, so I'll mark it as a solution. I will update my post with a sample of what ended up working for me.

1 Like