I'm writing my first iterator for a custom struct which is essentially a basic tree data structure representing files on a filesystem. The exact structure isn't super relevant to my questions, but here's a Playground link for reference:
Rust Playground (rust-lang.org)
use std::path::PathBuf;
fn main() {
// Example hierarchy:
// /a/file1.txt
// /a/b/file2.txt
// /a/b/file3.txt
let tree = DirectoryFileTree {
node: DirectoryFiles {
directory: PathBuf::from("/a"),
files: vec!["file1.txt".to_string()],
},
children: vec![
DirectoryFileTree {
node: DirectoryFiles {
directory: PathBuf::from("/a/b"),
files: vec![
"file2.txt".to_string(),
"file3.txt".to_string(),
],
},
children: vec![],
}
],
};
println!("{:#?}", tree);
}
#[derive(Clone, Debug)]
pub struct DirectoryFiles {
pub directory: PathBuf,
pub files: Vec<String>,
}
#[derive(Clone, Debug)]
pub struct DirectoryFileTree {
pub node: DirectoryFiles,
pub children: Vec<DirectoryFileTree>,
}
I get that I'm supposed to implement the std::iter::Iterator trait
, but there are some considerations which don't seem to be covered by that page or many other guides:
- I believe the standard library collection iterators yield immutable references to the underlying collection data. Is this conventional behavior for custom iterators, as well?
- Should my iterator "snapshot" the underlying data by pre-populating a queue of values, or should it lazily compute values? For the trivial counting example in the docs, it obviously doesn't make sense to pre-populate values, but for my tree structure, I wonder about the implications of either approach. Realistically, for my basic use case, any references will be short-lived and there won't be concurrent access to the underlying data, but is there any best practice for creating iterators on "hot" data in concurrent programs?
- Does it make sense to implement the Iterator trait on the
DirectoryFileTree
type itself, or a separateDirectoryFileTreeIterator
type? One issue I thought of was that implementing on the tree type directly requires internal state per iterator, so it doesn't make sense for having multiple iterators. But then if you make a separate iterator type, it gets back to my previous questions about how to reference the underlying data.
I guess I'm just asking what the common approach to iterators is. I'm kind of an old-school C programmer, so iterators in modern languages are a bit opaque to me.
Update: I certainly don't know all the best practices for iterators yet as some things appear to be dependent on use case, especially those with concurrent writes. That being said, I wanted to post a working solution in case it helps anyone else new to iterators:
Rust Playground (rust-lang.org)
use std::{
ffi::OsString,
path::PathBuf,
};
fn main() {
let node1 = FileTreeNode {
directory: PathBuf::from("/a/b"),
files: vec![
OsString::from("file3.txt"),
OsString::from("file4.txt"),
],
children: vec![],
};
let node2 = FileTreeNode {
directory: PathBuf::from("/a/c"),
files: vec![
OsString::from("file5.txt"),
OsString::from("file6.txt"),
],
children: vec![],
};
let node3 = FileTreeNode {
directory: PathBuf::from("/a"),
files: vec![
OsString::from("file1.txt"),
OsString::from("file2.txt"),
],
children: vec![
node1,
node2,
],
};
node3
.iter()
.for_each(|file| println!("{}", file.to_string_lossy()));
}
pub struct FileTreeNode {
pub directory: PathBuf,
pub files: Vec<OsString>,
pub children: Vec<FileTreeNode>,
}
impl FileTreeNode {
pub fn iter(&self) -> FileTreeNodeIterator {
FileTreeNodeIterator {
node: &self,
files_index: 0,
children_index: 0,
child_iterator: None,
}
}
}
pub struct FileTreeNodeIterator<'a> {
pub node: &'a FileTreeNode,
pub files_index: usize,
pub children_index: usize,
pub child_iterator: Option<Box<FileTreeNodeIterator<'a>>>,
}
impl<'a> Iterator for FileTreeNodeIterator<'a> {
type Item = &'a OsString;
fn next(&mut self) -> Option<Self::Item> {
// Iterate through files in node first.
if let Some(file) = self
.node
.files
.get(self.files_index) {
self.files_index += 1;
return Some(file);
}
// Once files are exhausted, iterate through children's files.
while self.children_index < self.node.children.len() {
// Recursively get child iterators.
if self.child_iterator.is_none() {
self.child_iterator = Some(Box::new(self
.node
.children
.get(self.children_index)
.unwrap()
.iter()
));
}
// If iterator has next item, return it.
if let Some(next_item) = self
.child_iterator
.as_mut()
.unwrap()
.next() {
return Some(next_item);
}
// Otherwise, move on to next child.
self.child_iterator = None;
self.children_index += 1;
}
// Children exhausted. No more items.
None
}
}
Output:
file1.txt
file2.txt
file3.txt
file4.txt
file5.txt
file6.txt