Hello,
I'm a beginner. I wrote a little tool to iterate throw a directory recursively. Could you please review it code ? (I know there's WalkDir , but I need to ignore some directories)
I put the code here :
Hello,
I'm a beginner. I wrote a little tool to iterate throw a directory recursively. Could you please review it code ? (I know there's WalkDir , but I need to ignore some directories)
I put the code here :
if self.current_pos < self.lst_dirent.len() {
unsafe {
let de = self.lst_dirent.get_unchecked(self.current_pos);
self.current_pos += 1;
return Some(de.clone());
}
} else {
return None;
}
As best practice, I think the unsafe should be minimised even more like this:
if self.current_pos < self.lst_dirent.len() {
let de = unsafe {
// SAFETY: self.current_pos is checked above to be in bounds
self.lst_dirent.get_unchecked(self.current_pos)
};
// Continue to use de in safe rust
}
I also think it's good practice to accompany unsafe blocks with a comment explaining the requirements/invariants for safety, which I've added.
let Some(thing) = maybe_thing else {
return None;
}
// Now `thing` is in scope :)
return
statements. It's idiomatic to drop them when not needed - they are only needed for early returns (e.g. to break out of loops or when using let else).Finally, you should know that walkdir does support filtering out paths. It has an example in the readme.
They provide an iterator interface too, but they impl IntoIterator
which is what makes the for
loop syntax work.
It will probably be educational to read through their source and compare their approach with yours, you might learn a thing or two
Not a complete review; just following up from the previous reply mainly.
Don't use unsafe
for performance reasons unless testing has pinpointed a compelling improvement. I looked at this version and it had the same ASM as the unsafe
version, thanks to the immediately preceding length check most likely.[1]
if self.current_pos < self.lst_dirent.len() {
let de = self.lst_dirent.get(self.current_pos).unwrap();
self.current_pos += 1;
return Some(de.clone());
} else {
return None;
}
And here's a version with no unwrap
and no significant ASM difference.
fn next(&mut self) -> Option<Self::Item> {
let de = self
.lst_dirent
.get(self.current_pos)?
.clone();
self.current_pos += 1;
Some(de)
}
But, let's take a step back. Iterating by index is a bit of yellow flag: you could probably just hold onto the container's iterator implementation and call self.field.next()
. So let's take a look at what we've got here:
struct DirContentIter {
lst_dirent: Vec<Arc<DirEntry>>,
current_pos: usize,
}
You're going to iterate over the indices, clone each Arc
, and then when iteration is through -- drop the Vec
and all the original Arc
s. Instead, you can just wrap up vec::IntoIter
and avoid all the cloning.
struct DirContentIter {
lst_dirent: std::vec::IntoIter<Arc<DirEntry>>,
}
impl Iterator for DirContentIter {
type Item = Arc<DirEntry>;
fn next(&mut self) -> Option<Self::Item> {
self.lst_dirent.next()
}
}
The main reason to keep the newtype is flexibility to change your implementation later. But if you do keep it, you should consider implementing other iterator traits like FusedIterator
and ExactSizeIterator
, and also other methods like size_hint
. (Keeping in mind that every trait implementation is an obligation that limits your flexibility.) Since you're just wrapping another iterator, you can just forward the methods like I did with next
.
Incidentally "skimming the ASM" isn't what I meant by testing either, but it's enough to show there's no need for unsafe
in this case. âŠī¸
A big thanks for your help your time and your good advice !
let Some(x) = y else { return None;}
is really cool !! The code is more clear and beautiful (and so I found a bug)vec
and his position by an iterator so the code is simpler.... but I'm surprised that I can declare a vector in the constructor body and have the iterator lifetime be usable beyond it (so I guess the vector datas are copied inside the iterator memory , or a magical trick (?) ). But it's ok the pgm runs well :impl DirContentIter {
pub fn new(path: &Path, option: &DirIterOption) -> Result<Self, std::io::Error> {
let mut lst_dirent = Vec::new();
for i in read_dir(path)? {
if let Ok(dirent) = i {
lst_dirent.push(Arc::new(dirent));
}
}
if option.sortby == SortBy::Filename {
lst_dirent.sort_by_key(|de| de.file_name());
}
Ok(Self {
it_dirent: lst_dirent.into_iter(),
})
}
}
impl Iterator for DirContentIter {
type Item = Arc<DirEntry>;
fn next(&mut self) -> Option<Self::Item> {
self.it_dirent.next()
}
}
into_iter()
takes ownership of the Vec and returns a type that implements Iterator called IntoIter
, the data doesn't get cloned.
The iterator lifetime is a mutable ref to Self, which is IntoIter
, i.e. you have unique access to the data while iterating, and the Vec is gone, no references to it remain.
In other words, iterators don't have to refer to (borrow) some external collection; they can own the items they hand out during iteration (or produce them dynamically with or without borrowing, etc). IntoIter<T>
is an example of such an owning iterator.
Due to the confusion many newcomers have between Rust lifetimes (those '_
things) and how long values stay alive, I wouldn't refer to IntoIter
's ability to stay alive due to taking ownership as a "lifetime".
(I'm actually not even sure how to interpret that line. What mutable ref? IntoIter<T>
doesn't contain a &mut _
in this example (since T
doesn't). The unique access is due to ownership.)
It's great ! Thanks
I was trying to refer to the method signature of Iterator::next(&'_ mut self)
and explain that '_ relates to the owning Iterator type (IntoIter
) and not the Vec.
But I was a bit clumsy with my wording, I appreciate the clarification!
This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.