Advice for beginner, new crate

Hi, I'm relatively new to Rust and it has quickly become my favorite. I've recently published a crate and I'm looking for some advice A few weeks ago I was tinkering with some "backend" code and I decided to take part of it and make it into a crate. It's now on crates.io, it's called dir_walker. There are alternatives out there but I wanted to traverse the file system having a type that is compatible with the standard library. I'm looking for some feedback and comments on what could be improved. I's a small crate so it doesn't take long to read. I appreciate the help!

1 Like

Two comments:

  • don't call the item type an EntryIterator. That sounds like it is the iterator type itself. It is not – it's the item that the iterator yields.
  • I don't understand why there is a separate Entry and Value. The Value type doesn't seem to be necessary. You could have just inlined its fields directly into Entry.

Thanks for the quick reply.

  • Sounds good. I'll change the name in the next version.
  • I'm making Entry as a recursive structure without using "pointer types" and avoiding ownership problems. Of course there is nothing wrong with e.g. Box or other Cell types but I wanted to try a different route.

You must be severely misunderstanding something. Wrapping a struct inside another does not provide indirection. There's no difference between Entry containing a Value vs. Entry containing the fields of Value directly. If your code compiles currently, then it would also compile if you inlined the fields of Value into Entry. The required indirection is provided by the Vec. There's zero need to box anything.

(I don't know how Cell even comes into play. It's for interior mutability, which is a completely orthogonal issue.)

3 Likes

I have a long experience (in java, not Rust, mind that. In Rust I'm a N00b :slight_smile: ) with directory walkers. I have the following concerns about your code:

  1. I haven't found any checks against symbolic links. Specially on linux, where symbolic links are considered first-class citizens, you can have a directory structure that is infinitely recursive. For example /home/myself/directory may have a "myhome" directory that is a symbolic link to /home/myself. So, I recommend that you skip symbolic links or checks for those and get recursion against traversing the same (absolute) directory multiple times.

  2. Instead of using vectors for storing the found paths, consider trying to use iterators. I have found some situations at production where a single directory contains literally millions of files, which may cause your process to consume an absurd amount of memory.

  3. If you take a look at the DirEntry specification, a DirEntry may consume a file handle even if dropped. So I don't recommend that you keep DirEntries in cache, but some wrapper that only opens the entry once when needed, and then close it after. Keep only the paths as strings. Perhaps in a wrapper structure.
    DirEntry in std::fs - Rust

  4. If you really consider the ordering as an important factor in the file tree traversal, consider making a limit on the entries collected per directory: to avoid the "one million directory" to trash your memory.

  5. Will this run only on linux, or mac-os, ever? A ".something" is a hidden directory for linux. Not necessarily for windows, that has the hidden attribute for files.

  6. Talking about the tests: consider make the test to be repeatable and assertable. The tests there are only "println!" outputs. So they are unfalsifiable.

A simple solution is to create a temporary directory, add some directories and files there, with the typical use cases, and then test the solution against those files.

4 Likes

Thanks for the tips, these are valid points. I was just looking into point n. 2 while keeping the recursive, "depth first" structure as a result. I'm also going to fix the tests as you mentioned.

Edit: the order of the entries was a strong requirement for this crate (as well as having a recursive structure), but that may lead to the problem you mentioned in point 4.

1 Like

That's true, well I guess that's experience. The official docs point at the use of Box for recursive structures, but I guess it's not the only way to make one. When I used the expressions "Cell types" I was pointing at the mentioned "ownership problems" (which I did encounter) and the types that are usually used to alleviate that problem rather than referring exactly to the Cell type. The doc of std::cell states:

RefCell<T> uses Rust’s lifetimes to implement “dynamic borrowing”, a process whereby one can claim temporary, exclusive, mutable access to the inner value.

At some point I did have problems with interior mutability but I opted for another solution.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.