std::fs::remove_dir_all, readdir, and concurrent modification

Hello!

I have written a FUSE filesystem using rust-fuse, and I was using std::fs::remove_dir_all to test that files in my filesystem can be removed properly. This currently fails for large directories, where the number of files requires multiple readdir calls. I noticed that remove_dir_all's underlying helper, sys_common::fs::remove_dir_all_recursive, is implemented by iterating over the directory entries and performing deletions while iterating. The iterator is acquired by calling fs::read_dir.

This approach seems a bit fragile. POSIX does not seem to guarantee that this works; see, for instance, this SO question.

In my case, I first receive one readdir call, which returns the first N entries. I then get N file deletions in that directory before receiving another readdir call with offset N. In my current filesystem implementation the offset maps directly to entry positions, and since the directory now contains N fewer files, the offset is no longer valid and many files are missed during deletion.
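To make the failure concrete, here's a toy model (plain Rust, not the real FUSE API) of a readdir whose offset indexes directly into the live entry list:

```rust
fn main() {
    // A "directory" of 6 files, listed in batches of 3.
    let mut dir: Vec<String> = (0..6).map(|i| format!("f{}", i)).collect();
    let batch = 3;

    // First readdir call: entries at offsets 0..3.
    let first: Vec<String> = dir.iter().take(batch).cloned().collect();

    // The caller deletes those entries, shrinking the live directory.
    for name in &first {
        dir.retain(|e| e != name);
    }

    // Second readdir call at offset 3: the remaining entries now sit at
    // offsets 0..3, so offset 3 is past the end and nothing is returned.
    let second: Vec<String> = dir.iter().skip(batch).cloned().collect();
    assert!(second.is_empty());

    println!("missed entries: {:?}", dir); // f3, f4, f5 are never seen
}
```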

I wonder if you think this should be considered a bug in Rust, or if I’m missing something. Thanks!


Given that POSIX allows file deletions to invalidate entries seen through a DIR*, I think this is a bug. I don't know what the correct behavior should be if files are being added during remove_dir_all, though. Do you think the code should loop until it reaches a convergent state?

Thank you for your input. Regarding a solution, I saw a suggestion to restart iteration from the start of the directory after each "batch". The downside is that this approach wouldn't work with the current iterator, which doesn't expose batch boundaries.

What do you want the behavior to be in the pathological case that entries are created faster than they are being removed? Should remove_dir_all just hang? Or should it return without having removed everything?

Hmm, that's true. Well, another option is to load the entire dir into memory (or a temp file), but I'm not sure about that solution either.

Maybe restart from the beginning, and bail after a max number of entries?
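Something like this sketch, maybe: each pass snapshots the listing before deleting anything, and if the directory hasn't converged to empty after a fixed number of passes, we bail with an error. The helper name and the pass limit are made up for illustration; this is not what std does.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Restart-from-the-beginning removal with a bounded number of passes.
// Each pass collects the full listing first, so no deletion happens
// while a ReadDir iterator is live.
fn remove_dir_contents_bounded(dir: &Path, max_passes: usize) -> io::Result<()> {
    for _ in 0..max_passes {
        let entries: Vec<fs::DirEntry> =
            fs::read_dir(dir)?.collect::<io::Result<_>>()?;
        if entries.is_empty() {
            // Converged: nothing left, remove the directory itself.
            return fs::remove_dir(dir);
        }
        for entry in entries {
            let path = entry.path();
            if entry.file_type()?.is_dir() {
                remove_dir_contents_bounded(&path, max_passes)?;
            } else {
                fs::remove_file(&path)?;
            }
        }
    }
    // Entries kept appearing faster than we removed them: give up.
    Err(io::Error::new(
        io::ErrorKind::Other,
        "directory did not converge to empty",
    ))
}

fn main() -> io::Result<()> {
    let base = std::env::temp_dir().join("bounded_remove_demo");
    let _ = fs::remove_dir_all(&base);
    fs::create_dir_all(base.join("sub"))?;
    fs::write(base.join("a.txt"), b"x")?;
    fs::write(base.join("sub").join("b.txt"), b"y")?;

    remove_dir_contents_bounded(&base, 8)?;
    assert!(!base.exists());
    Ok(())
}
```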

But the most important fix is just to ensure that the reader doesn't collide with itself; solving true concurrency with other readers/writers feels less important IMO.

Well, an initial fix could just be to collect::<Vec<_>>() the read_dir iterator before the for loop.
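A minimal sketch of that suggestion: materialize the listing with collect() so the ReadDir iterator is finished before any deletion starts. The helper is hypothetical and only removes regular files; the real remove_dir_all also recurses and removes the directory itself.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Collect the whole listing up front, then delete, so the iterator and
// the deletions never run concurrently.
fn remove_files_collected(dir: &Path) -> io::Result<()> {
    let entries = fs::read_dir(dir)?.collect::<io::Result<Vec<_>>>()?;
    for entry in entries {
        if entry.file_type()?.is_file() {
            fs::remove_file(entry.path())?;
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let base = std::env::temp_dir().join("collect_remove_demo");
    let _ = fs::remove_dir_all(&base);
    fs::create_dir_all(&base)?;
    fs::write(base.join("a.txt"), b"x")?;
    fs::write(base.join("b.txt"), b"y")?;

    remove_files_collected(&base)?;
    assert_eq!(fs::read_dir(&base)?.count(), 0);
    fs::remove_dir(&base)?;
    Ok(())
}
```

Note this only avoids the self-collision; a truly huge directory would pull the whole listing into memory at once.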

After re-reading https://docs.rs/fuse/0.3.1/fuse/trait.Filesystem.html#method.opendir and https://github.com/emscripten-core/emscripten/issues/2528 , I think I'm to blame here. The filesystem is expected to track each reader separately and return a stable list that is unaffected by interleaved modifications.
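A sketch of that per-handle bookkeeping, assuming a snapshot-on-opendir design: when a directory is opened, freeze its listing under the returned file handle, and serve readdir offsets from the frozen snapshot so later creations/deletions can't shift them. All names here are hypothetical; this is not the rust-fuse API, just the data structure behind it.

```rust
use std::collections::HashMap;

// Per-handle directory snapshots, keyed by a file handle issued at opendir.
struct DirHandles {
    next_fh: u64,
    snapshots: HashMap<u64, Vec<String>>,
}

impl DirHandles {
    fn new() -> Self {
        DirHandles { next_fh: 1, snapshots: HashMap::new() }
    }

    // Called from opendir: freeze the current listing under a new handle.
    fn open(&mut self, live_entries: &[String]) -> u64 {
        let fh = self.next_fh;
        self.next_fh += 1;
        self.snapshots.insert(fh, live_entries.to_vec());
        fh
    }

    // Called from readdir: offsets index the frozen snapshot, not the
    // live directory.
    fn read(&self, fh: u64, offset: usize, count: usize) -> &[String] {
        let snap = &self.snapshots[&fh];
        let start = offset.min(snap.len());
        let end = (offset + count).min(snap.len());
        &snap[start..end]
    }

    // Called from releasedir: drop the snapshot.
    fn close(&mut self, fh: u64) {
        self.snapshots.remove(&fh);
    }
}

fn main() {
    let mut handles = DirHandles::new();
    let mut live: Vec<String> = (0..6).map(|i| format!("f{}", i)).collect();
    let fh = handles.open(&live);

    let first = handles.read(fh, 0, 3).to_vec();
    // Delete the first batch from the live directory...
    live.retain(|e| !first.contains(e));
    // ...yet the next readdir at offset 3 still sees the remaining names.
    let second = handles.read(fh, 3, 3).to_vec();
    assert_eq!(second, vec!["f3", "f4", "f5"]);

    handles.close(fh);
}
```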