Path `components()` vs. `collect()` - inconsistent behavior?

Hello.

Function Path::components() is used to split a path into its components. Calling collect() on an iterator of Component is the "reverse" operation, i.e. build a path from the components.

Except for some basic normalization, components() and collect() should be inverse operations.

But I'm getting "weird" behavior when it comes to drive letters on Windows!

Considering this code:

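use std::path::{Path, PathBuf};

fn normalize(path: &Path) {
    println!("{:?}", path);
    // Split the path into its components, printing each one with its index.
    let mut components = Vec::with_capacity(8);
    for (n, component) in path.components().enumerate() {
        println!("[{}] = {:?}", n, component);
        components.push(component);
    }
    // Join the components back together by collecting them into a PathBuf.
    println!("--> {:?}\n", components.into_iter().collect::<PathBuf>());
}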

Result is:

"/foo/bar/"
[0] = RootDir
[1] = Normal("foo")
[2] = Normal("bar")
--> "\\foo\\bar"

"C:\\foo\\bar"
[0] = Prefix(PrefixComponent { raw: "C:", parsed: Disk(67) })
[1] = RootDir
[2] = Normal("foo")
[3] = Normal("bar")
--> "C:\\foo\\bar"

"meh\\C:\\foo\\bar"
[0] = Normal("meh")
[1] = Normal("C:")
[2] = Normal("foo")
[3] = Normal("bar")
--> "C:foo\\bar" // <-- expected: "meh\\C:\\foo\\bar"

The problem is with the last example. The drive letter C: is detected as a "normal" component rather than a "prefix", which is probably fine and expected here, since it appears in the middle of the path. But weird things happen when the components are joined together again! Even though C: is a "normal" component, it is suddenly treated as if it were a prefix, completely throwing away the other "normal" component(s) that come before it.

Even worse, the resulting path is not even well-formed: it is missing a required directory separator between the C: and foo components. If anything, we should get something like C:\\foo\\bar. But even that shouldn't happen, I think, as C: was explicitly a "normal" component. Ideas?

Any suggestions how I can get my normalize() to do the "expected" thing? I want to create a function that normalizes a user-supplied path, but – unlike canonicalize() – the path may not exist yet.

It appears that drive letters need to be handled specially in normalize(), if they ever appear as a "normal" component. But how can I be sure there are no other "special" cases I need to handle?

Thank you!

I'm clueless about Windows, but here's a probable breadcrumb: PR 89270, a non-FCP'd breaking change to Windows PathBuf::push. (The collect is just a series of push.)
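To illustrate the point that collect is just a series of push calls, here is a minimal sketch that rebuilds a path both ways. The helper names rebuild_with_push and rebuild_with_collect are made up for this example. Both functions should return the same result on any platform, which is why the Windows push behavior for a "C:" component also shows up in collect.

```rust
use std::path::{Path, PathBuf};

/// Rebuilds `path` by pushing each component onto an empty `PathBuf`.
/// (`rebuild_with_push` is a hypothetical helper name.)
fn rebuild_with_push(path: &Path) -> PathBuf {
    let mut buf = PathBuf::new();
    for component in path.components() {
        // Each component is pushed as if it were a path of its own, so on
        // Windows a component that looks like "C:" is parsed as a drive prefix.
        buf.push(component);
    }
    buf
}

/// Rebuilds `path` the way the original post does, via `collect()`.
/// (`rebuild_with_collect` is a hypothetical helper name.)
fn rebuild_with_collect(path: &Path) -> PathBuf {
    path.components().collect()
}
```

Comparing rebuild_with_push(p) with rebuild_with_collect(p) for the paths from the original post should show identical output in both cases.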

Tangential note that Rust's platform-agnostic path generalization, though admirable, is flawed. E.g. the extension logic is designed as if multiple extensions could not exist.
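A small sketch of what that extension remark means in practice, using only Path::file_stem and Path::extension. The helper split_name is a made-up name for illustration. For a file name with two extensions, only the last one is reported as the extension and the rest ends up in the stem.

```rust
use std::path::Path;

/// Returns (stem, extension) as reported by `Path`.
/// (`split_name` is a hypothetical helper name.)
fn split_name(name: &str) -> (Option<&str>, Option<&str>) {
    let path = Path::new(name);
    (
        // Everything before the last dot of the file name.
        path.file_stem().and_then(|s| s.to_str()),
        // Only the part after the last dot.
        path.extension().and_then(|s| s.to_str()),
    )
}
```

For example, split_name("archive.tar.gz") yields the stem "archive.tar" and the extension "gz".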

This behaviour has been present since at least Rust 1.10 (the oldest rustc I currently have installed). It does not have to do with verbatim paths (because this path does not start with \\?\ it is not verbatim).
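For anyone who wants to check the verbatim part themselves, here is a minimal sketch built on Component::Prefix and Prefix::is_verbatim. The function name starts_with_verbatim_prefix is made up for this example. On non-Windows targets no prefix is ever parsed, so it simply returns false there.

```rust
use std::path::{Component, Path};

/// Returns true if the first component of `path` is a verbatim prefix
/// (a path starting with `\\?\` on Windows).
/// (`starts_with_verbatim_prefix` is a hypothetical helper name.)
fn starts_with_verbatim_prefix(path: &Path) -> bool {
    match path.components().next() {
        Some(Component::Prefix(prefix)) => prefix.kind().is_verbatim(),
        // No prefix at all (relative path, rooted path, or non-Windows target).
        _ => false,
    }
}
```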

I would recommend opening a bug report.
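In the meantime, one possible workaround is to avoid PathBuf::push when joining and to assemble the string by hand. This is only a sketch: the function name rebuild_without_push is made up, and it assumes that joining components with MAIN_SEPARATOR is acceptable for your use case. It only re-joins the components and does not resolve "." or ".." entries beyond what components() already does, and it does not make a name like "C:" valid as a directory name on Windows.

```rust
use std::ffi::OsString;
use std::path::{Component, Path, PathBuf, MAIN_SEPARATOR};

/// Re-joins the components of `path` without going through `PathBuf::push`,
/// so a "normal" component is never re-parsed as a prefix.
/// (`rebuild_without_push` is a hypothetical helper name.)
fn rebuild_without_push(path: &Path) -> PathBuf {
    let mut out = OsString::new();
    // True once a component has been written that needs a separator after it.
    let mut need_sep = false;
    for component in path.components() {
        match component {
            // A real prefix (e.g. "C:") is copied as-is; no separator follows
            // unless a RootDir component comes next.
            Component::Prefix(prefix) => {
                out.push(prefix.as_os_str());
                need_sep = false;
            }
            // The root directory is the separator itself.
            Component::RootDir => {
                out.push(MAIN_SEPARATOR.to_string());
                need_sep = false;
            }
            // CurDir, ParentDir and Normal are joined with a separator.
            other => {
                if need_sep {
                    out.push(MAIN_SEPARATOR.to_string());
                }
                out.push(other.as_os_str());
                need_sep = true;
            }
        }
    }
    PathBuf::from(out)
}
```

With this approach, the "meh", "C:", "foo", "bar" components from the original post are joined with separators between all of them instead of the earlier components being discarded.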
