Rationale behind replacing paths while joining?

Apparently, when I join two absolute paths, instead of the expected joining, one of them is completely replaced. This is from the documentation of Path::join:

Creates an owned [PathBuf] with path adjoined to self.
If path is absolute, it replaces the current path.
assert_eq!(Path::new("/etc").join("/bin/sh"), PathBuf::from("/bin/sh"));

Which is super counter-intuitive: for example, if I type cd /etc/bin/asdf, I will go to /etc/bin/asdf, not to /bin/asdf, but Path::join would produce /bin/asdf (!).

It seems that this is a common source of confusion:


My main questions:

  1. What was the rationale/motivation behind implementing this behaviour for join? Why does it replace paths, when one of them is absolute?
    • If I were designing this, I would say that it should behave just like regular string concatenation, omitting repeated slashes. So, if one path ends with / and the other starts with /, it should omit one of them. This would match how the terminal/bash works.
  2. Is there anything that could fix the issue? Shall I just create my own join function (which would remove the leading / from the 2nd path and use join)?..

P.S.

Looks like for an answer to the 2nd question I can use strip_prefix (?):

    /// let path = Path::new("/test/haha/foo.txt");
    ///
    /// assert_eq!(path.strip_prefix("/"), Ok(Path::new("test/haha/foo.txt")));
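A minimal sketch of that idea, assuming Unix-style paths where the root is a single leading /. The helper name join_under is hypothetical and not part of std; it only combines the existing strip_prefix and join methods:

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not in std): always places `tail` under `base`,
/// even when `tail` starts with a leading "/".
fn join_under(base: &Path, tail: &Path) -> PathBuf {
    // strip_prefix returns Err when `tail` is already relative;
    // in that case the path is used unchanged.
    let relative = tail.strip_prefix("/").unwrap_or(tail);
    base.join(relative)
}

fn main() {
    // Absolute second argument: appended instead of replacing the base.
    let joined = join_under(Path::new("/etc"), Path::new("/bin/sh"));
    assert_eq!(joined, PathBuf::from("/etc/bin/sh"));

    // Relative second argument: behaves like a plain join.
    let joined = join_under(Path::new("/etc"), Path::new("bin/sh"));
    assert_eq!(joined, PathBuf::from("/etc/bin/sh"));
}
```

This sketch does not handle Windows prefixes such as C:\, so it should not be read as a portable replacement for join.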

It probably would be better to have a separate type for absolute paths.
As it stands, join will always have some broken behavior. Rust chose the one that was already used in other languages.


I'm not really sure how you arrived at your shell example. Path::new("/etc").join("/usr/bin") would actually be equivalent to:

/etc $ cd /usr/bin

Which would, as you should expect, try to change the directory to /usr/bin and not /etc/usr/bin. If you wanted to change to /etc/usr/bin, you'd use cd usr/bin. That is, you'd use a relative path without a leading slash.
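The two cd cases can be mirrored with join directly. This is a small illustration on a Unix-like target, using only std::path:

```rust
use std::path::{Path, PathBuf};

fn main() {
    let base = Path::new("/etc");

    // Like `cd usr/bin` from /etc: the relative path is appended.
    assert_eq!(base.join("usr/bin"), PathBuf::from("/etc/usr/bin"));

    // Like `cd /usr/bin` from /etc: the absolute path replaces the base.
    assert_eq!(base.join("/usr/bin"), PathBuf::from("/usr/bin"));
}
```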

This is, in my opinion, a completely reasonable design that is entirely consistent with how every single shell and operating system I've ever used works.


Oh... I think now I at least understand how it could be introduced this way...

It looks like you can imagine joining of p1 and p2 as:

  1. cd $p1
  2. cd $p2

This is certainly not what comes to my mind first :)

I arrived at my example like this:

What would the shell do if I wanted to cd $p1/$p2? I personally think that this more closely matches "joining" paths, but thank you for the explanation. I think it answers my first question.

Well, if you applied that to your earlier example, you'd end up with /etc//usr/bin, which wouldn't be a valid path. :)

I've seen, written, and laboriously debugged enough path-handling code over the years that I actively resist treating paths as strings. That's why when I think about path operations, I think about filesystem operations, not string operations.


Yep... Hence Path::join, which should do a bit more than just bare string1 + "/" + string2.

The behavior is useful because a caller (or config file or whatever) can choose whether it wants to use a relative or absolute path, and the callee can then simply absolutize it by adding its own prefix and the absolute path is unaffected which is probably what the caller wanted. The callee doesn't have to separately check whether the path is absolute or not.
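As a rough sketch of that pattern: the function name resolve and the example directories below are made up for illustration, and the paths are assumed to be Unix-style.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical callee: resolves a configured path against its own base
/// directory without checking whether the path is absolute.
fn resolve(base_dir: &Path, configured: &str) -> PathBuf {
    base_dir.join(configured)
}

fn main() {
    // Made-up base directory, for illustration only.
    let base = Path::new("/var/lib/myapp");

    // A relative setting lands under the base directory.
    assert_eq!(
        resolve(base, "cache/data.db"),
        PathBuf::from("/var/lib/myapp/cache/data.db")
    );

    // An absolute setting is kept as the caller wrote it.
    assert_eq!(resolve(base, "/tmp/data.db"), PathBuf::from("/tmp/data.db"));
}
```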

One could also think of an absolute path as a special type of relative path where the starting / is understood to stand for as many ../ components as needed to reach the root.


Yeah, agree, 100%. There are cases when current behaviour is useful. It would be nice not to have this behaviour for the function that is called join (if you join A and B, you never expect A to completely go away / vanish)... Or maybe at least have another standard function that does it differently.

But what exactly should it do when you join C:\WINDOWS and E:\MyHome ?


But what exactly should it do when you join C:\WINDOWS and E:\MyHome

It should suggest changing OS.

To be honest, I do not know. Maybe for Windows there should be some Windows-specific API?.. Hard to say. But having "broken" join() for the rest of the world is not good either.

Linux is not “the rest of the world”. VxWorks is supported by Rust and uses the same path format as Windows.

And on a POSIX-compliant system one may have //vol/path, which is distinct from /vol/path (although here I'm not sure whether anything but Cygwin uses that… but this would still imply yet another separate path name style with its own separate path name handling).

That's the typical answer to the “why does Rust (or a crate… or a program…) do some weird thing?” question: it's usually weird to you because you don't know about many things that exist in that world.

Sometimes developers decide that some special kind of weirdness is just too weird and doesn't deserve special treatment (e.g. you may go to Walmart and buy a device with a 24-bit CPU, but it was deemed too weird to be supported by Rust, and we may forget about historical devices with ones' complement arithmetic or 36-bit words).

But Windows, VxWorks… these are too popular to be ignored.

