Apparently, when I join two absolute paths, they are not joined as I expected; instead, the first one is completely replaced by the second. This is from the documentation of Path::join:
Creates an owned [PathBuf] with path adjoined to self. If path is absolute, it replaces the current path.
assert_eq!(Path::new("/etc").join("/bin/sh"), PathBuf::from("/bin/sh"));
This is super counter-intuitive: for example, if I type cd /etc/bin/asdf, I go to /etc/bin/asdf, not to /bin/asdf, but Path::new("/etc").join("/bin/asdf") produces /bin/asdf (!).
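A minimal sketch of the two cases, using only std::path (join_demo is a hypothetical name, and the paths are just the ones from the example above): an absolute argument replaces the base, while a relative one is appended to it.

```rust
use std::path::{Path, PathBuf};

/// Joins `/etc` with an absolute and with a relative argument (Unix-style paths).
fn join_demo() -> (PathBuf, PathBuf) {
    let base = Path::new("/etc");
    // Absolute argument: the base is discarded entirely.
    let absolute = base.join("/bin/asdf");
    // Relative argument: it is appended to the base.
    let relative = base.join("bin/asdf");
    (absolute, relative)
}
```

So the only thing that decides between "replace" and "append" is whether the argument starts with a root.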
It seems that this is a common source of confusion:
What was the rationale/motivation behind implementing join this way? Why does it replace the path when the argument is absolute?
If I were designing this, I would make it behave just like regular string concatenation, omitting duplicate slashes: if one path ends with / and the other starts with /, one of them is dropped. This would perfectly match how the terminal/bash works.
Is there anything that could fix the issue? Should I just write my own join function (one that removes the leading / from the second path and then calls join)?
P.S.
Looks like, for the second question, I can use strip_prefix (?):
use std::path::Path;

let path = Path::new("/test/haha/foo.txt");
assert_eq!(path.strip_prefix("/"), Ok(Path::new("test/haha/foo.txt")));
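Building on that, the helper proposed in the question could look roughly like this. join_stripping_root is a hypothetical name, and the sketch assumes Unix-style paths: a Windows drive prefix such as C: is not stripped, so such a path still replaces the base.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: treats `p` as relative to `base` even if it starts with `/`.
/// Assumes Unix-style paths; Windows drive prefixes are not handled.
fn join_stripping_root(base: &Path, p: &Path) -> PathBuf {
    // `strip_prefix("/")` returns Err for paths without a leading `/`,
    // in which case `p` is already relative and is used unchanged.
    let rel = p.strip_prefix("/").unwrap_or(p);
    base.join(rel)
}
```

With this, joining /etc with either /bin/sh or bin/sh gives /etc/bin/sh.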
It would probably be better to have a separate type for absolute paths.
But at this point, join is stuck with this broken behavior for good. Rust chose the one that was already used in other languages.
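A rough sketch of what separate types could look like. AbsolutePath and RelativePath are hypothetical names, not std types. Because the join method only accepts a RelativePath, the "absolute argument replaces the base" case simply cannot be written.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical newtype for a path that is known to be absolute.
#[derive(Debug, Clone, PartialEq)]
struct AbsolutePath(PathBuf);

/// Hypothetical newtype for a path that can only ever be appended.
#[derive(Debug, Clone, PartialEq)]
struct RelativePath(PathBuf);

impl AbsolutePath {
    fn new(p: impl AsRef<Path>) -> Option<Self> {
        let p = p.as_ref();
        if p.is_absolute() {
            Some(AbsolutePath(p.to_path_buf()))
        } else {
            None
        }
    }

    /// Accepts only a `RelativePath`, so the base can never be replaced.
    fn join(&self, rel: &RelativePath) -> AbsolutePath {
        AbsolutePath(self.0.join(&rel.0))
    }

    fn as_path(&self) -> &Path {
        &self.0
    }
}

impl RelativePath {
    fn new(p: impl AsRef<Path>) -> Option<Self> {
        let p = p.as_ref();
        // Reject anything `join` would treat as replacing (part of) the base:
        // a root (`/`, or `\` on Windows) or a Windows prefix such as `C:`.
        let has_prefix = matches!(p.components().next(), Some(Component::Prefix(_)));
        if p.has_root() || has_prefix {
            None
        } else {
            Some(RelativePath(p.to_path_buf()))
        }
    }
}
```

The check moves to construction time: a caller holding a string like "/bin/sh" has to decide explicitly what it means before it can be joined.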
I'm not really sure how you arrived at your shell example. Path::new("/etc").join("/usr/bin") would actually be equivalent to:
/etc $ cd /usr/bin
Which would, as you should expect, try to change the directory to /usr/bin and not /etc/usr/bin. If you wanted to change to /etc/usr/bin, you'd use cd usr/bin. That is, you'd use a relative path without a leading slash.
This is, in my opinion, a completely reasonable design that is entirely consistent with how every single shell and operating system I've ever used works.
Oh... I think I now at least understand why it was designed this way...
It looks like you can think of joining p1 and p2 as:
cd $p1
cd $p2
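That mental model can be written down directly as a fold over join. cd_sequence is a hypothetical helper, and it is purely lexical: it does not touch the filesystem or resolve `..`.

```rust
use std::path::PathBuf;

/// Models `cd $p1; cd $p2; ...` lexically: each step is joined onto the
/// current directory, so an absolute step discards everything before it.
fn cd_sequence(steps: &[&str]) -> PathBuf {
    steps.iter().fold(PathBuf::new(), |cwd, &step| cwd.join(step))
}
```

For example, cd_sequence(&["/etc", "/usr/bin"]) ends in /usr/bin, while cd_sequence(&["/etc", "usr/bin"]) ends in /etc/usr/bin, just like the two shell sessions.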
This is certainly not what first comes to my mind.
I arrived at my example like this:
What would the shell do if I ran cd $p1/$p2? I personally think this more closely matches "joining" paths, but thank you for the explanation. I think it answers my first question.
Well, if you applied that to your earlier example, you'd end up with /etc//usr/bin, which wouldn't be a valid path.
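To make the difference concrete, here is naive string concatenation (string_join is a hypothetical helper) next to Path::join. As a side note, Rust's Path ignores repeated separators when splitting a path into components, so as a Path the doubled-slash string compares equal to /etc/usr/bin; either way, it is not what Path::join produces.

```rust
use std::path::{Path, PathBuf};

/// Naive "string join": concatenate with a separator, as `cd $p1/$p2` would.
fn string_join(p1: &str, p2: &str) -> String {
    format!("{}/{}", p1, p2)
}

/// For comparison: the std behavior, where an absolute `p2` wins.
fn path_join(p1: &str, p2: &str) -> PathBuf {
    Path::new(p1).join(p2)
}
```

string_join("/etc", "/usr/bin") yields the string "/etc//usr/bin", while path_join("/etc", "/usr/bin") yields /usr/bin.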
I've seen, written, and laboriously debugged enough path-handling code over the years that I actively resist treating paths as strings. That's why when I think about path operations, I think about filesystem operations, not string operations.
The behavior is useful because a caller (or a config file, or whatever) can choose whether to use a relative or an absolute path, and the callee can then simply absolutize it by joining it onto its own prefix; an absolute path is left unaffected, which is probably what the caller wanted. The callee doesn't have to separately check whether the path is absolute.
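A minimal sketch of that caller/callee pattern. resolve and the /srv/app base directory are hypothetical; the point is that the callee needs no is_absolute() branch.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical callee: resolves a user-supplied path against its own base
/// directory. Relative paths get the base prepended; absolute paths pass
/// through unchanged, with no separate `is_absolute()` check needed.
fn resolve(base_dir: &Path, user_path: &str) -> PathBuf {
    base_dir.join(user_path)
}
```

A config entry of data/cache resolves to /srv/app/data/cache, while /var/cache stays /var/cache.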
One could also think of an absolute path as a special type of relative path where the starting / is understood to stand for as many ../ components as needed to reach the root.
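That view can be made literal: an absolute path can be rewritten as "enough .. to reach the root, then descend". as_relative_from is a hypothetical helper, and it assumes both paths are absolute, Unix-style and already normalized (no . or .. components, no symlinks to account for).

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: rewrites the absolute path `abs` as a path relative to
/// `base` by climbing to the root with `..` and then descending.
fn as_relative_from(base: &Path, abs: &Path) -> PathBuf {
    let mut rel = PathBuf::new();
    // One `..` per normal component of `base` reaches the root.
    for c in base.components() {
        if let Component::Normal(_) = c {
            rel.push("..");
        }
    }
    // Then append the normal components of `abs`, skipping its root.
    for c in abs.components() {
        if let Component::Normal(name) = c {
            rel.push(name);
        }
    }
    rel
}
```

For base /etc and /bin/sh this gives ../bin/sh, and Path::new("/etc").join("../bin/sh") is /etc/../bin/sh, which on the filesystem names the same file as /bin/sh (assuming /etc is not a symlink). Note that join itself does not collapse the `..`.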
Yeah, agreed, 100%. There are cases where the current behaviour is useful. It would be nice not to have this behaviour in a function called join (if you join A and B, you never expect A to vanish completely)... Or at least to have another standard function that does it differently.
But what exactly should it do when you join C:\WINDOWS and E:\MyHome?
It should suggest changing OS.
To be honest, I do not know. Maybe for Windows there should be some Windows-specific API?.. Hard to say. But having "broken" join() for the rest of the world is not good either.
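One possible (hypothetical) answer for such a "different" join is to drop whatever would make the second path replace the first. The sketch below, concat_join, skips the root and any Windows prefix component, so on Windows it should turn C:\WINDOWS and E:\MyHome into C:\WINDOWS\MyHome; that silently discards the E: drive, which is exactly the kind of policy decision the question points at.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical "concatenating" join: appends `p` to `base` as if `p` were
/// relative, dropping any root (`/`) and, on Windows, any drive prefix such
/// as `E:`. Whether dropping the drive is acceptable is a policy choice.
fn concat_join(base: &Path, p: &Path) -> PathBuf {
    let mut out = base.to_path_buf();
    for c in p.components() {
        match c {
            // Skip everything that would make `p` replace `base`.
            Component::Prefix(_) | Component::RootDir => {}
            // `.`, `..` and normal names are appended as-is, like `join` does.
            other => out.push(other.as_os_str()),
        }
    }
    out
}
```

On Unix-style paths this behaves like the strip_prefix approach: /etc plus /bin/sh gives /etc/bin/sh.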
Linux is not “the rest of the world”. VxWorks is supported by Rust and uses the same path format as Windows.
And on a POSIX-compliant system one may have //vol/path, which is distinct from /vol/path (although I'm not sure whether anything but Cygwin uses that… but this would still imply yet another path name style with its own separate path handling).
That's the typical answer to the “why does Rust (or a crate… or a program…) do some weird thing?” question: it usually seems weird to you because you don't know about many of the things that exist in that world.