PathBuf and Path. Why not String?

I need a bit of philosophical help.
I do not see why I should be bothering with PathBuf and Path when I can just use String.
I am doing quite a bit of string comparisons on Path which is cumbersome.
For example, I need to see if a path contains the string “snapshot”. So if f holds a PathBuf, I have to jump through hoops to find where the string snapshot is:
f.as_path().to_str().unwrap().to_string().find("snapshot.jpg").unwrap()
If f were a String this would be simpler.
If I need to find the file name with: Path::new(f.as_str()).to_path_buf().file_name().unwrap().to_str().unwrap()
Both are wordy.
The philosophical question is: Does the choice of String or PathBuf simply come down to which of those two patterns above dominates?
The use of OsStr is a clue that if I wished my code to be portable and safe for perverse file names I should stick with PathBuf. Is that all?

You cannot just use String. Paths are wrappers around OS strings. Please read the OS string docs: https://doc.rust-lang.org/std/ffi/struct.OsString.html


No. I can use strings. I do not have perverse paths with embedded zeros.
I have read the documentation about OsString.
What you are saying (perhaps without being aware) is portability is a reason. I know that. I am wondering if there are other reasons.
That is unless I missed your point

What the documentation of OsString says is that Rust String is UTF-8, OS path is not, so they can’t be the same. It has nothing to do with embedded zeros or portability.


OsString is still a superset of UTF-8, it just allows surrogate codepoints. Otherwise str couldn’t impl AsRef<OsStr>.

I don’t think there is any reason besides portability, and the fact that all of the path-based utilities like .join() that properly treat components are defined on Paths. When you do .to_string().find("snapshot.jpg") it will also match unrelated-snapshot.jpg and snapshot.jpg.gz.
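A minimal sketch of that difference, assuming the goal is to recognise a file named exactly snapshot.jpg. The helper names contains_snapshot_text and is_snapshot_file are made up for illustration:

```rust
use std::ffi::OsStr;
use std::path::Path;

// Substring test on the whole path text: true for any path whose text
// contains "snapshot.jpg" anywhere, false if the path is not valid UTF-8.
fn contains_snapshot_text(path: &Path) -> bool {
    path.to_str().map_or(false, |s| s.contains("snapshot.jpg"))
}

// Component test: true only when the final component is exactly "snapshot.jpg".
fn is_snapshot_file(path: &Path) -> bool {
    path.file_name() == Some(OsStr::new("snapshot.jpg"))
}
```

With these, cam/unrelated-snapshot.jpg and cam/snapshot.jpg.gz both pass the substring test but fail the component test, while cam/snapshot.jpg passes both.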


This isn’t quite precise. On Unix, an OS string can contain arbitrary bytes. On Windows, an OS string can non-lossily roundtrip an arbitrary sequence of unsigned 16 bit integers. The latter case is accomplished via WTF-8, which is a superset of UTF-8 that permits surrogates. WTF-8 isn’t used on Unix.
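To make the Unix case concrete, here is a small sketch that only compiles on Unix, because it relies on std::os::unix::ffi::OsStrExt. The file name and the helper names non_utf8_name and show are invented for the example:

```rust
// Unix-only sketch: std::os::unix is not available on other platforms.
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;
use std::path::{Path, PathBuf};

// Builds a file name containing the byte 0xFF, which never occurs in valid UTF-8.
fn non_utf8_name() -> PathBuf {
    PathBuf::from(OsStr::from_bytes(b"snap\xFFshot.jpg"))
}

// Returns the exact text when the path is valid UTF-8, otherwise a lossy
// rendering in which each invalid sequence becomes U+FFFD.
fn show(path: &Path) -> String {
    match path.to_str() {
        Some(s) => s.to_owned(),
        None => format!("(lossy) {}", path.to_string_lossy()),
    }
}
```

For the path built by non_utf8_name, to_str() returns None, so a String cannot hold that name without losing information, while the PathBuf keeps the original bytes.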


If the problem is you find it too wordy, why not use some wrapper functions? Some possible examples:

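use std::ffi::OsStr;
use std::path::Path;

// Byte index of substr within the path text, or None if the path is not
// valid UTF-8 or the substring is absent. A &PathBuf derefs to &Path.
fn indexof(path: &Path, substr: &str) -> Option<usize> {
    path.to_str()?.find(substr)
}

// True only when the final path component is exactly "snapshot.jpg".
fn is_snapshot(path: &Path) -> bool {
    path.file_name() == Some(OsStr::new("snapshot.jpg"))
}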



There are valid paths which can’t be represented as a string, and programs that use the string type for paths will fail on them.

This already happens:


It’s more even than this detail of character-set encoding. Paths also have other semantic encoding.

As a critical example, path components like ../ (or ..\ for portability) need to get normalised and handled properly. There have been many, many historical security bugs that ultimately derive from treating paths as just strings and not accounting for how they will be processed by underlying OS layers.
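As a rough illustration of why component-aware handling helps here, the sketch below rejects a relative path if any component could step outside the directory it will be joined to. The helper name stays_inside is hypothetical, and the check is lexical only: it does not resolve symlinks, so it is not a complete defence on its own.

```rust
use std::path::{Component, Path};

// Hypothetical helper: accepts a path only if every component is a plain
// name or ".", which rules out "..", a leading "/", and Windows prefixes.
fn stays_inside(relative: &Path) -> bool {
    relative
        .components()
        .all(|c| matches!(c, Component::Normal(_) | Component::CurDir))
}
```

A string check such as "does it start with ..?" would let images/../../secret through, whereas iterating over components catches the ParentDir entries wherever they appear.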


Excellent.

So in my situation where the only paths I care about are paths I have written I will use String.

Do you need some philosophy to help? OK. Rust is a systems programming language, so it must give access to system objects, when needed, in the most effective way. That means being efficient in memory consumption and robust in execution, while remaining as high-level a language as possible. But strings, from the system's viewpoint, are never simple: the implementation details that affect robustness differ from one OS to another. So Rust, like many other languages, provides several primitives suited to different cases. It is very important to understand that the universe of programming objects is itself complicated in many respects. Rust simply reflects that structure in its language constructs, just as other programming languages do, with the added requirement of safety.


I’ve written a (mostly finished) crate https://github.com/pwil3058/rs_pw_pathux that allows me to do path operations on Strings. I’ve done this because I do a lot of GTK programming, most GTK text-handling widgets only accept UTF-8, and I found switching between PathBufs and Strings cumbersome when displaying file paths in GUIs.
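For the display-only half of that problem, the standard library conversion may be enough. This is a sketch with a hypothetical helper name, label_text, and it is not taken from the crate above:

```rust
use std::path::Path;

// Hypothetical helper: text for a UTF-8-only widget. Invalid sequences are
// replaced with U+FFFD, so the result is meant for display and should not
// be used to reopen the file.
fn label_text(path: &Path) -> String {
    path.to_string_lossy().into_owned()
}
```

Keeping the PathBuf as the source of truth and converting only at the point of display avoids losing information for names that are not valid UTF-8.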

That is very helpful.

Nothing new to me, but nice to have reinforced

Of course not, but every bit of truth (as a matter of fact) is measured and weighed on each person's inner scale of importance, so a fact that matters a great deal to one person may matter much less to another. For instance, if your current job requires some string manipulation with no real need for efficiency, you instinctively expect a cool programming language (such as Rust) to offer something like s1 + s2. Your mind tells you (and everybody on this forum): “I just want to concatenate or search some strings, so why does such a cool language as Rust insist on extra work?” And of course you (and your mind) are right! Moreover, you may be well aware that there are other cases, but they are not really important to you, at least for the moment. Needless to say, that viewpoint may change as your skills and abilities grow, or as you face other challenges.

