Quick background: I began seriously studying Rust about 9 months ago. Though I've spent many years using Windows, for the past 3+ years I've worked exclusively in Linux environments. So my point of view is certainly "penguin-oriented."
Reason for my Headaches Today:
- Path != &str
- PathBuf != String
- "...converting a Path to a &str is not free and may fail (an Option is returned)" (Path - Rust By Example)
Learning these truths can give newcomers (like myself) a lot of headaches. There are many crucial learning curves in Rust. You must think about memory. Learn about the different string types. Ownership. Lifetimes. Traits. Macros. And plenty more.
Until now, even though I struggled, I ultimately triumphed and felt happier about Rust. Ownership was a tough climb. But it makes perfect sense and once it "clicked", things got easier. Lifetime syntax can still trip me up. But I do understand the point, and when they're needed. Using them has gotten easier, and I'm confident it will continue to.
And then there's Path. I believe I understand the challenges with Path. I realize that the differences between operating systems are the root cause of these challenges (not Rust). I understand that a Path is not guaranteed to be UTF-8, so not every Path value can be cleanly converted to UTF-8 (and therefore to a Rust string).
Right now, I'm mostly concerned about operating system file paths.
- Most real-world values of a file Path will cleanly convert to UTF-8. Are there exceptions? Yes. But these are rare edge cases.
- Many (most?) Rust crates that do useful work with strings do so with Rust strings (UTF-8).
- The documentation discourages us from using unwrap()
- However, manipulating or working with Path silently encourages newbie Rustaceans to make liberal use of unwrap():
use std::path::Path;
use regex::Regex;

const PATTERN1: &str = "(brian)";
const PATTERN2: &str = "(john)";

fn contains_pattern(bar: &str, pattern: &str) -> bool {
    let iso_regex = Regex::new(pattern);
    iso_regex.unwrap().is_match(bar)
}

fn main() {
    let not_useful_path = Path::new("/home/brian");
    // Regex cannot use Path, so we're forced to unsafely convert to string.
    let useful_string = not_useful_path.to_str().unwrap();
    println!("Result 1: {:?}", contains_pattern(useful_string, PATTERN1));
    println!("Result 2: {:?}", contains_pattern(useful_string, PATTERN2));
}
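For anyone who wants to see the edge case rather than take it on faith, here is a minimal sketch of a path that fails the conversion. It assumes a Unix target (it uses the Unix-only OsStrExt trait), and path_as_utf8 is a hypothetical helper name I made up for illustration, not something from std.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt; // Unix-only extension trait
use std::path::Path;

/// Hypothetical helper: borrow the path as &str, or None if it is not valid UTF-8.
fn path_as_utf8(path: &Path) -> Option<&str> {
    path.to_str()
}

fn main() {
    // The byte 0xFF never appears in valid UTF-8, yet Linux accepts it in a file name.
    let odd_path = Path::new(OsStr::from_bytes(b"/home/brian/\xFFfile"));
    let normal_path = Path::new("/home/brian");

    println!("{:?}", path_as_utf8(normal_path)); // Some("/home/brian")
    println!("{:?}", path_as_utf8(odd_path)); // None

    // to_string_lossy() never fails; invalid bytes become U+FFFD.
    println!("{}", odd_path.to_string_lossy());
}
```

Calling unwrap() on the second result would panic, which is exactly the case the documentation warns about.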
Many, many excellent and useful crates were written to use Strings. Not Path. Not OsStr. This leaves the developer with uncomfortable choices:
Option 1: Avoid using the Path type altogether. Always use string types.
Pros: You may now utilize various libraries and crates, and stop wasting time converting Paths to strings (which we know isn't 100% safe anyway).
Cons: You will miss out on useful Path functions like join() and parent(). You'll have to find libraries that duplicate these functions for String. Or build your own!
Also, your pseudo String-as-Path types cannot represent every OsStr value the operating system may hand you. This may later result in unexpected errors and failures.
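To make the "Cons" concrete, here is a small sketch of what join() and parent() give you. The sibling function is a hypothetical helper of my own, and the paths are just examples.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: build the path of another file in the same directory.
/// Returns None when `path` has no parent (for example, the root "/").
fn sibling(path: &Path, name: &str) -> Option<PathBuf> {
    Some(path.parent()?.join(name))
}

fn main() {
    let home = Path::new("/home/brian");

    // join() inserts the separator for us.
    let file = home.join("some_file.txt");
    println!("{}", file.display());

    // parent() strips the last component.
    println!("{:?}", file.parent());

    println!("{:?}", sibling(&file, "other.txt"));
    println!("{:?}", sibling(Path::new("/"), "other.txt"));
}
```

Doing the same with plain Strings means handling separators, trailing slashes, and the root directory by hand.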
Option 2: Use Path when needed, and unsafely convert to String.
Pros: 99% of the time, the conversion to UTF-8 will be okay.
Cons: As developers, we worry about stability and correctness. We worry about those Paths that might not convert to UTF-8. Also, your code is going to be filled with clunky as_os_str().to_str().unwrap().to_string() statements all over the place.
You will miss out on useful AsRef conversions too!
// This code will not compile
use std::path::Path;

pub fn foo<'a, S>(parm_path: S) -> ()
where
    S: AsRef<&'a str>,
{
    println!("My path as string = {:?}", parm_path.as_ref());
    ()
}

fn main() {
    let p: &Path = Path::new("/home/brian/some_file.txt");
    foo(p);
}

"the trait std::convert::AsRef<&str> is not implemented for std::path::Path"
...
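For contrast, the conversion does work in the opposite direction: str, String, Path, and PathBuf all implement AsRef<Path>. Here is a minimal sketch; describe is a hypothetical stand-in for foo, with the bound changed to AsRef<Path>. Note that this does not solve the problem above, since it turns strings into Paths rather than Paths into &str.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical variant of foo: accepts anything that can be viewed as a Path.
pub fn describe<P: AsRef<Path>>(parm_path: P) -> String {
    // display() is lossy for non-UTF-8 paths, so it is meant for printing only.
    format!("My path = {}", parm_path.as_ref().display())
}

fn main() {
    println!("{}", describe("/home/brian/some_file.txt")); // &str
    println!("{}", describe(String::from("/home/brian/some_file.txt"))); // String
    println!("{}", describe(Path::new("/home/brian/some_file.txt"))); // &Path
    println!("{}", describe(PathBuf::from("/home/brian/some_file.txt"))); // PathBuf
}
```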
Option 3: Use Path when you need to. Whenever you need a crate (e.g. Regex) that requires working with String, rewrite the crate to use Path or OsStr instead.
Pros: Appears to be the only "correct" solution Rust is offering us.
Cons: Wildly impractical to rewrite every crate you will need.
Today, I'm doing Option 2. I don't like it. I don't like writing these unwrap() statements all the time, or doing error handling for hypothetical edge cases that I cannot easily reproduce. Yet I have no better ideas. If my program ever encounters such an odd filename, I certainly would hope/expect it panics! My project was never designed to work with such things.
I'd really like an Option 4. But I don't know what that is.
I understand Rust is supposed to be safe, not choke on unusual values, and be very stable. I'm all for that.
But Rust is also a systems language. And the manipulation of file paths and names? Well, that seems like a core part of systems programming (particularly in Linux, where "everything is a file"). Yet Rust is weirdly awkward and cumbersome when it comes to working with file Paths. You can do a couple of useful things, but you quickly run into a requirement for Strings.
For the first time with Rust, I've learned "why" things didn't work the way I expect...but I've walked away feeling very unsatisfied.