Path::display is lossy. If the path contains invalid UTF-8 (which is absolutely legal), then Path::display will use the Unicode replacement codepoint:
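use std::{ffi::OsStr, os::unix::ffi::OsStrExt, path::Path};

fn main() {
    // The byte 0xFF never appears in valid UTF-8, yet it is legal in a Unix path.
    let osstr = OsStr::from_bytes(b"foo\xFFbar");
    let path = Path::new(osstr);
    // display() substitutes U+FFFD for the invalid byte...
    assert_eq!(path.display().to_string(), "foo\u{FFFD}bar");
    // ...which UTF-8 encodes as the three bytes EF BF BD.
    assert_eq!(path.display().to_string().as_bytes(), b"foo\xEF\xBF\xBDbar");
}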
This is explicitly documented right in the Path::display docs:
Returns an object that implements Display for safely printing paths that may contain non-Unicode data. This may perform lossy conversion, depending on the platform.
The "safe" part here means that you are guaranteed that the output is valid UTF-8. This is a requirement for implementations of the std::fmt::Display trait. That is what makes it "safe," although I think this is probably misleading terminology (particularly given that "safe" tends to have a more precise technical definition in the context of Rust programs). It is being used colloquially here to mean, "it is okay to use this output in a context that requires valid UTF-8 even though a Path itself may not be valid UTF-8."
But the output is still lossy. So if you print a path this way that contains invalid UTF-8 and that is then used somewhere else as an input file path to another program, then you'll get a different file path than what you started from.
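To make that concrete, here is a minimal sketch of the round trip, assuming a Unix target. The helper names (non_utf8_path, round_trip_via_display, survives_display) are made up for illustration and are not part of std.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;
use std::path::{Path, PathBuf};

// Hypothetical helper: a path containing a byte that is not valid UTF-8
// (Unix only).
fn non_utf8_path() -> &'static Path {
    Path::new(OsStr::from_bytes(b"foo\xFFbar"))
}

// Simulates handing a path to another program as text: format it with
// display(), then parse that text back into a path.
fn round_trip_via_display(path: &Path) -> PathBuf {
    PathBuf::from(path.display().to_string())
}

// Returns true if the text form still names the same path.
fn survives_display(path: &Path) -> bool {
    round_trip_via_display(path).as_path() == path
}
```

With these, survives_display(Path::new("foo/bar.txt")) is true, while survives_display(non_utf8_path()) is false: the receiving program would look for a file whose name contains U+FFFD instead of the original 0xFF byte.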
So how do you print file paths "safely" in a non-lossy way? You kinda can't, at least not portably. On Unix, you can just write arbitrary bytes to file descriptors, but doing that means writing platform-specific code. bstr provides some routines for doing that in a platform-independent way, with some costs that are incurred on Windows.
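For the Unix side, a std-only sketch might look like the following. This is not bstr's API; write_path_raw is a hypothetical name, and the code only compiles on Unix targets because it relies on std::os::unix::ffi::OsStrExt.

```rust
use std::io::{self, Write};
use std::os::unix::ffi::OsStrExt;
use std::path::Path;

// Hypothetical helper (Unix only): writes the path's bytes exactly as the OS
// stores them, with no UTF-8 validation and no replacement characters.
fn write_path_raw<W: Write>(mut out: W, path: &Path) -> io::Result<()> {
    out.write_all(path.as_os_str().as_bytes())
}
```

Calling write_path_raw(io::stdout().lock(), path) would emit the original bytes unchanged, so a program reading them back gets the same path. The catch is that the output is no longer guaranteed to be valid UTF-8, which is exactly the guarantee Display exists to provide.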
The point of the .display() method is to act as a speed bump: it's meant to get you to pause and question if what you're doing is actually correct. Moreover, because of the design of the Display and ToString traits (every type that implements Display gets to_string() automatically), this speed bump means that you can't do path.to_string() like you can for anything that implements Display. If path.to_string() were possible, conventions would imply that to be a non-lossy conversion. But it can't be, and that would be exceptionally misleading.
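Here is a rough sketch of why Display and to_string() come as a package. LossyPath is a hypothetical wrapper invented for this example (it is not in std), and the helper function names are mine too; to_str() and to_string_lossy() are the real Path methods.

```rust
use std::fmt;
use std::path::Path;

// Hypothetical wrapper, for illustration only: it is not part of std.
struct LossyPath<'a>(&'a Path);

impl fmt::Display for LossyPath<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Same lossy behavior as Path::display: invalid UTF-8 becomes U+FFFD.
        write!(f, "{}", self.0.display())
    }
}

// Implementing Display is all it takes: the standard library's blanket
// `impl<T: Display + ?Sized> ToString for T` supplies to_string().
fn wrapper_to_string(path: &Path) -> String {
    LossyPath(path).to_string()
}

// `path.to_string()` does not compile, because Path does not implement
// Display. The explicit alternatives put the trade-off in the type or name.
fn strict(path: &Path) -> Option<&str> {
    path.to_str() // None if the path is not valid UTF-8
}

fn lossy(path: &Path) -> String {
    path.to_string_lossy().into_owned() // the name itself says "lossy"
}
```

So the moment a type implements Display, an innocent-looking to_string() shows up on it. With Path, you instead have to pick between to_str(), which can fail, and to_string_lossy() or display(), which say or document that they are lossy.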
I will also note that I don't think there is unanimous agreement that this speed bump in theory is something we ought to have. But I do think there is likely unanimous agreement that we definitely want to avoid path.to_string() being possible. So even if you don't like the idea of the speed bump giving you pause, the fact that path.to_string() would be available if Path implemented std::fmt::Display means that speed bump is likely never going away.