Idiomatic way to convert non-UTF-8 vector slice to PathBuf

Please! I am sure this must be trivial, but I have spent too long trying to find the syntax to achieve this.

Thanks, Yogi39

This is OS-specific. Check out this.

On Unix

On Unix, OsStr implements the std::os::unix::ffi::OsStrExt trait, which augments it with two methods, from_bytes and as_bytes. These do inexpensive conversions from and to byte slices, which need not be valid UTF-8.

Additionally, on Unix OsString implements the std::os::unix::ffi::OsStringExt trait, which provides from_vec and into_vec methods that consume their arguments, and take or produce vectors of u8.
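As a rough illustration of the four methods named above, here is a minimal Unix-only sketch. The helper names path_from_bytes and pathbuf_from_vec are made up for this example and are not part of the standard library; only from_bytes, as_bytes, from_vec and into_vec come from the traits quoted above.

```rust
use std::ffi::{OsStr, OsString};
use std::os::unix::ffi::{OsStrExt, OsStringExt};
use std::path::{Path, PathBuf};

/// Hypothetical helper: borrowing conversion, the Path refers to `bytes` directly.
fn path_from_bytes(bytes: &[u8]) -> &Path {
    Path::new(OsStr::from_bytes(bytes))
}

/// Hypothetical helper: owning conversion, consumes the Vec<u8>.
fn pathbuf_from_vec(bytes: Vec<u8>) -> PathBuf {
    PathBuf::from(OsString::from_vec(bytes))
}

fn main() {
    // 0xFF never occurs in valid UTF-8, so these bytes could not be a String.
    let raw: Vec<u8> = b"/tmp/caf\xff.txt".to_vec();

    // as_bytes gives back exactly the bytes that went in.
    let borrowed = path_from_bytes(&raw);
    assert_eq!(borrowed.as_os_str().as_bytes(), &raw[..]);

    // into_vec does the same for the owned variant.
    let owned = pathbuf_from_vec(raw.clone());
    assert_eq!(owned.into_os_string().into_vec(), raw);
}
```

The borrowing form is probably the better fit when the bytes already live in a buffer that outlives the path; the owning form makes sense when the vector is no longer needed afterwards.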

On Windows

On Windows, OsStr implements the std::os::windows::ffi::OsStrExt trait, which provides an encode_wide method. This provides an iterator that can be collected into a vector of u16.

Additionally, on Windows OsString implements the std::os::windows::ffi::OsStringExt trait, which provides a from_wide method. The result of this method is an OsString which can be round-tripped to a Windows string losslessly.


Sorry, I forgot to put in I am only interested in Linux.

I started this reply by copying some of my efforts and the error messages. It seemed very verbose and pointless. I had studied Path, PathBuf, OsStr, std::convert etc. until I was going bonkers. I was aware that this conversion was explicitly or implicitly going to regard my simple 'String of bytes' as an OsStr at some point, but the syntax? (I find the number of Rust 'types' that represent what I referred to there as a simple 'String of bytes' overwhelming. AFAICS what I have and what I want are both just this.) I am sure I have driven myself to miss the obvious. So please could I trouble you to be more detailed, starting from my original effort:-

let pth = PathBuf::from(&rdnew[60..]);

result:

trait std::convert::AsRef<std::ffi::OsStr> is not implemented for [u8]

(rdnew is my vector). How do I correct this without going round in circles?

(The code I am producing will run through about 5 million such vectors, but once it has successfully produced an answer it may never be used again. A lot of my programming is like this.)

Thanks a lot.
Yogi.

use std::ffi::OsString;
use std::os::unix::ffi::OsStringExt;
use std::path::PathBuf;

let pth = PathBuf::from(OsString::from_vec(rdnew[60..].to_vec()));

or, equivalently:

use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;
use std::path::Path;

let pth = Path::new(OsStr::from_bytes(&rdnew[60..])).to_path_buf();

Thank you very much; both snippets seem to work from a syntax point of view. However, worryingly, the resulting path does not seem to work, and I have no idea why. My code revolves around a set of nearly five million vectors, each referring to a file. (Each is referred to in turn in my code as rdnew. The path is in the vector from byte 60 on.) I hoped to write a twenty-minute playground program to estimate the size of storage I would need if I decided to copy all the files referred to into one place. I only need the sum of the sizes of the files. I am finding it unexpectedly difficult to make Rust do this. My present code snippet that fails is as follows:-

let mtdt=fs::symlink_metadata(PathBuf::from(OsString::from_vec(rdnew[60..].to_vec())));
if mtdt.is_err(){println!("fmtdt failed with {:?}\nrdnew[60..]= {}"
,mtdt,String::from_utf8_lossy(&rdnew[60..]));continue}//~2

(The //~2 tells me how 'deep' I am in terms of loops, scopes or whatever. In Rust (only) it is always the number of left curly brackets minus the number of right curly brackets in the code so far. Here ~1 = I am inside fn main, ~2 = I am inside the loop iterating through the vectors, ~3 = I am now one level deep in an 'if' or whatever.)

To complete the context, the code ends from here simply:-

let mtdt=mtdt.unwrap();
if mtdt.is_file() {rqsz+=mtdt.len()+8192;} else {rqsz+=4096;} }//~1
println!("required size={}",rqsz); }//~0

The first snippet above fails every time (example chosen to be one where the path is all ASCII):-

fmtdt failed with Err(Os { code: 2, kind: NotFound, message: "No such file or directory" })
rdnew[60..]= /media/richard/bkup02usb/hewhdb1/pictures/sonevxxx/sonev093/dsc89460.jpg

The path exists and is valid. To prove this I open a terminal, type ls -l and then copy/paste the path from the error message (i.e. from just after rdnew[60..]=):

richard@yogi2 ~ $ ls -l /media/richard/bkup02usb/hewhdb1/pictures/sonevxxx/sonev093/dsc89460.jpg
-rwxrwxrwx 1 richard richard 2852309 Sep 3 2017 /media/richard/bkup02usb/hewhdb1/pictures/sonevxxx/sonev093/dsc89460.jpg
richard@yogi2 ~ $

Please can you (or anybody else) tell me why my code does not find it and add 2852309 (with 8k more for overheads) to my variable rqsz?

The first thing I would check is that the path doesn't contain any leading or trailing whitespace characters or non-printing control characters. Instead of printing it with the "{}" formatting string, try "{:?}" (Debug formatting), which will escape any control characters.
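To make that concrete, here is a small sketch of what Debug formatting reveals. It assumes, purely for illustration, that the stored path ends in a newline or NUL byte; nothing in the thread confirms what the real data contains. The helper names debug_path and trim_trailing are hypothetical, and the set of bytes trimmed is an arbitrary choice for the example.

```rust
/// Hypothetical helper: format raw path bytes with {:?} so that control
/// characters appear as visible escapes instead of being printed as-is.
fn debug_path(bytes: &[u8]) -> String {
    format!("{:?}", String::from_utf8_lossy(bytes))
}

/// Hypothetical helper: drop trailing NUL, newline, carriage-return and
/// space bytes. Which bytes count as junk is an assumption; a real file
/// name may legitimately end in a space.
fn trim_trailing(bytes: &[u8]) -> &[u8] {
    let end = bytes
        .iter()
        .rposition(|&b| !matches!(b, 0 | b'\n' | b'\r' | b' '))
        .map_or(0, |i| i + 1);
    &bytes[..end]
}

fn main() {
    // Assumed example data: a path followed by a stray newline and NUL.
    let raw = b"/tmp/dsc89460.jpg\n\0";

    // With "{}" the newline and NUL would be easy to miss in a terminal;
    // with "{:?}" they are escaped and the string is wrapped in quotes.
    println!("raw     = {}", debug_path(raw));
    println!("trimmed = {}", debug_path(trim_trailing(raw)));
}
```

If the Debug output of a failing path shows escapes such as \n or \0 before the closing quote, that would explain why the shell finds the file after copy/paste while symlink_metadata does not.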

Also, just as an aside, you can pass an &OsStr directly to symlink_metadata (with std::ffi::OsStr and the OsStrExt trait in scope), which would simplify this line:

let mtdt = fs::symlink_metadata(OsStr::from_bytes(&rdnew[60..]));
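Putting that simplification together with the loop described earlier in the thread, a minimal Unix-only sketch of the size estimate might look like the following. The function name required_size, the PATH_OFFSET constant and the choice to collect the records in a slice of Vec<u8> are assumptions for the example; the 8192 and 4096 allowances and the offset of 60 are taken from the code posted above.

```rust
use std::ffi::OsStr;
use std::fs;
use std::os::unix::ffi::OsStrExt;

/// Byte offset at which the path starts in each record (60 in the thread).
const PATH_OFFSET: usize = 60;

/// Hypothetical helper: sum an estimated storage requirement over all
/// records, skipping (and reporting) any record whose path cannot be read.
fn required_size(records: &[Vec<u8>]) -> u64 {
    let mut total = 0u64;
    for rec in records {
        // get() avoids a panic if a record is shorter than the offset.
        let raw = match rec.get(PATH_OFFSET..) {
            Some(raw) => raw,
            None => {
                eprintln!("record too short: {} bytes", rec.len());
                continue;
            }
        };
        let name = OsStr::from_bytes(raw);
        match fs::symlink_metadata(name) {
            // Regular file: its length plus the 8k overhead used in the thread.
            Ok(md) if md.is_file() => total += md.len() + 8192,
            // Anything else (directory, symlink, ...): flat 4k.
            Ok(_) => total += 4096,
            // Debug formatting shows any hidden bytes in the failing path.
            Err(e) => eprintln!("metadata failed for {:?}: {}", name, e),
        }
    }
    total
}

fn main() {
    // Assumed example record: 60 bytes of other data, then the path "/".
    let mut rec = vec![0u8; PATH_OFFSET];
    rec.extend_from_slice(b"/");
    println!("required size={}", required_size(&[rec]));
}
```

No PathBuf or intermediate Vec is allocated per record here, which may matter a little across five million records, although the file system calls will very likely dominate the run time.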
