Easy way to pass a Path to C

I am trying to pass a Path to C. Right now, the easiest way I have found is to convert the Path to a str and then convert it into a CString. This seems like a bad solution because it can fail.

For more context, I am trying to interoperate with the PDFium library. The header file says that I need to pass an FPDF_STRING to FPDF_LoadDocument. Here is the definition of FPDF_STRING:

// For Windows programmers: In most cases it's OK to treat FPDF_WIDESTRING as a
// Windows unicode string, however, special care needs to be taken if you
// expect to process Unicode larger than 0xffff.
//
// For Linux/Unix programmers: most compiler/library environments use 4 bytes
// for a Unicode character, and you have to convert between FPDF_WIDESTRING and
// system wide string by yourself.
typedef const char* FPDF_STRING;

This seems to be very close to an OsStr, so I feel like I should be able to go through OsString instead, which would allow for non-UTF-8 characters and the like.

Has anyone had to do something like this before? Does anyone have any advice for me?

The code suggests that, but think about it: a C string can contain pretty much anything, including valid UTF8. This means that the conversions can't fail in practice.

And indeed it's a wordy solution, but pretty much the correct one at this point in time.

The traits os::unix::ffi::OsStrExt and OsStringExt have methods for converting to &[u8] or Vec<u8>, which you can then use to convert to CString. This avoids going through str or String, so it works even with paths that are not valid UTF-8.
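
As a rough, Unix-only sketch of what that could look like (the helper names path_to_cstring, pathbuf_into_cstring and non_utf8_path are made up for illustration, and NulError is just the error that CString::new returns):

```rust
// Unix-only sketch: these extension traits do not exist on Windows.
use std::ffi::{CString, NulError, OsString};
use std::os::unix::ffi::{OsStrExt, OsStringExt};
use std::path::{Path, PathBuf};

/// Borrowing version: copies the path's raw bytes into a new CString.
/// (Hypothetical helper name, not part of any library.)
fn path_to_cstring(path: &Path) -> Result<CString, NulError> {
    // OsStrExt::as_bytes exposes the path as &[u8] with no UTF-8 check.
    CString::new(path.as_os_str().as_bytes())
}

/// Owning version: takes the PathBuf's bytes via OsStringExt::into_vec.
fn pathbuf_into_cstring(path: PathBuf) -> Result<CString, NulError> {
    CString::new(path.into_os_string().into_vec())
}

/// Builds a path containing the byte 0xff, which never occurs in valid UTF-8.
fn non_utf8_path() -> PathBuf {
    PathBuf::from(OsString::from_vec(b"foo-\xffbar".to_vec()))
}
```

With these, a path such as the one from non_utf8_path() has no str view (to_str() returns None) but still converts to a CString. The only remaining failure case is an interior NUL byte.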

Unfortunately, this won't work on Windows. If you need to convert an arbitrary Path to a C string in cross-platform code, you'll need to go through str, and reject non-Unicode paths. More about this problem here:

An OsString (which underlies Path) can contain non-Unicode data, and thus the conversion to String can fail. Additionally, a CString cannot contain internal NULs, but a String can, so that conversion can fail as well.
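
Both failure modes are easy to reproduce. Here is a small sketch, assuming a Unix target (needed only to build the non-UTF-8 path); the function names are invented for the example:

```rust
use std::ffi::{CString, OsString};
use std::os::unix::ffi::OsStringExt; // Unix-only; used just to build a non-UTF-8 path
use std::path::PathBuf;

/// Failure 1: a path that is not valid Unicode has no &str view.
fn non_unicode_path_has_no_str() -> bool {
    let path = PathBuf::from(OsString::from_vec(vec![b'f', b'o', b'o', 0xff]));
    path.to_str().is_none()
}

/// Failure 2: a String may hold an interior NUL, which CString::new rejects.
/// Returns the byte offset of the offending NUL, if there is one.
fn interior_nul_position(s: &str) -> Option<usize> {
    CString::new(s).err().map(|e| e.nul_position())
}
```

For instance, interior_nul_position("foo\0bar") reports the NUL at offset 3, while a string without a NUL yields None.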

I assumed Path and PathBuf are correct by construction. If not then that is a problem that needs solving.

Interesting. Exactly what happens when trying to print a String that contains internal NULs? Is the NUL represented somehow in the print output? Or is the output truncated?

PathBuf (or OsString generally) can contain non-Unicode data and still be correct; it's just a reflection of what can exist on the filesystem, in your ENV, or in the OS generally. If anything is broken and needs solving, it's code that assumes otherwise.

% find foo* -print0 | od -t x1 -a -
0000000  66  6f  6f  2d  ff  62  61  72  00
          f   o   o   - del   b   a   r nul

Or if you mean that filesystems and operating systems should become strictly Unicode... perhaps that day will come, but we're certainly not there yet (and it will entail a performance penalty).

It outputs the NUL literally, no truncation. (As far as I'm aware, Display means "UTF8 output" and not "non-control characters" or anything.)
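
A quick way to check this without staring at a terminal is to format into a byte buffer. This is only a sketch, and display_bytes and debug_string are names invented for the example:

```rust
use std::io::Write;

/// Formats `s` with Display and returns the bytes that would be written.
fn display_bytes(s: &str) -> Vec<u8> {
    let mut out = Vec::new();
    write!(out, "{}", s).expect("writing to a Vec cannot fail");
    out
}

/// Formats `s` with Debug, which escapes control characters instead.
fn debug_string(s: &str) -> String {
    format!("{:?}", s)
}
```

Display passes the NUL byte through untouched, so display_bytes("a\0b") is the three bytes 'a', 0, 'b'. Debug, by contrast, prints the escaped form "a\0b" with a literal backslash.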

That'd be really nice, and more contemporary.

What performance penalty are you referring to? The inability to do O(1) indexing, requiring O(n) instead?

Things like that, plus the linear verification hit you take at any trusted/untrusted boundary, which is almost everything (bits on a wire, input from a keyboard, pretty much any C API, etc.).

Here's a recent article about the necessity to deal with mixed encodings including but beyond just the current-filesystem/os considerations. It'll be with us for decades more.

Does this seem like a reasonable solution?

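use std::ffi::CString;
#[cfg(unix)]
use std::os::unix::ffi::OsStrExt;
use std::path::Path;

// PdfiumError is my own error type, defined elsewhere in the crate.

// Non-Unix targets: go through str and reject paths that are not valid Unicode.
#[cfg(not(unix))]
fn cstr(path: &Path) -> Result<CString, PdfiumError> {
    let path = path.to_str().ok_or(PdfiumError::BadFile)?;
    // CString::new fails if the path contains an interior NUL byte.
    CString::new(path).map_err(|_| PdfiumError::BadFile)
}

// Unix: use the raw bytes of the path, so non-UTF-8 paths work too.
#[cfg(unix)]
fn cstr(path: &Path) -> Result<CString, PdfiumError> {
    CString::new(path.as_os_str().as_bytes()).map_err(|_| PdfiumError::BadFile)
}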