Using PathBuf correctly with FFI

If I'm passing a path to an FFI function that's going to open a file, what's the right way to do that?

CString::new() seems to want a Vec<u8>, but it's difficult to transform a PathBuf into a Vec<u8> (unless I missed something). But if I transform it to a string first, that seems wrong, because that could change the byte layout.

Do I use OsString somehow? It seems difficult to change that into bytes, as well.

There are OS-specific APIs in std::os::*::ffi.

On Windows, where the file APIs predominantly use "wide strings" (a.k.a. potentially ill-formed UTF-16) and OsString is just a UTF-8-like encoding of these, there are UTF-16 conversion functions.
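For the Windows case, a minimal sketch of producing a NUL-terminated wide string might look like this. The helper name `to_wide` is made up for illustration, and the non-Windows branch is only there so the snippet compiles elsewhere (it is lossy for paths that are not valid UTF-8).

```rust
use std::path::Path;

/// Hypothetical helper: the path as NUL-terminated UTF-16 code units.
#[cfg(windows)]
fn to_wide(path: &Path) -> Vec<u16> {
    use std::os::windows::ffi::OsStrExt;
    // encode_wide yields the u16 units without a terminator, so append one.
    path.as_os_str()
        .encode_wide()
        .chain(std::iter::once(0))
        .collect()
}

/// Fallback for illustration only: lossy for non-UTF-8 paths.
#[cfg(not(windows))]
fn to_wide(path: &Path) -> Vec<u16> {
    path.to_string_lossy()
        .encode_utf16()
        .chain(std::iter::once(0))
        .collect()
}
```

The resulting buffer is what you would hand to an API taking a wide-string pointer, via `as_ptr()`.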

On Unix, where filepaths consist of arbitrary byte strings (and OsString is just a simple wrapper around Vec<u8>), there are functions that just expose the bytes.
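On Unix, a sketch of going from a path to a CString through the raw bytes could look like the following. `path_bytes_to_cstring` is a hypothetical name; the non-Unix branch is an assumption added so the snippet builds everywhere, and it simply requires valid UTF-8.

```rust
use std::ffi::CString;
use std::path::Path;

/// Hypothetical helper: copy the path's bytes into a CString.
#[cfg(unix)]
fn path_bytes_to_cstring(path: &Path) -> Option<CString> {
    use std::os::unix::ffi::OsStrExt;
    // as_bytes exposes the raw bytes; CString::new rejects interior NUL bytes.
    CString::new(path.as_os_str().as_bytes()).ok()
}

/// Elsewhere, fall back to requiring a valid UTF-8 path.
#[cfg(not(unix))]
fn path_bytes_to_cstring(path: &Path) -> Option<CString> {
    CString::new(path.to_str()?).ok()
}
```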

As for other possibilities:

  • If the FFI routine you are calling expects UTF-8 specifically, convert to String/str (which may fail) and then call .as_bytes().
  • If the FFI routine you are calling only supports ASCII, do the String/str+as_bytes dance and call is_ascii() to check before using the bytes.
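The two options above can be sketched as follows. Both helper names are made up for illustration; each returns None when the path does not meet the requirement.

```rust
use std::path::Path;

/// Hypothetical helper: the path's bytes, if the path is valid UTF-8.
fn utf8_bytes(path: &Path) -> Option<&[u8]> {
    path.to_str().map(|s| s.as_bytes())
}

/// Hypothetical helper: the path's bytes, only if the path is pure ASCII.
fn ascii_bytes(path: &Path) -> Option<&[u8]> {
    path.to_str()
        .filter(|s| s.is_ascii())
        .map(|s| s.as_bytes())
}
```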

...oops, it just occurred to me you were asking about PathBuf and not OsString. To my knowledge, however, PathBuf is just a strongly typed wrapper around OsString, so the conversions between them should be lossless and zero-cost.
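As a small illustration of that round trip (the function name is hypothetical), both directions are plain standard-library conversions:

```rust
use std::ffi::OsString;
use std::path::PathBuf;

/// Round-trips a path through OsString and back.
fn round_trip(path: PathBuf) -> PathBuf {
    // PathBuf -> OsString takes ownership of the path.
    let os: OsString = path.into_os_string();
    // OsString -> PathBuf via the From impl.
    PathBuf::from(os)
}
```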

I see that there is an extension on Windows that supports encode_wide (which yields u16 code units that can be collected into a Vec<u16>). But that still doesn't help, because I need a CString (to pass to the FFI function expecting a char *), and CString wants a Vec<u8>.

It seems like there should basically be one way to go from a PathBuf to a CString without resorting to platform-specific code, right?

EDIT: I guess I should clarify what API I am trying to call. In this case, it's clang_parseTranslationUnit[1]. I assume it just passes the const char *source_filename argument straight through to an open call somewhere. So on Windows, should I be passing a UTF-16 string or a UTF-8 string, or what? And how would I get that out of a PathBuf and through a CString (which expects Vec<u8>)?


A C string requires a null terminator; Rust strings don't have one. That alone may require you to allocate a new buffer for most strings.
char* isn't a wide string; wchar_t* strings are.

As you said earlier, you can use CString to copy the string and add the extra null terminator that most C programs expect. You don't need to allocate the Vec yourself, just pass in a regular String or str.

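use std::ffi::CString;
// CString::new returns a Result; it fails if the input contains an interior NUL byte.
let alloced_with_null = CString::new("file.txt").expect("path contains an interior NUL");
let ptr_to_clang = alloced_with_null.as_ptr(); // valid only while alloced_with_null is alive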

Okay, I'm looking at the libclang documentation, and am having trouble finding information on what encoding they expect strings to be in. That said, one can be sure that UTF-16 is definitely NOT the right choice here, because char * implies that the code unit size is 1; if they wanted UTF-16, they would take a WCHAR*, or *const u16 in Rust terms.

UTF-8 is ubiquitous enough that it's probably your safest bet. So I would call:

  • into_os_string().into_string() to produce Result<String, OsString>. (Only ill-formed names will fail: those with unpaired surrogates on Windows, or bytes that are not valid UTF-8 on Unix.)
  • Then CString::new which will accept a String since that implements Into<Vec<u8>>.
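Put together, the two steps above might look like this sketch. The helper name `to_utf8_cstring` is made up, and collapsing both failure cases into None is a simplification; real code would probably want a proper error type.

```rust
use std::ffi::CString;
use std::path::PathBuf;

/// Hypothetical helper: PathBuf -> String -> CString.
fn to_utf8_cstring(path: PathBuf) -> Option<CString> {
    // Step 1: fails (handing the OsString back) if the path is not valid Unicode.
    let s: String = path.into_os_string().into_string().ok()?;
    // Step 2: fails if the string contains an interior NUL byte.
    CString::new(s).ok()
}
```

The pointer for the FFI call then comes from as_ptr() on the returned CString, which has to stay alive for the duration of the call.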

Note that just because you see char* in APIs doesn't necessarily mean you need to (or even should) use CString/CStr. Sometimes these are the wrong type, e.g. when interior NULLs must be allowed, or when the called function might mutate the string and write interior NULLs. Sometimes I just use &[u8]/Vec<u8> and manually deal with the trailing NULL myself.
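A rough sketch of that manual approach, with hypothetical helper names, could be:

```rust
use std::os::raw::c_char;

/// Hypothetical helper: copy the bytes and append the terminator by hand.
/// Unlike CString::new, this does not reject interior zero bytes.
fn with_trailing_nul(bytes: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(bytes.len() + 1);
    buf.extend_from_slice(bytes);
    buf.push(0);
    buf
}

/// Pointer suitable for a `const char *` parameter.
/// It is only valid while `buf` is alive and not reallocated.
fn as_c_ptr(buf: &[u8]) -> *const c_char {
    buf.as_ptr() as *const c_char
}
```

If the callee may write to the buffer, you would take a `&mut [u8]` and use `as_mut_ptr()` instead.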

Thank you both! I will just convert it to a UTF-8 string, then create a CString out of that. I was worried that on Windows that would somehow do the wrong thing, but I think I was just confused and that would only be a problem for wchar_t* as you say.
