Using PathBuf correctly with FFI

jeffdavis · May 13, 2020, 12:06am

If I'm passing a path to an FFI function that's going to open a file, what's the right way to do that?

CString::new() seems to want a Vec<u8>, but it's difficult to transform a PathBuf into a Vec<u8> (unless I missed something). But if I transform it to a string first, that seems wrong, because that could change the byte layout.

Do I use OsString somehow? It seems difficult to change that into bytes, as well.

ExpHP · May 13, 2020, 12:12am

There are OS-specific APIs in std::os::*::ffi.

On Windows, where the file APIs predominantly use "wide strings" (a.k.a. potentially ill-formed UTF-16) and OsString is just a UTF-8-like encoding of these, there are UTF-16 conversion functions.

On Unix, where filepaths consist of arbitrary byte strings (and OsString is just a simple wrapper around Vec<u8>), there are functions that just expose the bytes.
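A minimal sketch of the two platform-specific routes described above. It assumes a Unix target for the CString route and, for the Windows route, a C callee that takes `const wchar_t *`. It uses the std extension traits `std::os::unix::ffi::OsStrExt::as_bytes` and `std::os::windows::ffi::OsStrExt::encode_wide`. The helper names `path_to_cstring` and `path_to_wide` are illustrative only.

```rust
use std::ffi::{CString, NulError};
use std::path::Path;

/// Unix: an OsStr is just bytes, so the path can be copied into a
/// NUL-terminated CString without any re-encoding.
#[cfg(unix)]
fn path_to_cstring(path: &Path) -> Result<CString, NulError> {
    use std::os::unix::ffi::OsStrExt;
    // as_bytes() exposes the raw bytes; CString::new appends the NUL and
    // fails only if the path contains an interior NUL byte.
    CString::new(path.as_os_str().as_bytes())
}

/// Windows: encode_wide() yields (possibly ill-formed) UTF-16 code units.
/// This suits a C function taking `const wchar_t *`, not `const char *`.
#[cfg(windows)]
fn path_to_wide(path: &Path) -> Vec<u16> {
    use std::os::windows::ffi::OsStrExt;
    path.as_os_str()
        .encode_wide()
        .chain(std::iter::once(0)) // append the terminating NUL code unit
        .collect()
}

fn main() {
    let path = Path::new("dir/file.txt");
    #[cfg(unix)]
    {
        let c = path_to_cstring(path).expect("interior NUL byte");
        println!("{:?}", c);
    }
    #[cfg(windows)]
    {
        let wide = path_to_wide(path);
        println!("{} code units including the terminator", wide.len());
    }
}
```

On Unix this round-trips even non-UTF-8 file names, since the bytes are passed through untouched.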

As for other possibilities:

If the FFI routine you are calling expects UTF-8 specifically, convert to String/str (which may fail) and then call .as_bytes().
If the FFI routine you are calling only supports ASCII, do the same String/str + as_bytes dance, then call is_ascii() to check before using it.
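A sketch of those two cases. It goes through `Path::to_str`, which performs the same &str conversion as `OsStr::to_str` and returns None for non-Unicode paths. The helper names are hypothetical, and whether a given C library wants UTF-8 or ASCII is an assumption you would need to check in its documentation.

```rust
use std::ffi::CString;
use std::path::Path;

/// For a C function documented to take UTF-8. Returns None for a
/// non-Unicode path or one containing an interior NUL byte.
fn path_to_utf8_cstring(path: &Path) -> Option<CString> {
    let s: &str = path.to_str()?; // None if the path isn't valid Unicode
    CString::new(s.as_bytes()).ok()
}

/// For a C function that only handles ASCII: same conversion, then
/// additionally reject anything outside the ASCII range.
fn path_to_ascii_cstring(path: &Path) -> Option<CString> {
    let s = path.to_str()?;
    if !s.is_ascii() {
        return None;
    }
    CString::new(s.as_bytes()).ok()
}

fn main() {
    let p = Path::new("données/file.txt");
    println!("utf8:  {:?}", path_to_utf8_cstring(p));
    println!("ascii: {:?}", path_to_ascii_cstring(p));
}
```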

...oops, it just occurred to me you were asking about PathBuf and not OsString. To my knowledge, however, PathBuf is just a strongly typed wrapper around OsString, so the conversions between them should be lossless and zero-cost.
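For reference, a short sketch of the conversions between the two types, using std's `PathBuf::as_os_str`, `PathBuf::into_os_string`, and `From<OsString> for PathBuf`. Per the std docs, the owning conversions hand over the existing buffer rather than re-encoding or copying it.

```rust
use std::ffi::{OsStr, OsString};
use std::path::{Path, PathBuf};

fn main() {
    let buf = PathBuf::from("dir/file.txt");

    // Borrowing: a PathBuf can be viewed as an &OsStr without copying.
    let borrowed: &OsStr = buf.as_os_str();
    println!("{:?}", borrowed);

    // Owning: into_os_string() yields the inner OsString storage.
    let owned: OsString = buf.into_os_string();

    // And back again: PathBuf implements From<OsString>.
    let back: PathBuf = PathBuf::from(owned);
    assert_eq!(back.as_path(), Path::new("dir/file.txt"));
}
```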

jeffdavis · May 13, 2020, 12:23am

I see that there is OsStrExt in std::os::windows::ffi, which supports encode_wide on Windows (yielding u16 code units, e.g. collected into a Vec<u16>). But that still doesn't help, because I need a CString (to pass to the FFI function expecting a char *), and CString wants a Vec<u8>.

It seems like there should basically be one way to go from a PathBuf to a CString without resorting to platform-specific code, right?

EDIT: I guess I should clarify what API I am trying to call. In this case, it's clang_parseTranslationUnit[1]. I assume it just passes the const char *source_filename argument straight through to an open call somewhere. So on Windows, should I be passing a UTF-16 string or a UTF-8 string, or what? And how would I get that out of a PathBuf and through a CString (which expects a Vec<u8>)?

[1] clang: Translation unit manipulation

naim · May 13, 2020, 12:38am

A C string requires a null terminator; Rust strings don't have one. That alone may require you to allocate memory for most strings.
char* isn't a wide string; wchar_t* strings are.

As you said earlier, you can use CString to copy the string and add the extra null terminator that most C programs expect. You don't need to build the Vec yourself; just pass in a regular String or &str.

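use std::ffi::CString;
// CString::new returns a Result: it fails if the string contains an interior NUL byte.
let alloced_with_null = CString::new("file.txt").expect("interior NUL byte");
let ptr_to_clang = alloced_with_null.as_ptr(); // valid only while alloced_with_null is alive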
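One caveat about that pointer: `as_ptr()` borrows the CString's buffer, so the CString must stay alive until the C function returns. The sketch below uses a hypothetical Rust `extern "C"` function, `fake_c_strlen`, as a stand-in for the real C API so that it runs without linking libclang.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical stand-in for a C function taking `const char *`
/// (implemented in Rust here so the sketch runs without linking anything).
unsafe extern "C" fn fake_c_strlen(s: *const c_char) -> usize {
    // SAFETY: the caller must pass a valid, NUL-terminated string.
    let c = unsafe { CStr::from_ptr(s) };
    c.to_bytes().len()
}

fn main() {
    // Bind the CString to a variable so its buffer outlives the call.
    let owned = CString::new("file.txt").expect("interior NUL byte");
    let ptr = owned.as_ptr();
    let len = unsafe { fake_c_strlen(ptr) };
    println!("{}", len);

    // By contrast, `let ptr = CString::new("file.txt").unwrap().as_ptr();`
    // drops the temporary CString at the end of that statement, so any later
    // use of `ptr` would read freed memory.
}
```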

ExpHP · May 13, 2020, 12:39am

Okay, I'm looking at the libclang documentation and am having trouble finding information on what encoding they expect strings to be in. That said, one can be sure that UTF-16 is definitely NOT the right choice here, because char* implies that the code unit size is 1; if they wanted UTF-16, they would take a WCHAR* (*const u16 in Rust terms).

UTF-8 is ubiquitous enough that it's probably your safest bet. So I would call:

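into_os_string().into_string() to produce a Result<String, OsString>. (On Windows, only very ill-formed names will fail: those with unpaired surrogates. On Unix, any name that isn't valid UTF-8 will fail.)
Then call CString::new, which accepts a String since String implements Into<Vec<u8>>. (This fails if the string contains an interior NUL byte.)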
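The two steps above, put together as a small helper. This is a sketch: `pathbuf_to_cstring` is a hypothetical name, and it assumes the C side really does expect UTF-8.

```rust
use std::ffi::CString;
use std::path::PathBuf;

/// PathBuf -> String -> CString. Returns None if the path isn't valid
/// Unicode or contains an interior NUL byte.
fn pathbuf_to_cstring(path: PathBuf) -> Option<CString> {
    // Result<String, OsString>: the Err variant gives back the OsString.
    let s: String = path.into_os_string().into_string().ok()?;
    // String implements Into<Vec<u8>>, so it can be passed directly.
    CString::new(s).ok()
}

fn main() {
    let path = PathBuf::from("src/main.c");
    match pathbuf_to_cstring(path) {
        Some(c) => println!("pass {:?} to the C function via c.as_ptr()", c),
        None => eprintln!("path is not valid UTF-8 or contains a NUL byte"),
    }
}
```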

Note that just because you see char* in an API doesn't necessarily mean you need to (or even should) use CString/CStr. Sometimes these are the wrong type, e.g. when interior NUL bytes must be allowed, or when the called function might mutate the string and write interior NULs. Sometimes I just use &[u8]/Vec<u8> and deal with the trailing NUL manually.
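A sketch of the hand-managed `Vec<u8>` approach for a callee that writes NULs into its buffer. The C side is simulated by a hypothetical Rust function, `fake_c_split_in_place`, that overwrites every '/' with a NUL, the way strtok overwrites delimiters.

```rust
use std::os::raw::c_char;

/// Hypothetical stand-in for a C function that edits its `char *` argument
/// in place, overwriting every '/' with NUL.
unsafe extern "C" fn fake_c_split_in_place(s: *mut c_char) {
    let mut p = s;
    // SAFETY: the caller must pass a writable, NUL-terminated buffer.
    unsafe {
        while *p != 0 {
            if *p == b'/' as c_char {
                *p = 0;
            }
            p = p.add(1);
        }
    }
}

fn main() {
    // A plain Vec<u8> with the trailing NUL pushed by hand. CString would be
    // the wrong type here: as_ptr() gives a *const pointer that must not be
    // written through, and a CString may not contain interior NULs.
    let mut buf: Vec<u8> = b"usr/lib/libclang.so".to_vec();
    buf.push(0);
    unsafe { fake_c_split_in_place(buf.as_mut_ptr() as *mut c_char) };
    buf.pop(); // remove the NUL we added; interior NULs written by "C" remain
    for part in buf.split(|&b| b == 0) {
        println!("{}", String::from_utf8_lossy(part));
    }
}
```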

jeffdavis · May 13, 2020, 12:43am

Thank you both! I will just convert it to a UTF-8 string, then create a CString out of that. I was worried that on Windows that would somehow do the wrong thing, but I think I was just confused, and that would only be a problem for wchar_t* as you say.

system · August 11, 2020, 12:43am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
