If I'm passing a path to an FFI function that's going to open a file, what's the right way to do that?
`CString::new()` seems to want a `Vec<u8>`, but it's difficult to transform a `PathBuf` into a `Vec<u8>` (unless I missed something). But if I transform it to a string first, that seems wrong, because that could change the byte layout.
Do I use `OsString` somehow? It seems difficult to change that into bytes, as well.
There are OS-specific APIs for this:
On Windows, where the file APIs predominantly use "wide strings" (a.k.a. potentially ill-formed UTF-16) and `OsString` is just a UTF-8-like encoding of these, there are UTF-16 conversion functions.
On Unix, where file paths consist of arbitrary byte strings (and `OsString` is just a simple wrapper around `Vec<u8>`), there are functions that just expose the bytes.
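To make the two OS-specific routes concrete, here is a minimal sketch using the `OsStrExt` extension traits from `std::os::unix::ffi` and `std::os::windows::ffi`. The helper names `path_to_cstring` and `path_to_wide` are hypothetical, and each one only compiles on its own platform. Note that the Windows result is a NUL-terminated `u16` buffer suited to a `wchar_t *`-style API, not something that fits in a `CString`.

```rust
use std::ffi::CString;
use std::path::Path;

/// Unix: a path is an arbitrary byte string, so hand exactly those bytes to C.
/// Returns None if the path contains an interior NUL byte.
#[cfg(unix)]
fn path_to_cstring(path: &Path) -> Option<CString> {
    use std::os::unix::ffi::OsStrExt; // provides `as_bytes` on OsStr
    CString::new(path.as_os_str().as_bytes()).ok()
}

/// Windows: a NUL-terminated UTF-16 buffer for a wide-string API.
/// This is not something a `char *` parameter can accept.
#[cfg(windows)]
fn path_to_wide(path: &Path) -> Vec<u16> {
    use std::os::windows::ffi::OsStrExt; // provides `encode_wide` on OsStr
    path.as_os_str().encode_wide().chain(std::iter::once(0)).collect()
}
```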
As for other possibilities:
- If the FFI routine you are calling expects UTF-8 specifically, convert to `String`/`str` (which may fail) and then take its bytes.
- If the FFI routine you are calling only supports ASCII, do the `String`/`str` + `as_bytes` dance, and then you can call `is_ascii()` to check before using it.
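A rough sketch of those two options, assuming the input is an `&OsStr` (the helper names are made up for illustration). Both return `None` if the text isn't valid Unicode or contains an interior NUL; the ASCII variant additionally rejects anything outside ASCII.

```rust
use std::ffi::{CString, OsStr};

/// For a routine that expects UTF-8: fails if the OS string is not valid
/// Unicode or if it contains an interior NUL byte.
fn to_utf8_cstring(s: &OsStr) -> Option<CString> {
    let text: &str = s.to_str()?;
    CString::new(text.as_bytes()).ok()
}

/// For a routine that only understands ASCII: same as above, plus an ASCII check.
fn to_ascii_cstring(s: &OsStr) -> Option<CString> {
    let text: &str = s.to_str()?;
    if !text.is_ascii() {
        return None;
    }
    CString::new(text.as_bytes()).ok()
}
```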
...oops, it just occurred to me you were asking about `PathBuf` and not `OsString`. To my knowledge, however, `PathBuf` is just a strongly typed wrapper around `OsString`, so the conversions between them should be lossless and zero-cost.
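Assuming that's right, the owned conversions are just moves in each direction. This small sketch (hypothetical helper names) shows the round trip; for a borrowed view, `Path::as_os_str` gives an `&OsStr` without taking ownership.

```rust
use std::ffi::OsString;
use std::path::PathBuf;

/// PathBuf -> OsString: moves the existing buffer; nothing is re-encoded.
fn to_os_string(p: PathBuf) -> OsString {
    p.into_os_string()
}

/// OsString -> PathBuf: likewise just wraps the same buffer.
fn to_path_buf(s: OsString) -> PathBuf {
    PathBuf::from(s)
}
```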
I see that there is https://doc.rust-lang.org/std/os/windows/ffi/trait.OsStrExt.html on Windows, which supports `encode_wide` (yielding `u16` code units). But that still doesn't help, because I need a `CString` (to pass to the FFI function expecting a `char *`), and `CString` wants bytes, not `u16`s.
It seems like there should basically be one way to go from a PathBuf to a CString without resorting to platform-specific code, right?
EDIT: I guess I should clarify which API I am trying to call. In this case, it's `clang_parseTranslationUnit`. I assume it just passes the `const char *source_filename` argument straight through to an `open` call somewhere. So on Windows, should I be passing a UTF-16 string or a UTF-8 string, or what? And how would I get that out of a `PathBuf` and through a `CString`?
A C string requires a null terminator; Rust strings don't have one. That alone may require allocating a new buffer for most strings.
`char*` isn't a wide string; `wchar_t*` strings are.
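As you said earlier, you can use `CString` to copy the string and add the extra null terminator that most C programs expect. You don't need to allocate the `Vec` yourself; just pass in a regular string:

```rust
use std::ffi::CString;
// CString::new returns an Err if the input contains an interior NUL byte.
let alloced_with_null = CString::new("file.txt").unwrap();
let ptr_to_clang = alloced_with_null.as_ptr(); // valid only while `alloced_with_null` lives
```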
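One pitfall worth spelling out: the pointer from `as_ptr` is only valid while the `CString` is alive, so writing `CString::new(...).unwrap().as_ptr()` in a single `let` leaves a dangling pointer once the temporary is dropped at the end of that statement. A hypothetical helper like this one keeps the owned string alive for exactly the duration of the call:

```rust
use std::ffi::{CString, NulError};
use std::os::raw::c_char;

/// Copies `s` into a NUL-terminated buffer and lends its pointer to `f`.
/// The CString is dropped only after `f` returns, so the pointer stays valid
/// for the whole call.
fn with_c_str<R>(s: &str, f: impl FnOnce(*const c_char) -> R) -> Result<R, NulError> {
    let owned = CString::new(s)?; // fails if `s` contains an interior NUL byte
    Ok(f(owned.as_ptr()))
}
```

The closure is where the actual FFI call would go; the design choice is simply that the borrow of the pointer cannot outlive `owned`.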
Okay, I'm looking at the libclang documentation and am having trouble finding information on what encoding they expect strings to be in. That said, UTF-16 is definitely NOT the right choice here, because `char*` implies that the code unit size is 1; if they wanted UTF-16, they would take a `*const u16` in Rust terms.
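To illustrate the code-unit point, here is a sketch that produces both encodings of the same text. `narrow_and_wide` is a made-up name; only the first value fits a `char *` parameter like `source_filename`, while the `u16` buffer is what an API taking `*const u16` would want.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

// A C `const char *` maps to Rust's `*const c_char`, whose code unit is one byte.
const _: () = assert!(std::mem::size_of::<c_char>() == 1);

/// Returns the same text as a NUL-terminated narrow string (UTF-8 bytes) and as
/// a NUL-terminated UTF-16 buffer with two-byte code units.
fn narrow_and_wide(s: &str) -> (CString, Vec<u16>) {
    let narrow = CString::new(s).expect("input must not contain NUL");
    let wide: Vec<u16> = s.encode_utf16().chain(std::iter::once(0)).collect();
    (narrow, wide)
}
```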
UTF-8 is ubiquitous enough that it's probably your safest bet. So I would call:
- `into_os_string().into_string()` to produce a `Result<String, OsString>`. Only ill-formed names will fail: on Windows, those with unpaired surrogates; on Unix, any name that isn't valid UTF-8.
- `CString::new`, which accepts a `String` directly.
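Put together, those two steps might look like the sketch below. `path_to_utf8_cstring` is a hypothetical name, and it collapses both failure cases (not valid Unicode, interior NUL) into `None` for brevity; a real binding would probably want to report which step failed.

```rust
use std::ffi::CString;
use std::path::PathBuf;

/// Platform-independent conversion: PathBuf -> String (UTF-8) -> CString.
/// Returns None if the path isn't valid Unicode or contains an interior NUL.
fn path_to_utf8_cstring(path: PathBuf) -> Option<CString> {
    // Step 1: into_string gives Err(OsString) if the path isn't valid Unicode.
    let utf8: String = path.into_os_string().into_string().ok()?;
    // Step 2: takes ownership of the String's bytes and appends the NUL terminator.
    CString::new(utf8).ok()
}
```

The resulting `CString`'s `as_ptr()` would then be passed as the `source_filename` argument, with the `CString` held in a variable that outlives the call.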
Note that just because you see `char*` in an API doesn't necessarily mean you need to (or even should) use `CString`/`CStr`. Sometimes these are the wrong type, e.g. when interior NULs must be allowed, or when the called function might mutate the string and write interior NULs. Sometimes I just use `&[u8]`/`Vec<u8>` and deal with the trailing NUL manually.
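A minimal sketch of that manual approach, with hypothetical helper names: append the terminator yourself (no interior-NUL check, and the buffer can be handed out mutably via `as_mut_ptr`), and after a C function has written into the buffer, treat the C-level string as ending at the first NUL.

```rust
/// Builds a NUL-terminated byte buffer by hand. Unlike CString::new, this does
/// not reject interior NUL bytes.
fn nul_terminated(bytes: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(bytes.len() + 1);
    buf.extend_from_slice(bytes);
    buf.push(0); // the trailing NUL a C function expects
    buf
}

/// Returns the bytes before the first NUL, which is where C string functions
/// consider the string to end. Falls back to the whole slice if there is none.
fn up_to_first_nul(buf: &[u8]) -> &[u8] {
    let end = buf.iter().position(|&b| b == 0).unwrap_or(buf.len());
    &buf[..end]
}
```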
Thank you both! I will just convert it to a UTF-8 string, then create a `CString` out of that. I was worried that on Windows that would somehow do the wrong thing, but I think I was just confused, and that would only be a problem for `wchar_t*`, as you say.