If I'm passing a path to an FFI function that's going to open a file, what's the right way to do that?
CString::new() seems to want a Vec<u8>, but it's difficult to transform a PathBuf into a Vec<u8> (unless I missed something). But if I transform it to a string first, that seems wrong, because that could change the byte layout.
Do I use OsString somehow? It seems difficult to change that into bytes, as well.
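On Unix, where file paths consist of arbitrary byte strings (and OsString is just a simple wrapper around Vec<u8>), there are extension traits that just expose the bytes: std::os::unix::ffi::OsStrExt::as_bytes for a borrowed &OsStr, and OsStringExt::into_vec for an owned OsString.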
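A minimal Unix-only sketch of that approach. The function names are hypothetical; the traits are std's OsStrExt and OsStringExt. The path's bytes are kept exactly as the OS provided them, including non-UTF-8 ones, and CString::new only fails if the path contains a NUL byte.

```rust
use std::ffi::{CString, NulError};
use std::path::{Path, PathBuf};

// Unix-only: these extension traits expose the raw bytes of OsStr/OsString.
#[cfg(unix)]
use std::os::unix::ffi::{OsStrExt, OsStringExt};

/// Borrowing version: copies the path's raw bytes into a new CString.
/// Fails only if the path contains an interior NUL byte.
#[cfg(unix)]
fn path_to_cstring(path: &Path) -> Result<CString, NulError> {
    CString::new(path.as_os_str().as_bytes())
}

/// Owning version: reuses the PathBuf's byte buffer instead of copying it
/// (CString::new may still reallocate to append the trailing NUL).
#[cfg(unix)]
fn pathbuf_into_cstring(path: PathBuf) -> Result<CString, NulError> {
    CString::new(path.into_os_string().into_vec())
}
```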
As for other possibilities:
If the ffi routine you are calling expects UTF-8 specifically, convert to String/str (which may fail) and then call .as_bytes().
If the ffi routine you are calling only supports ASCII, do the same String/str + as_bytes dance, and call is_ascii() to check the string before using it.
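A sketch of both cases, using OsStr::to_str for the fallible conversion. The helper names are made up for illustration, and each returns None on failure rather than a detailed error.

```rust
use std::ffi::{CString, OsStr};

/// For a C function that expects UTF-8: convert to &str (fails if the
/// OS string isn't valid Unicode), then hand its bytes to CString.
fn utf8_cstring(s: &OsStr) -> Option<CString> {
    let text = s.to_str()?; // None if not valid Unicode
    CString::new(text.as_bytes()).ok() // None if there's an interior NUL
}

/// For a C function that only accepts ASCII: same conversion, plus an
/// is_ascii() check before building the CString.
fn ascii_cstring(s: &OsStr) -> Option<CString> {
    let text = s.to_str()?;
    if !text.is_ascii() {
        return None;
    }
    CString::new(text).ok()
}
```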
...oops, it just occurred to me that you were asking about PathBuf and not OsString. To my knowledge, however, PathBuf is just a strongly typed wrapper around OsString, so the conversions between them should be lossless and zero-cost.
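For reference, here are the conversions in question. This is a small sketch, and `round_trip` and `as_os` are illustrative names. The std docs say into_os_string yields the PathBuf's internal storage, and that converting an OsString into a PathBuf does not allocate or copy.

```rust
use std::ffi::{OsStr, OsString};
use std::path::{Path, PathBuf};

/// Owned: PathBuf -> OsString -> PathBuf, moving the same buffer both ways.
fn round_trip(p: PathBuf) -> PathBuf {
    let os: OsString = p.into_os_string(); // yields the internal storage
    PathBuf::from(os) // documented as not allocating or copying
}

/// Borrowed: a &Path can be viewed as an &OsStr for free.
fn as_os(p: &Path) -> &OsStr {
    p.as_os_str()
}
```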
I see that there is https://doc.rust-lang.org/std/os/windows/ffi/trait.OsStrExt.html on Windows, which provides encode_wide (an iterator over u16 code units). But that still doesn't help, because I need a CString (to pass to the FFI function expecting a char *), and CString wants a Vec<u8>.
It seems like there should basically be one way to go from a PathBuf to a CString without resorting to platform-specific code, right?
EDIT: I should clarify which API I'm trying to call. In this case, it's clang_parseTranslationUnit[1]. I assume it just passes the const char *source_filename argument straight through to an open call somewhere. So on Windows, should I be passing a UTF-16 string, a UTF-8 string, or something else? And how would I get that out of a PathBuf and through a CString (which expects a Vec<u8>)?
A C string requires a null terminator; Rust strings don't have one, so converting usually means allocating a copy with the terminator appended. Also, char* isn't a wide string; wchar_t* is.
As you said earlier, you can use CString to copy the string and append the null terminator that most C programs expect. You don't need to build the Vec yourself; just pass in a regular String or &str:
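use std::ffi::CString;
// CString::new returns a Result: it fails if the input contains an interior NUL byte.
let allocated_with_null = CString::new("file.txt").expect("interior NUL byte in filename");
let ptr_to_clang = allocated_with_null.as_ptr(); // valid only while allocated_with_null is alive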
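One thing to watch with as_ptr(): the pointer is only valid while the CString is alive. The sketch below uses a hypothetical Rust stand-in, fake_c_strlen, for a C function that reads a const char *. In real code that would be clang_parseTranslationUnit declared in an extern block.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical stand-in for a C function taking `const char *`:
/// it reads bytes up to the terminating NUL, as C code would.
///
/// # Safety
/// `p` must point to a valid NUL-terminated string.
unsafe fn fake_c_strlen(p: *const c_char) -> usize {
    unsafe { CStr::from_ptr(p) }.to_bytes().len()
}

fn call_with_filename(name: &str) -> usize {
    // Bind the CString to a variable so it outlives the call. Writing
    // `let ptr = CString::new(name).unwrap().as_ptr();` would drop the
    // temporary CString at the end of that statement, leaving `ptr` dangling.
    let owned = CString::new(name).expect("interior NUL byte in filename");
    let ptr = owned.as_ptr();
    unsafe { fake_c_strlen(ptr) }
}
```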
Okay, I'm looking at the libclang documentation and am having trouble finding out what encoding they expect strings to be in. That said, UTF-16 is definitely NOT the right choice here, because char* implies a code unit size of one byte; if they wanted UTF-16, they would take a WCHAR* (wchar_t*), or *const u16 in Rust terms.
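To illustrate why UTF-16 can't go through a char *, here's a string laid out as UTF-16LE bytes, the layout Windows uses for wchar_t. The helper name is hypothetical. For ASCII text every other byte is zero, so a C function reading the buffer as a NUL-terminated char * would stop after the first character.

```rust
/// Encode a string as UTF-16 and flatten each u16 code unit into two
/// little-endian bytes (the in-memory layout of a wchar_t string on Windows).
fn utf16le_bytes(s: &str) -> Vec<u8> {
    s.encode_utf16().flat_map(|unit| unit.to_le_bytes()).collect()
}
```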
UTF-8 is ubiquitous enough that it's probably your safest bet. So I would call:
into_os_string().into_string() to produce a Result<String, OsString>. (Only paths that aren't valid Unicode will fail: on Windows, those containing unpaired surrogates; on Unix, those whose bytes aren't valid UTF-8.)
Then CString::new, which accepts a String because String implements Into<Vec<u8>>. (This step can still fail if the path contains an interior NUL byte.)
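Putting those two steps together gives roughly the following. It's a sketch: the error enum and function name are hypothetical, and it assumes the C side wants UTF-8 as suggested above.

```rust
use std::ffi::{CString, NulError, OsString};
use std::path::PathBuf;

/// Why the conversion failed (hypothetical error type for this sketch).
#[derive(Debug)]
enum PathToCStringError {
    NotUnicode(OsString),  // into_string() failed; the original is handed back
    InteriorNul(NulError), // the path contained a 0 byte
}

/// PathBuf -> OsString -> String -> CString: UTF-8 bytes plus a trailing NUL.
fn path_to_utf8_cstring(path: PathBuf) -> Result<CString, PathToCStringError> {
    let s: String = path
        .into_os_string()
        .into_string()
        .map_err(PathToCStringError::NotUnicode)?;
    CString::new(s).map_err(PathToCStringError::InteriorNul)
}
```

Returning the OsString in the error keeps the original path around, so a caller can report it or fall back to a platform-specific conversion.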
Note that just because you see char* in an API doesn't necessarily mean you need to (or even should) use CString/CStr. Sometimes they're the wrong type, e.g. when interior NUL bytes must be allowed, or when the called function might mutate the string and write interior NULs. Sometimes I just use &[u8]/Vec<u8> and deal with the trailing NUL myself.
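A sketch of the manual approach. The C side is simulated by a hypothetical Rust function, fake_c_truncate_at_slash, that writes a NUL into the buffer; a real binding would live in an extern block. The buffer is a plain Vec<u8> with a hand-appended NUL, so interior NULs are allowed, and the result is read back up to the first NUL.

```rust
use std::os::raw::c_char;

/// Build a NUL-terminated buffer by hand. Unlike CString::new, this accepts
/// interior NUL bytes; C code will simply see the string end at the first one.
fn nul_terminated(bytes: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(bytes.len() + 1);
    buf.extend_from_slice(bytes);
    buf.push(0); // the trailing NUL that C expects
    buf
}

/// Hypothetical stand-in for a C function taking `char *` that mutates the
/// string: it overwrites the first '/' with NUL.
///
/// # Safety
/// `p` must point to a writable NUL-terminated buffer.
unsafe fn fake_c_truncate_at_slash(p: *mut c_char) {
    let mut i = 0;
    unsafe {
        while *p.add(i) != 0 {
            if *p.add(i) == b'/' as c_char {
                *p.add(i) = 0;
                break;
            }
            i += 1;
        }
    }
}

/// Pass a Vec<u8> buffer to the mutating function, then keep only the bytes
/// before the first NUL (always present, since we appended one).
fn truncate_at_slash(input: &[u8]) -> Vec<u8> {
    let mut buf = nul_terminated(input);
    unsafe { fake_c_truncate_at_slash(buf.as_mut_ptr() as *mut c_char) };
    let end = buf.iter().position(|&b| b == 0).unwrap();
    buf.truncate(end);
    buf
}
```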
Thank you both! I'll just convert it to a UTF-8 string, then create a CString out of that. I was worried that this would somehow do the wrong thing on Windows, but I think I was just confused, and that would only be a problem for wchar_t* as you say.