How can I convert a &str into a corresponding *const libc::c_char?
Well, it depends. How do you define a conversion from a string to a single byte (which is what char is in C)? Do you want the first character? The last one? What if the string is empty?
Sorry, it should be *const libc::c_char, not libc::c_char.
You create a CString (yes, that requires another allocation because you need to put the \0 somewhere), then call .as_ptr() on it (and make sure to keep the CString alive).
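A minimal sketch of that pattern, assuming the C function only reads the string during the call. Here c_side_len is a hypothetical stand-in for a real C function, written in Rust so that the example is self-contained:

```rust
use std::ffi::{c_char, CStr, CString};

// Hypothetical stand-in for a C function that only reads the string
// during the call and does not keep the pointer afterwards.
extern "C" fn c_side_len(ptr: *const c_char) -> usize {
    // SAFETY: the caller must pass a valid, NUL-terminated string.
    unsafe { CStr::from_ptr(ptr) }.to_bytes().len()
}

fn len_via_c(s: &str) -> usize {
    // Allocates a new buffer and appends the trailing \0.
    let c_string = CString::new(s).expect("input contains an interior NUL byte");
    // The pointer borrows from c_string, so c_string has to outlive
    // every use of the pointer.
    let ptr: *const c_char = c_string.as_ptr();
    c_side_len(ptr)
    // c_string is dropped here, after the call has returned.
}

fn main() {
    println!("{}", len_via_c("Hello, World!"));
}
```

Note that binding the CString to a variable is what keeps it alive. Writing CString::new(s).unwrap().as_ptr() as a single expression drops the temporary CString at the end of the statement and leaves a dangling pointer.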
How could I keep the CString alive without leaking memory?
It depends on how it's used. The pointer will be valid until the destructor of the CString runs. If you just need it for the duration of a function call, then it's easy: just run the destructor after the function call. But if the raw pointer needs to be valid for longer, then you need to put the CString somewhere that it survives for long enough.
Or "leak" it using into_raw and later reclaim it using from_raw to then drop it (and thus deallocate the memory). It's up to you then to guarantee this is done safely.
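The into_raw / from_raw round trip could look roughly like this. The helper names hand_out and reclaim are made up for illustration, and the sketch assumes the C side never modifies the string's length or frees it itself:

```rust
use std::ffi::{c_char, CString};

// Hands ownership of the buffer to the caller as a raw pointer.
// Nothing frees it until reclaim() is called.
fn hand_out(s: &str) -> *mut c_char {
    CString::new(s)
        .expect("input contains an interior NUL byte")
        .into_raw()
}

// Takes ownership back; dropping the CString frees the buffer.
//
// SAFETY: ptr must come from hand_out (i.e. CString::into_raw), must not
// have been reclaimed before, and the C side must no longer use it.
unsafe fn reclaim(ptr: *mut c_char) -> String {
    let c_string = unsafe { CString::from_raw(ptr) };
    c_string
        .into_string()
        .expect("contents were valid UTF-8 when handed out")
}

fn main() {
    let ptr = hand_out("kept alive");
    // ... the pointer could be stored by a C library here ...
    let back = unsafe { reclaim(ptr) };
    println!("{}", back);
}
```

If reclaim is never called, the buffer is simply leaked, which is the same end result as std::mem::forget but with the intent spelled out.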
Exactly as you would do it with any other type: keep it as long as you need it. When dropped, it will free the backing buffer, just like any other well-behaved type.
Also, as a reminder: in Rust, strings are always UTF-8. In C, they aren't. How you get from UTF-8 to whatever C expects is both platform- and usage-dependent and, in some cases, impossible.
Note that CString does not do this conversion for you (because it can't).
In C, a char * doesn't impose any restrictions on the encoding of the string. It's literally just a pointer to an array of (signed or unsigned) bytes. There's no need for any conversion here.
I think that's a bit beside the point here.
I think @DanielKeep wanted to emphasise that you can't always rely on the C side of FFI to be able to decode a Rust string correctly.
A char * doesn't, but if you end up using any of the "C string" family of functions, then there is at least one encoding assumption used: your string can't have interior NUL bytes. Indeed, CString::new checks this assumption for you. This program panics:
use std::ffi::CString;
fn main() {
    let s = "foo\x00bar";
    CString::new(s).unwrap();
}
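The panic comes from the unwrap, not from CString::new itself, which returns a Result. A small sketch of handling the error instead (interior_nul_position is a hypothetical helper name):

```rust
use std::ffi::CString;

// Returns the position of the first interior NUL byte, if any,
// instead of panicking.
fn interior_nul_position(s: &str) -> Option<usize> {
    match CString::new(s) {
        Ok(_) => None,
        // NulError records where the offending byte was found.
        Err(err) => Some(err.nul_position()),
    }
}

fn main() {
    println!("{:?}", interior_nul_position("foo\x00bar"));
    println!("{:?}", interior_nul_position("foobar"));
}
```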
But this is a narrow response to a narrow comment. The better response, IMO, is to not be so narrow. What you say is true about char *, but what @DanielKeep likely meant is, "in a POSIX environment." In which case, locale matters and complicates things quite a bit.
In other words, you really want to know how your C strings are going to be used before handing them off to a C library. Your UTF-8 encoded &str might wind up doing the right thing in a lot of cases (sans the interior NUL bytes, but that's checked for you) since en_US.UTF-8 is such a common locale. But UTF-8 might not be used at all, in which case the best you can hope for is mojibake.
I was actually thinking primarily about Windows. Many years ago, I tried making a string ffi crate to end all string ffi crates (I gave up because it kept crashing the compiler). One of the things I discovered is that, on Windows, MSVCRT and Win32 can be using different 8-bit encodings simultaneously, and there's no reliable/standard/supported way to convert between the MSVCRT one and anything else.
And that's not even touching on libraries using their own encodings for strings.
I've just seen too many examples of "how to convert strings between Rust and C" that completely neglect this aspect, despite it being by far the hardest bit to get right.
Side point, but did this also include using mbstowcs() to go through UTF-16 (-ish) and friends? That's surprising.
Hey, guys, in my use case:
extern crate libc;
use libc::c_char;
use std::ffi::CString;
fn do_something() {
    let rust_str = "Hello, World!";
    let c_string = CString::new(rust_str).unwrap(); // Converts &str to CString
    let c_char_ptr: *const c_char = c_string.as_ptr(); // Converts CString to *const c_char
    // Ensure c_string does not get dropped before usage of c_char_ptr?
    std::mem::forget(c_string);
    let some_result = unsafe { some_c_ffi_func(c_char_ptr) };
    // Do something with some_result
}
Do I have to call std::mem::forget()?
// Ensure c_string does not get dropped before usage of c_char_ptr?
std::mem::forget(c_string);
You would probably be better off using CString::into_raw: it captures intent better and explicitly allows you to call CString::from_raw to deallocate it later.
To be honest, this was long enough ago and enough of a mess that I don't remember. Now that you mention it, I remember at least trying that, but I don't recall if it worked. I vaguely recall falling down some rabbit holes regarding the separation between MSVCRT and Win32, as well as what specific versions of C actually guaranteed (complicated by a pile of "common wisdom" that turned out to be unfounded)... which was around the time I realised the compiler couldn't handle what I wanted to do anyway and decided to go do something more relaxing like beating my head against a brick wall.
... come to think of it, the problem might have been that nothing guaranteed what the wide encoding was. I remember reading part of one of the C standards to figure out if one of the new string encoding types was actually, really, honest-to-god guaranteed to be UTF only to find that it didn't actually guarantee anything and all the posts were just assuming that's what they did because why on earth would anyone in this day and age introduce new string encodings and not guarantee it was UTF but of course they didn't because there's presumably some tiny chip vendor with their own homebrew toolchain that didn't want to have to change anything and why would anyone want interoperable software that's just silly and I...
*deep breaths*
I'm not doing this. If you'll excuse me, I'm going to go beat my head against a wall until I feel better...
To really hammer this home, it actually isn't possible to know whether you need to do std::mem::forget() (or the into_raw/from_raw dance) based on the information you've provided. You need to look at the actual C APIs you're using and determine the lifetime required for the C string that you're passing into some_c_ffi_func. If the C library will hold on to that C string after the function returns in some fashion, then you likely need to call CString::into_raw and then CString::from_raw when it's safe to free it (if any such point exists). We cannot tell you in general what to do because it depends on the C library you're using.
My guess is that for the specific pattern you have here, some_c_ffi_func will use the C string given but not hang on to it after the function returns. In which case, you don't need std::mem::forget or CString::into_raw. Just pass c_char_ptr as is, since c_string won't get automatically dropped until the enclosing scope ends. But again, I have to stress, you have not provided enough information to determine whether this is correct or not because you haven't provided a real code sample, nor have you pointed to a real C API that you're trying to use.
Oh, man, that just reminded me of yet another way this can go wrong: if the C function intends to (at some point) deallocate the string itself, you need to work out which allocator it's using, then manually allocate the C string using that, copy the bytes across, and return that.
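For the common case where the C library frees the string with the C runtime's free(), a rough sketch might look like the following. The assumption that malloc/free is the right allocator pair is exactly the thing you have to verify against the library's documentation, and to_malloced_c_string is a hypothetical helper name. (The unsafe extern syntax needs Rust 1.82 or newer; on older toolchains, drop the unsafe keyword on the block.)

```rust
use std::ffi::{c_char, c_void, CStr};

// Declarations of the C runtime's allocator. Assumption: the C library
// frees strings with this same free(). If it uses a different allocator,
// these must be swapped for that allocator's functions.
unsafe extern "C" {
    fn malloc(size: usize) -> *mut c_void;
    fn free(ptr: *mut c_void);
}

// Copies s into a malloc-allocated, NUL-terminated buffer.
// Returns null if s contains an interior NUL or if allocation fails.
fn to_malloced_c_string(s: &str) -> *mut c_char {
    let bytes = s.as_bytes();
    if bytes.contains(&0) {
        return std::ptr::null_mut();
    }
    unsafe {
        // One extra byte for the trailing \0.
        let buf = malloc(bytes.len() + 1) as *mut u8;
        if buf.is_null() {
            return std::ptr::null_mut();
        }
        std::ptr::copy_nonoverlapping(bytes.as_ptr(), buf, bytes.len());
        *buf.add(bytes.len()) = 0;
        buf as *mut c_char
    }
}

fn main() {
    let ptr = to_malloced_c_string("owned by C");
    assert!(!ptr.is_null());
    // A real C library would eventually call free() itself; here we do it
    // on its behalf after reading the string back.
    let copy = unsafe { CStr::from_ptr(ptr) }.to_string_lossy().into_owned();
    unsafe { free(ptr as *mut c_void) };
    println!("{}", copy);
}
```

A pointer produced this way must never be passed to CString::from_raw, since that function is only valid for pointers that came from CString::into_raw.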
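If the C library expects a pointer to uninitialized storage that it fills in itself (e.g. when working with "error status strings"), you can pass in a pointer to a MaybeUninit value, for example a MaybeUninit<*mut c_char> for a char ** out-parameter, rather than a CString.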