Rust how to convert `&str` to `*const libc::c_char`?

foobar · August 10, 2023, 12:03pm

How can I convert a &str into a corresponding *const libc::c_char?

H2CO3 · August 10, 2023, 12:14pm

Well, it depends. How do you define a conversion from a string to a single byte (which char is in C)? Do you want the first character? The last one? What if the string is empty?

foobar · August 10, 2023, 12:20pm

Sorry, it should be *const libc::c_char, not libc::c_char.

jer · August 10, 2023, 12:21pm

You create a CString (yes, that requires another allocation because you need to put the \0 somewhere), then .as_ptr() that (and make sure to keep the CString alive).
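A minimal sketch of that approach, assuming the C function only reads the string during the call. `fake_c_strlen` is a hypothetical stand-in written in Rust so the example is self-contained (a real binding would be declared in an extern block), and `std::ffi::c_char` is used in place of `libc::c_char`.

```rust
use std::ffi::{c_char, CStr, CString};

// Hypothetical stand-in for a C function that reads a NUL-terminated string.
extern "C" fn fake_c_strlen(s: *const c_char) -> usize {
    // SAFETY: callers must pass a valid, NUL-terminated string.
    unsafe { CStr::from_ptr(s).to_bytes().len() }
}

fn c_len_of(s: &str) -> usize {
    // Allocates a new buffer and appends the trailing NUL.
    // Fails if `s` contains an interior NUL byte.
    let c_string = CString::new(s).expect("interior NUL byte");
    // The pointer borrows from `c_string`; it is valid only while `c_string` lives.
    let ptr: *const c_char = c_string.as_ptr();
    let len = fake_c_strlen(ptr);
    // `c_string` is dropped when this function returns, after the call has finished.
    len
}
```

Binding the CString to a local variable is what keeps the buffer alive. Writing `CString::new(s).unwrap().as_ptr()` in a single expression creates a temporary that is dropped at the end of that statement, which leaves the pointer dangling.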

foobar · August 10, 2023, 12:26pm

How can I keep the CString alive without leaking memory?

alice · August 10, 2023, 12:39pm

It depends on how it's used. The pointer will be valid until the destructor of the CString runs. If you just need it for the duration of a function call, then it's easy: just let the destructor run after the function call. But if the raw pointer needs to be valid for longer, then you need to store the CString somewhere it will survive for long enough.
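For the longer-lived case, one option is to store the CString next to whatever uses the pointer. This is only a sketch; the `Label` type and its methods are hypothetical names chosen for illustration.

```rust
use std::ffi::{c_char, CString, NulError};

// Hypothetical wrapper: owns the buffer so that pointers handed out stay valid
// for as long as the `Label` itself is alive.
struct Label {
    text: CString,
}

impl Label {
    fn new(s: &str) -> Result<Self, NulError> {
        Ok(Label { text: CString::new(s)? })
    }

    // Valid until `self` is dropped. Moving the `Label` does not move the
    // heap buffer, so the pointer survives moves.
    fn as_ptr(&self) -> *const c_char {
        self.text.as_ptr()
    }
}
```

The C side must stop using the pointer before the owning `Label` is dropped; Rust cannot check that for a raw pointer.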

jer · August 10, 2023, 12:43pm

Or "leak" it using into_raw and later reclaim it using from_raw, which lets you drop it (and thus deallocate the memory). It's then up to you to guarantee this is done safely.
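A rough sketch of the into_raw/from_raw round trip. The helper names `hand_over` and `reclaim` are made up for this example.

```rust
use std::ffi::{c_char, CString};

// Gives up ownership: the buffer is not freed until `reclaim` is called.
fn hand_over(s: &str) -> *mut c_char {
    CString::new(s).expect("interior NUL byte").into_raw()
}

// SAFETY: `ptr` must come from `CString::into_raw`, must not have been
// reclaimed already, and the length of the string must not have changed.
unsafe fn reclaim(ptr: *mut c_char) -> CString {
    unsafe { CString::from_raw(ptr) }
}
```

A pointer obtained this way should go back through `CString::from_raw` rather than C's `free`, and `from_raw` should never be called on a pointer that C allocated.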

H2CO3 · August 10, 2023, 12:44pm

Exactly as you would do it with any other type: keep it as long as you need it. When dropped, it will free the backing buffer, just like any other well-behaved type.

DanielKeep · August 10, 2023, 1:11pm

Also, as a reminder: in Rust, strings are always UTF-8. In C, they aren't. How you get from UTF-8 to whatever C expects is both platform- and usage-dependent and, in some cases, impossible.

Note that CString does not do this conversion for you (because it can't).

H2CO3 · August 10, 2023, 1:41pm

In C, a char * doesn't impose any restrictions on the encoding of the string. It's literally just a pointer to an array of (signed or unsigned) bytes. There's no need for any conversion here.

HJVT · August 10, 2023, 3:05pm

I think that's a bit beside the point here.
I think @DanielKeep wanted to emphasise that you can't always rely on the C side of FFI to be able to decode a Rust string correctly.

BurntSushi · August 10, 2023, 3:26pm

A char * doesn't, but if you end up using any of the "C string" family of functions, then there is at least one encoding assumption in play: your string can't have any interior NUL bytes. Indeed, CString::new checks this assumption for you. This program panics:

use std::ffi::CString;

fn main() {
    let s = "foo\x00bar";
    CString::new(s).unwrap();
}
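If a panic is not wanted, the same check can be handled as an ordinary error. A small sketch, where `to_c_string` is a hypothetical helper name:

```rust
use std::ffi::CString;

// Converts to a CString, reporting where the first interior NUL byte is
// instead of panicking.
fn to_c_string(s: &str) -> Result<CString, String> {
    CString::new(s)
        .map_err(|e| format!("interior NUL byte at index {}", e.nul_position()))
}
```

For the string above, the error reports index 3, the byte right after "foo".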

But this is a narrow response to a narrow comment. The better response, IMO, is to not be so narrow. What you say is true about char *, but what @DanielKeep likely meant is, "in a POSIX environment." In which case, locale matters and complicates things quite a bit.

In other words, you really want to know how your C strings are going to be used before handing them off to a C library. Your UTF-8 encoded &str might wind up doing the right thing in a lot of cases (sans the interior NUL bytes, but that's checked for you) since en_US.UTF-8 is such a common locale. But UTF-8 might not be used at all, in which case the best you can hope for is mojibake.

DanielKeep · August 10, 2023, 3:34pm

I was actually thinking primarily about Windows. Many years ago, I tried making a string ffi crate to end all string ffi crates (I gave up because it kept crashing the compiler). One of the things I discovered is that, on Windows, MSVCRT and Win32 can be using different 8-bit encodings simultaneously, and there's no reliable/standard/supported way to convert between the MSVCRT one and anything else.

And that's not even touching on libraries using their own encodings for strings.

I've just seen too many examples of "how to convert strings between Rust and C" that completely neglect this aspect, despite it being by far the hardest bit to get right.

simonbuchan · August 10, 2023, 11:32pm

Side point, but did this also include using mbstowcs() to go through UTF-16 (-ish) and friends? That's surprising.

foobar · August 11, 2023, 1:10am

Hey, guys, in my use case:

extern crate libc;
use libc::c_char;
use std::ffi::CString;

fn do_something() {
    let rust_str = "Hello, World!";
    let c_string = CString::new(rust_str).unwrap(); // Converts &str to CString
    let c_char_ptr: *const c_char = c_string.as_ptr(); // Converts CString to *const c_char

    // Ensure c_string does not get dropped before usage of c_char_ptr?
    std::mem::forget(c_string);

    let some_result = unsafe { some_c_ffi_func(c_char_ptr) };
    // Do something with some_result
}

Do I have to call std::mem::forget()?

// Ensure c_string does not get dropped before usage of c_char_ptr?
std::mem::forget(c_string);

Cerber-Ursi · August 11, 2023, 2:01am

You'd probably be better off using CString::into_raw: it captures the intent better and explicitly allows you to call CString::from_raw to deallocate it later.

DanielKeep · August 11, 2023, 2:21am

To be honest, this was long enough ago and enough of a mess that I don't remember. Now that you mention it, I remember at least trying that, but I don't recall if it worked. I vaguely recall falling down some rabbit holes regarding the separation between MSVCRT and Win32, as well as what specific versions of C actually guaranteed (complicated by a pile of "common wisdom" that turned out to be unfounded)... which was around the time I realised the compiler couldn't handle what I wanted to do anyway and decided to go do something more relaxing like beating my head against a brick wall.

... come to think of it, the problem might have been that nothing guaranteed what the wide encoding was. I remember reading part of one of the C standards to figure out if one of the new string encoding types was actually, really, honest-to-god guaranteed to be UTF only to find that it didn't actually guarantee anything and all the posts were just assuming that's what they did because why on earth would anyone in this day and age introduce new string encodings and not guarantee it was UTF but of course they didn't because there's presumably some tiny chip vendor with their own homebrew toolchain that didn't want to have to change anything and why would anyone want interoperable software that's just silly and I...

*deep breaths*

I'm not doing this. If you'll excuse me, I'm going to go beat my head against a wall until I feel better...

BurntSushi · August 11, 2023, 12:00pm

To really hammer this home, it actually isn't possible to know whether you need to do std::mem::forget() (or the into_raw/from_raw dance) based on the information you've provided. You need to look at the actual C APIs you're using and determine the lifetime required for the C string that you're passing into some_c_ffi_func. If the C library will hold on to that C string after the function returns in some fashion, then you likely need to call CString::into_raw and then CString::from_raw when it's safe to free it (if any such point exists). We cannot tell you in general what to do because it depends on the C library you're using.

My guess is that for the specific pattern you have here, some_c_ffi_func will use the C string given but not hang on to it after the function returns. In which case, you don't need std::mem::forget or CString::into_raw. Just pass c_char_ptr as is, since c_string won't be dropped until the enclosing scope ends. But again, I have to stress, you have not provided enough information to determine whether this is correct or not because you haven't provided a real code sample, nor have you pointed to a real C API that you're trying to use.

DanielKeep · August 11, 2023, 1:59pm

Oh, man, that just reminded me of yet another way this can go wrong: if the C function intends to (at some point) deallocate the string itself, you need to work out which allocator it's using, then manually allocate the C string using that, copy the bytes across, and return that.

kehrazy · August 14, 2023, 6:32pm

If the C library expects an uninitialized pointer to the string (e.g. when working with "error status strings") you can pass in a MaybeUninit CString.
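One way to read this, sketched below, is that the out-parameter is a `const char **` slot that the C function fills in, so the uninitialized value is the pointer rather than a CString. `fake_get_status` and its signature are assumptions made up for this example, written in Rust so it runs on its own.

```rust
use std::ffi::{c_char, CStr};
use std::mem::MaybeUninit;

// Hypothetical stand-in for a C function with the signature
// `void get_status(const char **out)` that stores a pointer to a static string.
extern "C" fn fake_get_status(out: *mut *const c_char) {
    static MSG: &[u8] = b"ok\0";
    // SAFETY: callers must pass a valid, writable pointer.
    unsafe { out.write(MSG.as_ptr() as *const c_char) };
}

fn status() -> String {
    // Uninitialized slot for the pointer that the callee will write.
    let mut slot = MaybeUninit::<*const c_char>::uninit();
    fake_get_status(slot.as_mut_ptr());
    // SAFETY: the callee wrote to `slot`, and the string it points to is
    // NUL-terminated and lives for the whole program.
    let ptr = unsafe { slot.assume_init() };
    let msg = unsafe { CStr::from_ptr(ptr) };
    // Lossy conversion, because C makes no promise that the bytes are UTF-8.
    msg.to_string_lossy().into_owned()
}
```

Who owns the returned string, and for how long it stays valid, depends on the C library, so the copy into a String here is a conservative choice.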
