How to convert Rust String to wchar_t* in C++

I want to use the base2048 crate from my C program, so I wrote this:

#[no_mangle]
pub extern "C" fn encode(bytes: *mut u8) -> *const c_char {
    unsafe {
        CString::new(base2048::encode(bytes.as_ref().unwrap().to_ne_bytes().as_ref()).as_str())
            .unwrap()
            .as_ref()
            .as_ptr()
    }
}

But then I realized there is a problem: it does not support UTF-8, and base2048 is based on UTF-8.
Is there any solution, or a crate that solves this?

I don't see in what way it "doesn't support" UTF-8. The crate encodes to String and decodes from str, which are defined to be valid UTF-8.

By the way, in the code you posted, your return value is a dangling pointer, since the temporary CString you allocate is dropped at the end of the function.
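To make the dangling-pointer issue concrete, here is a minimal sketch of the usual fix: hand ownership to the caller with `CString::into_raw` and take it back later with `CString::from_raw`. The names `stand_in_encode`, `encode_owned` and `read_and_release` are hypothetical; `stand_in_encode` is a plain hex encoder standing in for `base2048::encode` so that the example builds without the crate.

```rust
use std::ffi::{c_char, CStr, CString};

// Hypothetical stand-in for `base2048::encode` (simple hex), used only so
// this sketch needs no external crate.
fn stand_in_encode(input: &[u8]) -> String {
    input.iter().map(|b| format!("{b:02x}")).collect()
}

// Dangling (do not do this): the temporary CString is dropped at the end of
// the statement, so the pointer refers to freed memory.
//     CString::new(text).unwrap().as_ptr()

/// Transfers ownership of the allocation to the caller; returns null if the
/// encoded text contains an interior NUL byte.
pub fn encode_owned(input: &[u8]) -> *mut c_char {
    match CString::new(stand_in_encode(input)) {
        Ok(c) => c.into_raw(),
        Err(_) => std::ptr::null_mut(),
    }
}

/// Copies the text out and releases the allocation.
///
/// # Safety
/// `ptr` must be null or a pointer returned by `encode_owned` that has not
/// been released yet. It must not be used after this call.
pub unsafe fn read_and_release(ptr: *mut c_char) -> Option<String> {
    if ptr.is_null() {
        return None;
    }
    // SAFETY: the caller guarantees `ptr` came from `CString::into_raw`.
    let text = unsafe { CStr::from_ptr(ptr) }.to_string_lossy().into_owned();
    drop(unsafe { CString::from_raw(ptr) });
    Some(text)
}
```

The point is only the ownership hand-off: `as_ptr` borrows from a value that is about to be dropped, while `into_raw` keeps the allocation alive until `from_raw` reclaims it.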


Let's also look at what you're doing with bytes.

CString::new(base2048::encode(
  bytes            // *mut u8
    .as_ref()      // Option<&u8>
    .unwrap()      // &u8
    .to_ne_bytes() // [u8; 1]
    .as_ref()      // &[u8] with length 1
).as_str())

I don't think this (encode exactly one byte) is what you meant to do.

Here's one possibility:

/// # SAFETY
/// This function must only be called with a valid, NUL-terminated C string with
/// a length less than `isize::MAX`.
#[no_mangle]
pub extern "C" fn encode(bytes: *const c_char) -> *mut c_char {
    // SAFETY: Invariants must be upheld by caller
    let cstr = unsafe { CStr::from_ptr(bytes) };
    let based = base2048::encode(cstr.to_bytes());
    let cstring = CString::new(based).expect("base2048-encoded string contained NUL");
    cstring.into_raw()
}

/// # SAFETY
/// This function must only be called with a pointer obtained by calling the
/// `encode` function.  After it is called, the pointer must not be used any
/// longer.
#[no_mangle]
pub extern "C" fn free_encoded(raw: *mut c_char) {
    // SAFETY: Invariants must be upheld by caller; pointer was created from
    // `CString::into_raw`
    let _ = unsafe { CString::from_raw(raw) };
}

You'll have to free the allocated CString in Rust too, hence the second function. (Or leak it, but don't free it in C.)
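One caveat with a NUL-terminated input: if the data to encode is arbitrary binary, the first zero byte ends the string. A pointer-plus-length signature is one possible alternative. This is only a sketch: `encode_bytes`, `free_encoded_bytes` and `stand_in_encode` are hypothetical names, and `stand_in_encode` (hex) stands in for `base2048::encode` so the example builds without the crate.

```rust
use std::ffi::{c_char, CString};
use std::{ptr, slice};

// Hypothetical stand-in for `base2048::encode` (simple hex).
fn stand_in_encode(input: &[u8]) -> String {
    input.iter().map(|b| format!("{b:02x}")).collect()
}

// To export these symbols to C, add `#[no_mangle]`
// (`#[unsafe(no_mangle)]` on edition 2024).

/// # Safety
/// `data` must be null or point to `len` readable bytes, and `len` must not
/// exceed `isize::MAX`.
pub unsafe extern "C" fn encode_bytes(data: *const u8, len: usize) -> *mut c_char {
    if data.is_null() {
        return ptr::null_mut();
    }
    // SAFETY: the caller guarantees `data` is valid for `len` bytes.
    let input = unsafe { slice::from_raw_parts(data, len) };
    match CString::new(stand_in_encode(input)) {
        Ok(c) => c.into_raw(),
        Err(_) => ptr::null_mut(), // encoded text contained a NUL byte
    }
}

/// # Safety
/// `raw` must be null or a pointer returned by `encode_bytes` that has not
/// been freed yet. It must not be used after this call.
pub unsafe extern "C" fn free_encoded_bytes(raw: *mut c_char) {
    if !raw.is_null() {
        // SAFETY: the pointer was created by `CString::into_raw`.
        drop(unsafe { CString::from_raw(raw) });
    }
}
```

On the C side this would correspond to something like `char *encode_bytes(const uint8_t *data, size_t len);`, with the result passed back to `free_encoded_bytes` rather than `free`.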


Panicking across an FFI boundary is UB, I think, so it's probably better to return null than to call .expect there.
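Returning null covers the NUL-byte case, but the encoding step itself could in principle panic too. One way to keep any panic on the Rust side is `std::panic::catch_unwind`. A rough sketch follows; `guarded_c_string` is a hypothetical helper, and this only works when panics unwind (it cannot catch anything with `panic = "abort"`).

```rust
use std::ffi::{c_char, CString};
use std::panic::{self, AssertUnwindSafe};
use std::ptr;

/// Runs `f` and converts its result to an owned C string.
/// Returns null if `f` panics or if the string contains an interior NUL.
/// A non-null result must later be reclaimed with `CString::from_raw`.
pub fn guarded_c_string<F: FnOnce() -> String>(f: F) -> *mut c_char {
    // `AssertUnwindSafe` is a promise that no broken state is observed after
    // a panic; that holds here because the closure's result is discarded.
    match panic::catch_unwind(AssertUnwindSafe(f)) {
        Ok(text) => CString::new(text).map_or(ptr::null_mut(), CString::into_raw),
        Err(_) => ptr::null_mut(),
    }
}
```

An exported function could then wrap its body as `guarded_c_string(|| base2048::encode(input))`, so that the C caller sees a null pointer instead of a panic.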


Good point.

/// # SAFETY
/// This function must only be called with a valid, NUL-terminated C string with
/// a length less than `isize::MAX` (or a null pointer).
///
/// The returned pointer may be null if
///  - You passed in a null pointer
///  - The base2048 encoding contained NUL bytes
#[no_mangle]
pub extern "C" fn encode(bytes: Option<NonNull<c_char>>) -> Option<NonNull<c_char>> {
    let bytes = bytes?.as_ptr();
    // SAFETY: Invariants must be upheld by caller
    let cstr = unsafe { CStr::from_ptr(bytes) };
    let based = base2048::encode(cstr.to_bytes());
    CString::new(based)
        .ok()
        .and_then(|c| NonNull::new(c.into_raw()))
}

/// # SAFETY
/// This function must only be called with a pointer obtained by calling the
/// `encode` function.  After it is called, the pointer must not be used any
/// longer.
#[no_mangle]
pub extern "C" fn free_encoded(raw: Option<NonNull<c_char>>) {
    // SAFETY: Invariants must be upheld by caller; pointer was created from
    // `CString::into_raw`
    if let Some(nn) = raw {
        let raw = nn.as_ptr();
        let _ = unsafe { CString::from_raw(raw) };
    }
}

The other answers say how to use UTF-8 via char*. The question was about wchar_t*. You can't use CString with wchar_t*.

The problem with wchar_t is that it's a mess. Its encoding depends on the system and the locale. You could probably assume it's UTF-16 and use the widestring crate or the Windows-only encode_wide, but that may be buggy if the system uses UCS-2, and it breaks completely if the system uses a non-Unicode Japanese or Chinese encoding, or if wchar_t is defined to be UTF-32.

So the best option is: never ever use wchar_t for anything ever. Forget it exists. Use char* with UTF-8.
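If an existing C++ API leaves no choice but wchar_t*, the standard library alone can produce the code units for either guess. This is only a sketch with hypothetical helper names, and which function matches your platform's wchar_t is an assumption you have to verify yourself: it is commonly 16 bits on Windows and 32 bits on Linux and macOS.

```rust
/// UTF-16 code units plus a terminating 0. Matches `wchar_t` only where it
/// is a 16-bit type holding UTF-16.
pub fn to_utf16_nul(s: &str) -> Vec<u16> {
    s.encode_utf16().chain(std::iter::once(0)).collect()
}

/// One Unicode scalar value per element plus a terminating 0. Matches
/// `wchar_t` only where it is a 32-bit type holding UTF-32.
pub fn to_utf32_nul(s: &str) -> Vec<u32> {
    s.chars().map(u32::from).chain(std::iter::once(0)).collect()
}
```

The buffer's `as_ptr()` can be passed to C++ only while the `Vec` is alive, the same lifetime concern as with `CString`. A string that contains U+0000 would look truncated from the C++ side.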

