Help with ffi and cstring

I'm new to Rust and would like some help with a couple of things.
I tried to create a message box in windows, using winapi, and was uncertain about some things.
Here's my example code:

    let text = CString::new("A message in the window").unwrap();
    let text_ptr = text.into_raw();
    let title = CString::new("This is a title").unwrap();
    let title_ptr = title.into_raw();

    unsafe {
        MessageBoxA(std::ptr::null_mut(), text_ptr, title_ptr, MB_OK);
        CString::from_raw(text_ptr);
        CString::from_raw(title_ptr);
    }

Now, as I understand it, when I call let text = CString::new on my string slice, it gives me back an ANSI string (kind of).
into_raw will then give me back a raw pointer to the CString (text); the text variable still holds the ownership, though.
Now, as I understood from the documentation, I need to call CString::from_raw on my raw pointer to be able to free the memory. Why can't I just let the pointers go out of scope and clean themselves up?

I might have got all of this wrong, so please correct me.
Furthermore, I wonder about the ANSI part. I use MessageBoxA so I can use my CString. If I want to use Windows wide characters (UTF-16), how would I go about creating those strings? I haven't found any good resource on the topic.
Hope all of this makes sense.

Oh man, no. It does not. What CString gives you is a NUL-terminated UTF-8 string. What Windows wants is a NUL-terminated string encoded in the current ANSI codepage.† There is no function in the standard library to go from UTF-8 to the current ANSI codepage. For ASCII strings, this won't matter, but the moment your program tries to feed anything outside of the ASCII range to MessageBoxA, it will display gibberish.
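To make the gibberish concrete, here is a minimal sketch. It assumes the ANSI codepage is Windows-1252 and only looks at bytes where that codepage matches Latin-1; `misread_as_latin1` is a hypothetical helper for illustration, not a Windows or std API.

```rust
use std::ffi::CString;

/// Hypothetical helper: reinterprets every byte of the CString as one
/// Latin-1 character, which is roughly how an *A function would read it
/// under codepage 1252 (assumption: no bytes in 0x80..=0x9F, where the
/// two encodings differ).
fn misread_as_latin1(s: &str) -> String {
    let c_string = CString::new(s).expect("input must not contain interior NUL bytes");
    // as_bytes() leaves out the trailing NUL; what remains is plain UTF-8.
    c_string.as_bytes().iter().map(|&b| b as char).collect()
}
```

ASCII input comes back unchanged, but "é" is two bytes in UTF-8 (0xC3 0xA9), so it comes out as the two characters "Ã©".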

You should basically just never use the *A function variants. You should always use the *W variants that take zero-terminated UTF-16 strings.‡ I believe there are convenience functions in the wio crate for this.

Because raw pointers are just that: raw pointers. The whole point of raw pointers is that the language does nothing to manage them whatsoever. That's what makes them raw.

Anyway, you should only use into_raw and the like when you are giving away ownership of the thing. In this case, you're just lending access to MessageBoxA, so what you want is as_ptr.
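A rough sketch of the difference, so it runs on any platform: `c_api_stand_in` is a hypothetical stand-in for a function like MessageBoxA that only reads the string during the call, and `lend` and `give_and_reclaim` are made-up names for the two approaches.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical stand-in for a C function such as MessageBoxA: it only
/// reads the string during the call and neither keeps nor frees the pointer.
unsafe fn c_api_stand_in(text: *const c_char) -> usize {
    unsafe { CStr::from_ptr(text) }.to_bytes().len()
}

/// Lending: `text` keeps ownership and frees the buffer when it goes
/// out of scope at the end of the function.
fn lend(s: &str) -> usize {
    let text = CString::new(s).expect("no interior NUL bytes");
    // `text` must outlive the call, so it lives in a named variable.
    unsafe { c_api_stand_in(text.as_ptr()) }
}

/// Giving away and taking back: into_raw consumes the CString, so nothing
/// frees the buffer until from_raw rebuilds an owner for it.
fn give_and_reclaim(s: &str) -> usize {
    let raw: *mut c_char = CString::new(s).expect("no interior NUL bytes").into_raw();
    let len = unsafe { c_api_stand_in(raw) };
    // Without this line the allocation would leak.
    drop(unsafe { CString::from_raw(raw) });
    len
}
```

Both functions hand the callee the same bytes; the first just never gives up ownership, so there is nothing to reclaim by hand.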

Just to reiterate: check the wio crate.


†: Which is, of course, totally different and unrelated to the various string encodings other C libraries may or may not use. Or maybe not. Who knows? Isn't passing strings to C libraries fun?!

‡: Technically, it's not UTF-16; it's raw 16-bit words that may or may not be valid UTF-16. There's a very good reason why Rust has totally different string types when talking to the OS.

A simple way to use MessageBoxW would be something like this:

    // Assumes winapi 0.3 with the "winuser" feature enabled.
    use winapi::um::winuser::{MessageBoxW, MB_OK};

    fn to_wide(s: &str) -> Vec<u16> {
        s.encode_utf16() // Make a UTF-16 iterator
         .chain(Some(0).into_iter()) // Append a null
         .collect() // Collect the iterator into a vector
    }

    fn main() {
        let title = to_wide("This is a title");
        let message = to_wide("A message in the window");

        unsafe {
            MessageBoxW(std::ptr::null_mut(), message.as_ptr(), title.as_ptr(), MB_OK);
        }
    }

The to_wide function turns a Rust string into a null terminated UTF-16 vector. Then you use as_ptr() to give a pointer to MessageBoxW.
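To see what actually ends up in the vector, here is a small platform-independent sketch. `to_wide` is the same helper as in this post; `from_wide` is a hypothetical inverse added only for checking the result.

```rust
/// Same helper as in this post: UTF-16 code units plus a trailing null.
fn to_wide(s: &str) -> Vec<u16> {
    s.encode_utf16().chain(Some(0)).collect()
}

/// Hypothetical inverse: decodes up to the first null, replacing any
/// invalid UTF-16 with U+FFFD.
fn from_wide(wide: &[u16]) -> String {
    let end = wide.iter().position(|&unit| unit == 0).unwrap_or(wide.len());
    String::from_utf16_lossy(&wide[..end])
}
```

For "Hi" the vector is [0x48, 0x69, 0]. A character outside the Basic Multilingual Plane such as U+1F600 takes two code units (0xD83D, 0xDE00) before the null, so the vector length is not the character count.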

All WinApis use proper UTF-16... except for filesystem paths. I'd guess this is because NTFS isn't UTF-16 aware. Incidentally, this was the motivation for creating WTF-8.

Thanks @DanielKeep for the detailed answer. I'm not used to working with string encodings and such.
And thanks @chrisd for the clean example of doing it with the *W variant.
I'll play around with this some more, and you can expect more threads about FFI and such as I continue my journey to make things with the Windows API.

A lot more of the Windows API than just the filesystem APIs will happily accept, store, and return strings that contain lone surrogates. In fact, feel free to provide examples of Windows API functions that actually validate UTF-16; I'd really be interested to know.
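For anyone unsure what a lone surrogate looks like from the Rust side, a small sketch. `is_valid_utf16` and `decode_lossy` are hypothetical helper names; they only wrap String::from_utf16 and String::from_utf16_lossy from the standard library.

```rust
/// Returns true if the 16-bit words form valid UTF-16.
fn is_valid_utf16(words: &[u16]) -> bool {
    String::from_utf16(words).is_ok()
}

/// Decodes the words, replacing anything invalid with U+FFFD.
fn decode_lossy(words: &[u16]) -> String {
    String::from_utf16_lossy(words)
}
```

The words [0x0048, 0xD800] are 'H' followed by a high surrogate with no low surrogate after it. A buffer like that is a perfectly good array of u16, but `is_valid_utf16` rejects it and `decode_lossy` turns the stray word into U+FFFD.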

Yes, you're absolutely right. I had second thoughts about that after I typed it, so I asked a question on Stack Overflow. Unfortunately, I forgot to report back here. In my defence, the Windows documentation does its absolute best to obscure this fact.

As for an example of a function that will actually validate UTF-16, how about MultiByteToWideChar? :wink:

I really like that to_wide function! :heart: One more note: you can leave out the .into_iter(), because Option implements the IntoIterator trait.

Or you could use std::iter::once(0) as an alternative, but I think Some(0) is shorter and just as readable.

Thanks, those are both good points. I feel std::iter::once probably does do a better job of explaining the intent of that line. So that gives us:

    fn to_wide(s: &str) -> Vec<u16> {
        s.encode_utf16() // Make a UTF-16 iterator
         .chain(std::iter::once(0)) // Append a null
         .collect() // Collect the iterator into a vector
    }

Obviously a use statement for std::iter could be used if people prefer to be more succinct.

Yes, std::iter::once is provided for this purpose, so it's clearer, more explicit, and maybe slightly faster. But Some is already in the prelude, so using it for this purpose is far more convenient. This hurts me whenever I type .chain(Some(value)).