Tidy pattern to work with LPSTR (mutable char array)


#1

I need to call a native C function on Windows:

// typedef char* LPSTR;
// typedef unsigned long DWORD;

long __stdcall some_func(LPSTR parameter, LPSTR errMsgBuffer, DWORD errBufferSize);

I want to:

  1. Use the String type for the first parameter. How do I convert it to LPSTR?
  2. Create a contiguous container for errMsgBuffer.
  3. Convert the result written to errMsgBuffer into a String instance.

CString appears to be immutable, so it can’t be used in this case, can it?

For the first problem, I tried to use String this way:

let parameter = path.into_bytes().into_boxed_slice().as_mut_ptr() as *mut i8;

For the second, I tried this:

const BUF_SIZE: DWORD = 200;
let mut errMsgBuffer = [0 as i8; BUF_SIZE as usize];

But it doesn’t seem elegant (I also suspect it has some overhead and doesn’t work correctly: the native library returns an error code).
Is there a common pattern for passing a mutable null-terminated string as LPSTR and getting it back as a String?
Maybe some crates already solve a similar problem?


Newbie to Rust, looking for some advice
#2

First of all, do you know what encoding some_func uses? If you know that it uses UTF-8, then you can continue with confidence. If not, stop what you’re doing and look for a wide version! This matters especially if you’re invoking Windows API functions: almost none of them use UTF-8, and they actually use UTF-16.

The next question is: why is some_func taking a *mut char instead of a *const char? Does it intend to mutate the string? Perhaps it wants to take ownership of it? It is extremely important that you understand what is going on, otherwise you could easily get things wrong.

Now, if you are calling a function that takes a null terminated UTF-8 encoded *const char and it just needs to read it and does not take ownership of it, then you can do the following.

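use std::ffi::CString;

let mystring = "Hello world!";
let cstr = CString::new(mystring).unwrap(); // new() returns an Err if the string contains an interior NUL byte
some_func(cstr.as_ptr());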

Notice in particular that I assign the CString to a variable. If I did CString::new(...).as_ptr(), then the CString would be dropped as soon as that statement was over and you’d very likely be left with a dangling pointer.
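
To make the lifetime point concrete, here is a minimal sketch. `c_strlen` is a hypothetical stand-in for a native function that only reads its argument and does not keep the pointer; it is written in Rust here so the example is self-contained, and `native_len` is likewise just an illustrative name.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical stand-in for a native function that only reads a
/// null-terminated string and does not keep the pointer afterwards.
unsafe extern "C" fn c_strlen(s: *const c_char) -> usize {
    unsafe { CStr::from_ptr(s) }.to_bytes().len()
}

/// Passes a Rust string to the native-style function.
/// Returns None if the input contains an interior NUL byte.
fn native_len(text: &str) -> Option<usize> {
    // Bind the CString to a variable so that it outlives the call.
    let c_text = CString::new(text).ok()?;
    // c_text is still alive here, so the pointer is valid during the call.
    // It is dropped only when this function returns.
    let len = unsafe { c_strlen(c_text.as_ptr()) };
    Some(len)
}
```

Writing `CString::new(text).ok()?.as_ptr()` as a separate statement and using the pointer afterwards would compile, but the temporary CString would already be freed by the time the pointer is used.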

If you want to provide a buffer to a function so that it may write a string into it, you can do the following.

let mut buf = [0u8; 0x200]; // Change size as needed.
some_func(buf.as_mut_ptr() as *mut c_char, buf.len() as DWORD); // c_char is std::os::raw::c_char
let len = buf.iter().take_while(|&&c| c != 0).count(); // This is only if the function doesn't tell you the length
let s = std::str::from_utf8(&buf[..len]); // Returns a Result; can .to_owned() the &str if you need a String
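
Putting the two pieces together for the three-argument signature from the question, a wrapper might look roughly like this. Everything here is a sketch under assumptions: `mock_some_func` is a hypothetical Rust stand-in for the real native function (which would instead be declared in an `extern "system"` block to match `__stdcall`), it always fails with a fixed message, and the convention that 0 means success is assumed, not known.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

/// Hypothetical mock of the native function: it ignores the content of
/// `parameter`, writes a null-terminated message into `err_buf`, and fails.
unsafe extern "C" fn mock_some_func(
    parameter: *const c_char,
    err_buf: *mut c_char,
    err_buf_size: u32,
) -> i32 {
    let msg: &[u8] = if parameter.is_null() { b"null parameter" } else { b"file not found" };
    // Truncate so that there is always room for the terminating NUL.
    let n = msg.len().min((err_buf_size as usize).saturating_sub(1));
    unsafe {
        std::ptr::copy_nonoverlapping(msg.as_ptr(), err_buf as *mut u8, n);
        if err_buf_size > 0 {
            *err_buf.add(n) = 0;
        }
    }
    -1
}

/// Safe wrapper: Ok(()) on success, otherwise the message from the buffer.
/// Assumes a return code of 0 means success (an assumption for this sketch).
fn call_some_func(path: &str) -> Result<(), String> {
    let c_path = CString::new(path).map_err(|e| e.to_string())?;
    let mut err_buf = [0u8; 200];
    let code = unsafe {
        mock_some_func(
            c_path.as_ptr(),
            err_buf.as_mut_ptr() as *mut c_char,
            err_buf.len() as u32,
        )
    };
    if code == 0 {
        return Ok(());
    }
    // Read up to the first NUL; lossy decoding assumes UTF-8 or ASCII output.
    let len = err_buf.iter().take_while(|&&b| b != 0).count();
    Err(String::from_utf8_lossy(&err_buf[..len]).into_owned())
}
```

The buffer is a plain `[u8; 200]` on the stack, so there is no allocation beyond the CString, and the input is passed as `*const` because the function only reads it.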

Now, if it turns out that you are using Windows API, so that what you should be doing is using the wide versions of functions, you can use these traits to make things simpler.

use std::ffi::{OsStr, OsString};
use std::os::windows::prelude::*;
pub trait ToWide {
    fn to_wide(&self) -> Vec<u16>;
    fn to_wide_null(&self) -> Vec<u16>;
}
impl<T> ToWide for T where T: AsRef<OsStr> {
    fn to_wide(&self) -> Vec<u16> {
        self.as_ref().encode_wide().collect()
    }
    fn to_wide_null(&self) -> Vec<u16> {
        self.as_ref().encode_wide().chain(Some(0)).collect()
    }
}
pub trait FromWide where Self: Sized {
    fn from_wide_null(wide: &[u16]) -> Self;
}
impl FromWide for OsString {
    fn from_wide_null(wide: &[u16]) -> OsString {
        let len = wide.iter().take_while(|&&c| c != 0).count();
        OsString::from_wide(&wide[..len])
    }
}

Then you can just do

let win = "foo".to_wide_null();
let mut wout = [0u16; 0x200];
some_func(win.as_ptr(), wout.as_mut_ptr(), wout.len() as DWORD);
let s = OsString::from_wide_null(&wout);
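
If you only need to go between `&str` and UTF-16 (and want something you can test off Windows), a portable variant of the same round trip can be sketched with `str::encode_utf16` and `String::from_utf16_lossy`. Note this is a different technique from `OsStr::encode_wide`: it cannot represent unpaired surrogates, which OsString can carry. The function names below are just illustrative.

```rust
/// Encode a &str as UTF-16 with a trailing NUL, as wide APIs expect.
fn to_wide_null(s: &str) -> Vec<u16> {
    s.encode_utf16().chain(std::iter::once(0)).collect()
}

/// Decode a UTF-16 buffer up to (not including) the first NUL.
/// Invalid sequences are replaced with U+FFFD.
fn from_wide_null(wide: &[u16]) -> String {
    let len = wide.iter().take_while(|&&c| c != 0).count();
    String::from_utf16_lossy(&wide[..len])
}
```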

#3

Thank you for the useful examples!

It seems I’ve understood the difference: Rust doesn’t hide CPU-level work from the developer. For example, zero-terminated strings have a cost, and I have to create them explicitly. In this respect, C has more implicit magic in its standard library.

The native functions I need aren’t WinAPI. I’ve understood your advice about ownership: the native functions I need won’t keep the pointer after the call. I can declare parameter as const, since it never changes. The second argument, errMsgBuffer, is filled with an 8-bit encoded string (there is no wide version of the function).
Your examples suit my case, because I can use the encoding crate to decode the data to Unicode. Is that the right idea?
Which methods do you use to build a String from an 8-bit encoded array?


#4

8-bit encoding is a very vague term. There are numerous encodings which fit into 8-bit units, including UTF-8. You need to know what the actual encoding is. If the string is just going to be stored as an opaque sequence of bytes and later returned to you, and the library never attempts to interpret it, then you don’t need to do any encoding or decoding and can just assume UTF-8, which is what Rust uses. If the library is going to interpret it as a filename or a path, then you have some very serious problems, because if the library does not specifically ensure that it is working with Unicode at all times, things will break on Windows.


#5

OK, the error messages I have to read are encoded in Windows-1251, and some loss in decoding isn’t critical. Thank you for the detailed explanation!
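
For reference, a lossy decoder for this case can be sketched by hand, since the Russian alphabet in Windows-1251 maps linearly onto Unicode. This is only an illustration: it covers ASCII, Ё/ё and the range 0xC0..=0xFF, and turns every other byte into U+FFFD. `decode_cp1251_lossy` is a made-up name, and a real program would more likely use a full table from a crate such as encoding_rs.

```rust
/// Lossy Windows-1251 decoder covering ASCII and the basic Russian alphabet.
/// All other bytes become U+FFFD; a full table is out of scope for this sketch.
fn decode_cp1251_lossy(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|&b| match b {
            0x00..=0x7F => b as char,
            0xA8 => 'Ё',
            0xB8 => 'ё',
            // 0xC0..=0xFF map linearly onto U+0410..=U+044F (А..я).
            0xC0..=0xFF => {
                char::from_u32(0x0410 + (b - 0xC0) as u32).unwrap_or('\u{FFFD}')
            }
            _ => '\u{FFFD}',
        })
        .collect()
}
```

The input would be the slice of the error buffer up to the first zero byte.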


#6

You probably know this, but in a console application the proper 8-bit encoding would usually be 866, not 1251.


#7

You are right, it causes some problems for debugging. I have tended to use logging ever since the encoding hell began. I hope somebody rewrites Windows in Rust in the future, at least cmd :smile:


#8

Just FYI, in many parts of Europe, 1251 is the default encoding for Windows systems.


#9

The default ANSI code page, yes. Since most Rust applications are probably console ones, you may also have to take the OEM code page into account.
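
To illustrate why the distinction matters, here is a rough sketch of a CP866 (OEM Cyrillic) decoder limited to ASCII and the Russian alphabet; `decode_cp866_lossy` is a hypothetical helper, and anything outside those ranges becomes U+FFFD. Feeding it bytes that were actually produced in Windows-1251 yields different text, which is the kind of mojibake you see when the ANSI and OEM code pages are mixed up.

```rust
/// Lossy CP866 decoder covering ASCII and the Russian alphabet only.
fn decode_cp866_lossy(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|&b| match b {
            0x00..=0x7F => b as char,
            // 0x80..=0xAF: А..Я followed by а..п (U+0410..=U+043F).
            0x80..=0xAF => {
                char::from_u32(0x0410 + (b - 0x80) as u32).unwrap_or('\u{FFFD}')
            }
            // 0xE0..=0xEF: р..я (U+0440..=U+044F).
            0xE0..=0xEF => {
                char::from_u32(0x0440 + (b - 0xE0) as u32).unwrap_or('\u{FFFD}')
            }
            0xF0 => 'Ё',
            0xF1 => 'ё',
            // Box-drawing and other characters are not handled in this sketch.
            _ => '\u{FFFD}',
        })
        .collect()
}
```

The word "Привет" is the bytes 8F E0 A8 A2 A5 E2 in CP866 but CF F0 E8 E2 E5 F2 in Windows-1251, so the decoder has to match whichever code page the producer of the bytes used.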