Passing string to c++-library to be filled with information, then getting it back

Hi,
I'm rather new to Rust and I'm struggling with a third party DLL-library. I'm trying to use a function that takes a pointer to an array of c_char that it fills with a string.

I have struggled with CString::into_raw(), which does give the correct type of pointer, but the value that gets filled in is just junk. If I do it like below (it feels like an ownership violation, but anyway) I get the expected value filled into my array (seen during debugging):

        let mut array: [std::os::raw::c_char;10] = [21;10];
        unsafe {
            hiwinInterface::MPI_GetVersion(array.as_mut_ptr());
        }

Unfortunately this only works as long as I don't touch the array afterwards. If I try to convert the array into a string the whole thing falls apart and I only get junk back (even before the line where I try to convert it).

What is the correct approach to passing a pointer to a string and letting an external library manipulate it so that I can use its value?

Kind regards,
Henrik

What exactly is the code that isn't working? I have the suspicion that you are making one or both of two mistakes:

  1. passing a pointer to the buffer of an empty CString (causing a buffer overflow); and/or
  2. returning a pointer inside your locally-allocated array, which causes a use-after-free.

Did you mean to initialize the buffer with an ASCII 21 / NAK? It seems like a null terminator would be a better choice.
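
To illustrate the first suspected mistake, here is a minimal sketch of how much room a CString actually provides. The helper name `writable_len` is made up for this example. As far as I know, the documentation for CString::from_raw also says the into_raw/from_raw pair should not be used with C functions that can change the string's length, so a plain byte buffer is usually the safer thing to pass.

```rust
use std::ffi::CString;

/// Number of bytes a C callee could safely write through the pointer
/// obtained from `CString::new(initial)`: the text plus its NUL terminator.
/// (Hypothetical helper, for illustration only.)
fn writable_len(initial: &str) -> usize {
    CString::new(initial)
        .expect("initial text must not contain NUL")
        .as_bytes_with_nul()
        .len()
}
```

An empty CString therefore offers exactly one byte, while a version string such as "2.1.7.0" needs eight (seven characters plus the terminator).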


It's hard to tell without more information, but I would perhaps start with something like

use std::string::FromUtf8Error;

fn foo() -> Result<String, FromUtf8Error> {
    let mut bytes = vec![21_u8; 10];
    let ptr = bytes.as_mut_ptr() as *mut std::os::raw::c_char;
    
    // Safety: ???
    unsafe {
        hiwinInterface::MPI_GetVersion(ptr);
    }
    
    String::from_utf8(bytes)
}

Anticipated potential issues include

  • not overflowing the passed in buffer
  • truncating the String (trimming trailing 21_u8, trim after NUL?)
  • being filled with non-UTF8
    • Which the above code will catch, but if it's possible perhaps you need something besides String
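
The truncation and non-UTF-8 points above could be handled roughly like this. It is only a sketch: `text_before_nul` is a name invented here, and it assumes the library writes a NUL-terminated string into the buffer.

```rust
use std::str::Utf8Error;

/// Returns the text before the first NUL byte in `buf`, or the whole
/// buffer if no NUL is present. Fails if those bytes are not valid UTF-8.
/// (Hypothetical helper, for illustration only.)
fn text_before_nul(buf: &[u8]) -> Result<&str, Utf8Error> {
    // Everything after the first NUL is leftover filler and is ignored.
    let end = buf.iter().position(|&b| b == 0).unwrap_or(buf.len());
    std::str::from_utf8(&buf[..end])
}
```

Calling `text_before_nul(&bytes)` after the FFI call would then give either the trimmed text or an error for non-UTF-8 data. It does nothing about the first point: the buffer still has to be large enough for whatever the library writes.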

What is the signature for hiwinInterface::MPI_GetVersion()?

It's quite common for the return value to be an integer where negative values indicate an error and non-negative values indicate the number of characters that were written (which tells you how much of the buffer was initialized)... and of course, if you don't check for errors, Murphy's law dictates that MPI_GetVersion() will be failing every time and filling your buffer with junk.
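
As a sketch of that convention, with a stand-in function defined in Rust so it runs without any DLL: `mock_get_text` and `get_text` are invented names, and nothing here is taken from the actual library.

```rust
use std::os::raw::{c_char, c_int};

/// Hypothetical stand-in for a C function that writes a NUL-terminated
/// string into `buf` and returns the number of characters written
/// (excluding the NUL), or a negative value on failure.
extern "C" fn mock_get_text(buf: *mut c_char, cap: c_int) -> c_int {
    let text = b"2.1.7.0";
    if buf.is_null() || cap < 0 || (cap as usize) < text.len() + 1 {
        return -1;
    }
    // Safety: the caller promises `buf` points to at least `cap` writable bytes.
    unsafe {
        std::ptr::copy_nonoverlapping(text.as_ptr(), buf.cast::<u8>(), text.len());
        *buf.add(text.len()) = 0;
    }
    text.len() as c_int
}

/// Calls the mock and keeps only the bytes it reports as written.
fn get_text(capacity: usize) -> Result<String, String> {
    let mut buf = vec![0u8; capacity];
    let written = mock_get_text(buf.as_mut_ptr().cast(), capacity as c_int);
    if written < 0 {
        // Never look at the buffer when the call reports failure.
        return Err(format!("call failed with code {written}"));
    }
    buf.truncate(written as usize);
    String::from_utf8(buf).map_err(|e| e.to_string())
}
```

With a 10-byte buffer this yields the text; with a buffer that is too small the mock reports an error and the buffer contents are never read.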


I tried your exact code and after the MPI_GetVersion()-call the bytes-vector is filled with:
[0xA8, 0xDE, 0xF6, 0x84, 0x88, 0x00, 0x15, 0x15, 0x15, 0x15]

The expected result is:
[0x32, 0x2E, 0x31, 0x2E, 0x37, 0x2E, 0x30, 0x00, 0x15, 0x15]
(or in ASCII the string "2.1.7.0\0", followed by two untouched filler bytes)

So I do get something that looks like a null-terminated string, but with characters from the extended ASCII table!?

The MPI_GetVersion() is generated using Bindgen and looks like this:

extern "C" {
    #[link_name = "\u{1}?MPI_GetVersion@@YAXPEAD@Z"]
    pub fn MPI_GetVersion(pszVer: *mut ::std::os::raw::c_char);
}

The original header looks like this:
void __stdcall MPI_GetVersion( char *pszVer );

The library comes with an example program written in Visual C++. In that project they call the function like this:

	char szVer[10];
	MPI_GetVersion( szVer );
	Edit_Ver.SetWindowTextA( szVer );

And that gives the string "2.1.7.0".

Well, I didn't want to put 0's since that could mess up some string handling, so I put something that would be obvious if it was unchanged. So yes, it's on purpose, but it could be anything non zero.

Yes, some of the functions in the library return 0 for success and anything else for failure, but this particular function doesn't return anything; it just modifies the string passed to it. See the signature in my answer to quinedot here: Passing string to c-library to be filled with information, then getting it back - #6 by HenrikK

I have tried stepping it forward adding piece by piece.

I start with just these lines:

unsafe {
            let mut array: [std::os::raw::c_char;10] = [33;10];
            hiwinInterface::MPI_GetVersion(array.as_mut_ptr());
}

It compiles fine and when I debug it I can see the expected value on the array in the variables-view.
I add the following line at the bottom of the unsafe clause:

            let array_u8: [u8;10] = std::mem::transmute(array.to_owned());

I can now see the correct value in the array_u8 array as well.
I add the below.

            let version = CString::new(array_u8);

Now, when I debug the code, I can see the array being filled with the correct string, then the array_u8 gets the string. Finally a new field shows up in the Variables-window, but it's not called "version" as I would have expected, but "alloc::ffi::c_str::CString::new<array$<u8,10> > returned: Err".

After some digging I found that the problem is that we got a 0 in the middle of the array!
According to the documentation for CString, there must be no NUL bytes within the array when creating a CString. But what if there is one? I would like to get a string from the array that ends where the array is null-terminated. I have tried to parse the array in a couple of ways, but I haven't found anything that does the trick.

I tried to validate each value before adding it to a vector like this:

        let mut array_u8: [u8;10];
        unsafe {
            let mut array: [std::os::raw::c_char;10] = [33;10];
            hiwinInterface::MPI_GetVersion(array.as_mut_ptr());
            array_u8 = std::mem::transmute(array.to_owned().clone());

        }

        let mut vers = Vec::new();

        for char in array_u8 {
            if char == 0{
                break;
            }
            if char < 46 || char > 57 { // Sanity check, I expect nothing but numbers and points.
                return Err(HiwinError::DriverVersion);
            }
            vers.push(char);
        }

But then the returned value from MPI_GetVersion turns into junk again. The parsing quickly finds a character that is not a number or point, so the function returns with error.

The only thing I have found so far that doesn't destroy the returned value from MPI_GetVersion is to simply copy the values one by one like this:

        let mut array_u8: [u8;10];
        unsafe {
            let mut array: [std::os::raw::c_char;10] = [33;10];
            hiwinInterface::MPI_GetVersion(array.as_mut_ptr());
            array_u8 = std::mem::transmute(array.to_owned().clone());

        }
        let major = array_u8[0];
        let minor = array_u8[2];
        let patch = array_u8[4];
        let build = array_u8[6];

This works and I get the expected value, but it assumes a format on the string from the DLL, which is outside of my control.

Is there a way I can print the string, as is, without destroying it?

That's exactly what CString::from_vec_with_nul() is for.

By the way, you shouldn't be transmuting stuff, it's really unnecessary and the only thing it achieves is it makes your code more fragile under refactoring. I don't get what all these chained .to_owned().clone() calls are, either, that seems totally unnecessary too. The uninitialized array is also suspicious, that's very non-idiomatic in Rust, and situations when you need it should be exceedingly rare. (You don't need it here.)

This should work:

use std::ffi::CString;

let mut raw: [u8; 10] = [0; 10];
unsafe {
    MPI_GetVersion(raw.as_mut_ptr().cast());
}
let valid = raw.split(|&b| b == 0).next().unwrap(); // handle error!
let string = CString::new(valid).unwrap(); // handle error!
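
For cases where an array of c_char already exists, a safe element-wise conversion avoids transmute entirely. This is a sketch with an invented helper name, `c_chars_to_bytes`.

```rust
use std::os::raw::c_char;

/// Converts a C character buffer to bytes one element at a time.
/// `c_char` is `i8` on some targets and `u8` on others; `as u8` handles both.
/// (Hypothetical helper, for illustration only.)
fn c_chars_to_bytes<const N: usize>(chars: [c_char; N]) -> [u8; N] {
    // Arrays of `Copy` elements are themselves `Copy`, so no
    // `.to_owned()` or `.clone()` is needed before converting.
    chars.map(|c| c as u8)
}
```

Declaring the buffer as `[u8; N]` from the start and casting only the pointer, as in the snippet above, makes even this conversion unnecessary.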

The transmuting stuff was something I found when googling how to convert between array of i8 and array of u8. I suppose that's what you are doing with the .cast()?
Both .to_owned() and .clone() are needed or the value from the MPI_GetVersion becomes junk.

If I understand it correctly, CString::from_vec_with_nul() only checks that there is exactly one null and that it is at the end of the vector. I have a null somewhere in the middle, and everything after that should be thrown away...
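
The difference between the strict constructor and a "stop at the first NUL" read can be sketched as follows. The two wrapper names are invented for the example; CStr::from_bytes_until_nul is in the standard library in recent Rust versions (1.69 or later, as far as I know).

```rust
use std::ffi::{CStr, CString};

/// Returns an owned C string holding everything before the first NUL in
/// `buf`, or `None` if the buffer contains no NUL at all.
fn until_first_nul(buf: &[u8]) -> Option<CString> {
    CStr::from_bytes_until_nul(buf).ok().map(|s| s.to_owned())
}

/// Strict variant: succeeds only if the single NUL is the last byte.
fn exact_with_nul(buf: Vec<u8>) -> Option<CString> {
    CString::from_vec_with_nul(buf).ok()
}
```

For the expected buffer contents (`2.1.7.0`, a NUL, then filler bytes), `until_first_nul` returns the version text while `exact_with_nul` rejects the input because of the bytes after the NUL.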

I'm afraid the code you are proposing is not working. The value returned from the MPI_GetVersion is not correct. Is this an ownership problem? The moment I put a line that touches the raw-array after receiving it the whole thing falls apart!

Then you are doing something else (possibly unrelated) wrong, or you are not showing us the actual code that is failing.

Which is exactly what my code above does.

At this point, I have no idea. It is not possible that your array (with the part after the \0 appropriately removed) has the correct contents and then CString::new(array) doesn't, unless CString::new() has a bug (very unlikely). CString::new() converts the input to a Vec and owns it, so you can throw the input away and the string's contents should still be preserved.

Then you are doing something else (possibly unrelated) wrong, or you are not showing us the actual code that is failing.

I wish I knew what that could be! The code shown is what is failing, as far as I know. It's the first call to the DLL and the code is the complete function.

Which is exactly what my code above does.

Yes, it looks promising, although it messes up the call to MPI_GetVersion so that the string gets scrambled. My feeling right now is that we have two problems. One was the issue with the null termination of the array when converting it to a string. But with that sorted, we still have a problem with the pointer sent to the DLL. If I use the data after it returns from the DLL, then the data isn't there!

This code works in the debugger: the correct value is shown for the array in the variables window of VS Code (I place a breakpoint on the MPI_GetVersion line, then step over it and look at array_u8):

    pub fn get_version() -> Result<String,HiwinError> {
        let mut array_u8: [u8;10] = [0;10];
        unsafe {
            hiwinInterface::MPI_GetVersion(array_u8.as_mut_ptr().cast());
        }        
        Err(HiwinError::DriverVersion) // Todo: Correct return when function works
    }

This code does not work!

    pub fn get_version() -> Result<String,HiwinError> {
        let mut array_u8: [u8;10] = [0;10];
        unsafe {
            hiwinInterface::MPI_GetVersion(array_u8.as_mut_ptr().cast());
        }        
        let major = array_u8[0];
        let minor = array_u8[2];
        let patch = array_u8[4];
        let build = array_u8[6];
        let version = "{major}.{minor}.{patch}.{build}".to_string();

        Err(HiwinError::DriverVersion) // Todo: Correct return when function works
    }

If I just remove the let version line, then major, minor, patch and build get their correct values. So it works as long as I don't use the values in the array for something useful! Copying it around is fine, but if I try to build a string from it, then it goes down the drain!

But you aren't building the string correctly.

  • First off, "{stuff}".to_string() is simply turning a string literal into an owned string, it doesn't format anything. You probably meant format!("{major}.{minor}") etc.

  • Next, if the version number is given by the function as a string in the first place, then array[0], array[2], etc. will be ASCII character codes of the version digits. So if the function fills your buffer with the string "2.1.7.0", then interpreted as bytes, the array will contain 50, 49, 55, and 48 at the given positions, and you'll incorrectly get "50.49.55.48" as your version string.

    This back-and-forth conversion dance is completely unnecessary, and you should just interpret the byte array as an ASCII string instead of treating it as character codes and trying to re-build the string from its components over and over again.
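
Both points can be seen in a small sketch, assuming the buffer holds the bytes of "2.1.7.0" followed by NULs. The three function names are made up for the illustration.

```rust
/// A string literal with braces is not interpolated by `to_string()`.
fn literal_only() -> String {
    "{major}.{minor}".to_string()
}

/// Formats the raw byte values: ASCII codes, not the digits they stand for.
/// Panics if `buf` has fewer than 7 bytes.
fn as_numbers(buf: &[u8]) -> String {
    format!("{}.{}.{}.{}", buf[0], buf[2], buf[4], buf[6])
}

/// Reads the bytes before the first NUL as text (lossy for non-UTF-8 input).
fn as_text(buf: &[u8]) -> String {
    let end = buf.iter().position(|&b| b == 0).unwrap_or(buf.len());
    String::from_utf8_lossy(&buf[..end]).into_owned()
}
```

Only `as_text` gives back the version as the library wrote it; the other two show the two pitfalls described above.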

You are correct in the things you point out, but parsing the result isn't the problem here, the problem is that when I try to parse it (in whatever way) it stops working and I don't even have a valid result to parse!

Sorry, at this point I think this is a dead end. The example code I provided works if the function really correctly fills in the array, as proven by the playground. Unless you can share more concrete details about the real code that you are running, I'm afraid I'm unable to offer further specific help.

Well, this is the real code. There is a main function jumping straight into an init function, which jumps straight into this. The tough part is that I'm trying to use an opaque library provided by a third party. Your code works as long as there is data to parse, so no complaints about that. My problem seems to be related to data ownership, since we are leaving the safe environment of Rust and going into the harmful DLL and C++ world!

In this case, is there a realistic possibility of the C++ function having a bug? Or is there perhaps an ABI mismatch? You might need to check what calling convention the function is using, and adjust its declaration on the Rust side.

...is suspicious given the fact that Rust has support for extern "stdcall".
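
One way to express "whatever the Windows API convention is on this target" is extern "system", which as far as I know means stdcall on 32-bit x86 Windows and the C convention elsewhere. The sketch below uses a mock defined in Rust (`mock_get_version`, an invented name) in place of the real export, so it says nothing about how the actual DLL behaves. Separately, if I read MSVC's name mangling correctly, the `YA` in `?MPI_GetVersion@@YAXPEAD@Z` denotes `__cdecl` and `PEAD` a 64-bit `char *`, which would suggest a 64-bit build where `__stdcall` is ignored; that is an inference, not something confirmed in this thread.

```rust
use std::os::raw::c_char;

/// Function-pointer type for `void f(char *)` using the platform's default
/// Windows API convention: stdcall on 32-bit x86 Windows, C elsewhere.
type GetVersionFn = unsafe extern "system" fn(*mut c_char);

/// Hypothetical stand-in for the DLL export, so the sketch runs without it.
unsafe extern "system" fn mock_get_version(out: *mut c_char) {
    let text = b"2.1.7.0\0";
    // Safety: the caller guarantees `out` has room for at least 8 bytes.
    unsafe {
        std::ptr::copy_nonoverlapping(text.as_ptr(), out.cast::<u8>(), text.len());
    }
}

/// Calls `f` with a zeroed 10-byte buffer and returns the buffer.
fn call_through(f: GetVersionFn) -> [u8; 10] {
    let mut buf = [0u8; 10];
    // Safety: `buf` is 10 bytes long; the callee must not write past that.
    unsafe { f(buf.as_mut_ptr().cast()) };
    buf
}
```

If the declared convention and the real one disagree on a 32-bit target, the stack can be corrupted after the call returns, which is one possible explanation for data that looks right in the debugger and then turns into junk.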


Suspicious indeed! I tried changing the extern "C" to extern "stdcall", but I couldn't see any difference. I know nothing about ABIs, though, so I will have to read up on that.