Need some help rewriting a C++ library in rust

Hey there,

This is a repost from this reddit post.

After playing around in safe Rust for a few projects, I decided to attempt something more serious: porting a C++ library from my work to Rust. Some things have worked out of the box, some not so much. Here is the problem:

I have the following data struct in C++:

struct SomeStruct {
  string data;
  double value;
};

and the following signatures:

void* start(string, size_t, SomeStruct[]);
void step(size_t, size_t, SomeStruct[], SomeStruct[], void*);
void dealocate(void*);

In rust SomeStruct became:

#[repr(C)]
struct SomeStruct {
  data: CString,
  value: f64
}

So, my questions function by function:

start

I rewrote it as:

#[no_mangle]
pub extern fn start(xml: CString, size: usize, array: *const SomeStruct) -> *mut c_void

I have the following problems:

If I use this code inside it:

let someString = xml.into_string().unwrap();
println!("string: {:?}", someString);
...

and call it in C++:

extern "C" {
  void* start(string, size_t, SomeStruct*);
}
start("text", 0, nullptr);

I get the output

string: ""

i.e., it reads as an empty string. And what's worse, it seems like it's overflowing the argument stack or something like that, because if I try to print the size argument I end up getting some gibberish.

But what I really found interesting is that if I remove this argument from the function signature (of course both on rust and C++) and actually pass something to size and array and print the CString inside SomeStruct it will work fine. Why is that?

I also tried calling it from C++ using the c_str() for the string (changing the function parameter signature to const char*), but got the same results.

Finally, the return type. The output from start should be passed to step and dealocate. It's a pointer to an allocated memory space containing the state of the library. I was able to replicate this behavior by creating some state inside the library (some random struct) and transmuting it at the end of the function:

start(...) -> *mut c_void {
  ...
  let someState = State {...};
  unsafe {transmute(someState)}
} 

step

Step became:

#[no_mangle]
pub extern fn step(size1: usize, size2: usize, array1: *const SomeStruct, array2: *const SomeStruct, ptr: *mut c_void) -> ()

I seem to be able to read the state from the start function just fine by transmuting it back. But I read everywhere that transmute is overkill in most cases. What is the non-overkill way of doing this, and how does it actually work?

dealocate

#[no_mangle]
pub extern fn dealocate(ptr: *mut c_void) -> ()

And finally, how should I implement the dealocate function? As in, how do I make sure in Rust that whatever the void pointer points to is dropped?
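
For reference, a common way to hand state across the boundary without transmute is Box::into_raw on the way out and Box::from_raw on the way back. The sketch below is only an illustration under assumptions: the State fields are hypothetical, the other arguments of start and step are left out for brevity, and the export attribute (#[no_mangle]) is omitted so the snippet stands alone. Note that transmuting a State value itself into a pointer does not allocate anything; it only compiles when State happens to be pointer-sized.

```rust
use std::os::raw::c_void;

// Hypothetical library state; the real fields are whatever the library needs.
struct State {
    steps_taken: u64,
}

// Allocate the state on the heap and hand ownership to the caller
// as an opaque pointer.
pub extern "C" fn start() -> *mut c_void {
    let state = Box::new(State { steps_taken: 0 });
    Box::into_raw(state) as *mut c_void
}

// Borrow the state for one call; ownership stays with the caller.
pub unsafe extern "C" fn step(ptr: *mut c_void) {
    // Safety: `ptr` must come from `start` and must not have been
    // passed to `dealocate` yet.
    let state = unsafe { &mut *(ptr as *mut State) };
    state.steps_taken += 1;
}

// Take ownership back. Dropping the Box runs State's destructor
// and frees the allocation.
pub unsafe extern "C" fn dealocate(ptr: *mut c_void) {
    if ptr.is_null() {
        return;
    }
    // Safety: same contract as `step`; the pointer must not be used afterwards.
    drop(unsafe { Box::from_raw(ptr as *mut State) });
}
```

The cast back to *mut State is only sound if every pointer passed in really came from start, which is a contract the C++ caller has to uphold.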

Thanks!

Let's figure out that string first! I assume that on the C++ side string is std::string, and this is actually the problem. It is a C++ class with a specific, implementation-defined layout, and Rust does not know about it at all. It cannot know, because C++ has no standardized ABI. The CString in Rust is for managing owned C-style strings created within Rust. Most likely even the sizes of CString and std::string differ!

So you'll have to write some code on the C++ side to expose a C ABI. That is, to pass a pair of char* and length. Likewise, you can't just define SomeStruct in Rust because it has a std::string field on the C++ side.
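
A minimal sketch of the Rust side of such a pointer-plus-length interface, assuming the C++ caller passes something like the string's data() and size(). The helper name string_from_parts is made up for illustration.

```rust
use std::os::raw::c_char;
use std::slice;

// Hypothetical helper: build an owned String from a pointer and a byte count.
// Returns None for a null pointer or for bytes that are not valid UTF-8,
// instead of panicking across the FFI boundary.
pub unsafe fn string_from_parts(data: *const c_char, len: usize) -> Option<String> {
    if data.is_null() {
        return None;
    }
    // Safety: the caller guarantees `data` points to `len` readable bytes.
    let bytes = unsafe { slice::from_raw_parts(data as *const u8, len) };
    std::str::from_utf8(bytes).ok().map(str::to_owned)
}
```

Because the length is explicit, the buffer does not need to be NUL-terminated and may contain embedded zero bytes.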

I think it should theoretically be possible to write a layout-compatible wrapper around std::string in Rust for a particular version of the STL and a particular C++ compiler, but I don't know of any attempts at that.

Passing a C-style string as the function parameter would not be a huge problem, and like I said, I actually tried using c_str(). But what I find interesting is that the CString inside SomeStruct actually seems to work fine! I'm sure there's a good reason for that and I'm interested in finding it out.

That's the problem with undefined behavior: sometimes it seems to work :) CString and std::string have no connection whatsoever.

Yep. I've been getting some weird results with the string that I thought was working.

So, if I actually want to pass a string (be that as a char*, const char*, or whatever) to rust, what is the proper way of doing this?

The simplest and safest approach would be to pass a non-owning const char* (*const c_char on the Rust side) and convert it to a String in Rust:

use std::ffi::CStr;
use std::os::raw::c_char;

extern "C" fn foo(s: *const c_char) {
    // Copy the data into an owned String.
    // `s` must be a valid, NUL-terminated C string.
    // Will panic if `s` is not utf-8
    let owned_string: String = unsafe { CStr::from_ptr(s) }.to_str().unwrap().into();
}

If you can make that work, you can then think about some ways of avoiding copying strings around. But given that start accepts its argument by value in C++, it probably does not matter.
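
One possible way to skip the copy, sketched under the assumption that the C++ buffer stays alive for as long as the borrowed text is used. The function name borrow_c_str is hypothetical.

```rust
use std::borrow::Cow;
use std::ffi::CStr;
use std::os::raw::c_char;

// Borrow the C string as text without taking ownership of the buffer.
// Yields Cow::Borrowed (no copy) when the bytes are valid UTF-8, and an
// owned, lossily converted String otherwise, so bad input does not panic.
pub unsafe fn borrow_c_str<'a>(s: *const c_char) -> Option<Cow<'a, str>> {
    if s.is_null() {
        return None;
    }
    // Safety: the caller guarantees `s` is NUL-terminated and outlives 'a.
    Some(unsafe { CStr::from_ptr(s) }.to_string_lossy())
}
```

The lifetime 'a is chosen by the caller and is not checked by the compiler, so the borrowed text should not be stored beyond the duration of the FFI call.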

Interesting. Yes, this works as it should, and because we're trying to make our code more portable, this change in the signature shouldn't have a big impact (since functions with that signature occur only once).

On the other hand, the change in SomeStruct might be a little more problematic. Might have to look into the wrapper idea. Any tips on where to start?

Thanks for the help!

edit: also, what's the difference between std::os::raw::c_char and libc::c_char?

std::os::raw::c_char is from the standard library, libc::c_char is from an external crate. It is asserted that they refer to the same type.

Any tips on where to start?

Ah, I don't really know, it's better to ask @emoon :)

I would re-read the chapters on raw pointers and FFI, plus the Rustonomicon, and would try something like this:

use std::ffi::CStr;
use std::os::raw::c_char;

extern "C" {
  // Accessor functions implemented on the C++ side.
  fn SomeStruct_data(s: *const SomeStruct) -> *const c_char;
  fn SomeStruct_value(s: *const SomeStruct) -> f64;
}

enum SomeStruct {} // opaque struct, as per https://doc.rust-lang.org/book/ffi.html#representing-opaque-structs

// Get the size from your C++ compiler (a custom build step perhaps?).
// I guess we must know it somehow to work with arrays.
const SIZEOF_SOME_STRUCT: usize = 92;

impl SomeStruct {
  fn data(&self) -> &CStr {
    unsafe { CStr::from_ptr(SomeStruct_data(self as *const SomeStruct)) }
  }

  fn value(&self) -> f64 {
    unsafe { SomeStruct_value(self as *const SomeStruct) }
  }
}

extern "C" fn start(xml: *const c_char, n_structs: usize, first_struct: *const SomeStruct) { }

I wonder if we can define SomeStruct as

SomeStruct([u8; sizeof(C++ SomeStruct)])

?

Will this be valid?
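
A rough sketch of what the byte-blob idea could look like. The size and alignment below are placeholders, not measured values; the real ones would have to come from sizeof and alignof in the C++ build. The names SomeStructBlob and nth are made up for illustration.

```rust
use std::mem::{align_of, size_of};

// ASSUMPTION: placeholder numbers; take the real ones from the C++ compiler.
const SIZEOF_SOME_STRUCT: usize = 40;
const ALIGNOF_SOME_STRUCT: usize = 8;

// Opaque blob with the same size and alignment as the C++ struct.
// The literal in `align(...)` has to be kept in sync with ALIGNOF_SOME_STRUCT.
#[repr(C, align(8))]
pub struct SomeStructBlob([u8; SIZEOF_SOME_STRUCT]);

// Compile-time checks that the blob really has the expected layout.
const _: () = assert!(size_of::<SomeStructBlob>() == SIZEOF_SOME_STRUCT);
const _: () = assert!(align_of::<SomeStructBlob>() == ALIGNOF_SOME_STRUCT);

// Address of the i-th element of an array of blobs.
pub unsafe fn nth(first: *const SomeStructBlob, i: usize) -> *const SomeStructBlob {
    // Safety: the caller guarantees the array has more than `i` elements.
    unsafe { first.add(i) }
}
```

This only gets the stride right for pointer arithmetic. The bytes should still be read through accessor functions on the C++ side, and moving or copying a blob by value in Rust may break types that hold pointers into themselves, as some std::string implementations do.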

In my experience, I try to share only POD structs between my Rust and C(++) code; otherwise there will be pain.

In some cases (where it can't be avoided) I have temporary structs for marshaling the data between the two sides, but usually that isn't needed.
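
As an illustration of that kind of marshaling, here is a sketch with a POD mirror of SomeStruct that carries a borrowed C string instead of a std::string. The names SomeStructPod, SomeStructOwned and marshal are assumptions for the example, and the C++ side would need to fill in the POD array before the call.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;
use std::slice;

// POD layout both sides can agree on: a borrowed C string plus a double.
#[repr(C)]
pub struct SomeStructPod {
    pub data: *const c_char,
    pub value: f64,
}

// Owned Rust-side copy that is safe to keep after the call returns.
pub struct SomeStructOwned {
    pub data: String,
    pub value: f64,
}

// Copy an array of PODs into owned Rust values.
pub unsafe fn marshal(array: *const SomeStructPod, len: usize) -> Vec<SomeStructOwned> {
    if array.is_null() {
        return Vec::new();
    }
    // Safety: the caller guarantees `array` points to `len` elements and that
    // every `data` pointer is a non-null, NUL-terminated string.
    let pods = unsafe { slice::from_raw_parts(array, len) };
    pods.iter()
        .map(|p| SomeStructOwned {
            data: unsafe { CStr::from_ptr(p.data) }.to_string_lossy().into_owned(),
            value: p.value,
        })
        .collect()
}
```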

It also depends a bit on the "direction of the code". If I have some Rust code that should use some C(++) code, I pretty much always wrap the C(++) code in a Rust wrapper so I can provide a pure Rust API. That way the user doesn't need to care whether some other code is being called within that API.
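
A small sketch of that wrapper pattern. The functions in the ffi module are stand-ins written in Rust so the example compiles on its own; in real code they would be declarations of the C(++) functions in an extern "C" block. All names here (lib_start, lib_step, lib_dealocate, Library) are hypothetical.

```rust
use std::os::raw::c_void;

// Stand-ins for the foreign functions (assumed names and behavior).
mod ffi {
    use std::os::raw::c_void;

    pub unsafe extern "C" fn lib_start() -> *mut c_void {
        Box::into_raw(Box::new(0u32)) as *mut c_void
    }

    pub unsafe extern "C" fn lib_step(state: *mut c_void) -> u32 {
        let counter = unsafe { &mut *(state as *mut u32) };
        *counter += 1;
        *counter
    }

    pub unsafe extern "C" fn lib_dealocate(state: *mut c_void) {
        drop(unsafe { Box::from_raw(state as *mut u32) });
    }
}

// Safe wrapper: owns the opaque pointer and releases it exactly once.
pub struct Library {
    state: *mut c_void,
}

impl Library {
    pub fn new() -> Option<Library> {
        let state = unsafe { ffi::lib_start() };
        if state.is_null() { None } else { Some(Library { state }) }
    }

    pub fn step(&mut self) -> u32 {
        // Safety: `self.state` came from lib_start and is still alive.
        unsafe { ffi::lib_step(self.state) }
    }
}

impl Drop for Library {
    fn drop(&mut self) {
        // Safety: the pointer is released here and never used again.
        unsafe { ffi::lib_dealocate(self.state) };
    }
}
```

Users of Library never touch a raw pointer, and the Drop impl means the foreign state is released even on early returns.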

I'm not really sure if that answers your question though :)