Rust FFI with complex return values


#1

Hello,

I have been interested in Rust for a long time and finally wanted to try it out. I have a Python project and was hoping to rewrite some parts of it to speed things up. The idea is to use Rust's FFI to connect to Python via ctypes.

My problem, however, is that I need to return quite complex arrays/structures to Python, and this does not work properly. Unfortunately, I could not find any tutorial on the web for this kind of problem.

I posted a snippet to https://is.gd/3MuKDD. The snippet takes a Rust vector (peaks: Vec<Peak>) and copies its content to its C-structure array counterpart (cpeaks: Vec<CPeak>). Within CPeak is another pointer to an array of CFrg. The cpeaks vector is returned through a by-reference (out) parameter, which leaves the function's return value free for an error code.

When I try to read the data with Python I have two problems:

  • the first entry of cpeaks is empty (or better: garbage)
  • the CFrg entries are all garbage.

I guess that the problem is the lifetime of cpeaks, which probably does not live long enough to be accessed in Python. However, I have no idea how to keep it alive. I tried boxing it, but this did not help.

Any help would be appreciated!

Thx,
Ronny


#2

It would help if we could see the Python side of the communication.

The problem isn’t the lifetime of cpeaks: since you called mem::forget on it, the buffer it contains (which you previously extracted with .as_ptr()) won’t be freed - ever, without specific code to do so; it’s leaked.

But you don’t call mem::forget on your frgs vecs, so their buffer will be freed each time around the outer for loop, leaving the raw *mut CFrg you stuck into cpeaks dangling.
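A minimal sketch of that fix, assuming simplified CFrg and CPeak layouts (fewer fields than in the original snippet) and a hypothetical helper named make_peak. The inner Vec is leaked with mem::forget, so the pointer stored in the CPeak stays valid after the function returns. The memory is then never freed unless some other code reclaims it explicitly.

```rust
use std::mem;
use std::os::raw::{c_float, c_int};

// Simplified layouts (assumption: the real structs have more fields).
#[repr(C)]
pub struct CFrg {
    pub mz_exp: c_float,
    pub intensity: c_float,
}

#[repr(C)]
pub struct CPeak {
    pub mz_exp: c_float,
    pub nb_frg: c_int,
    pub frgs: *mut CFrg,
}

/// Hypothetical helper: builds one CPeak from (mz, intensity) pairs.
pub fn make_peak(mz_exp: f32, fragments: &[(f32, f32)]) -> CPeak {
    let mut frgs: Vec<CFrg> = fragments
        .iter()
        .map(|&(mz, intensity)| CFrg { mz_exp: mz, intensity })
        .collect();

    let peak = CPeak {
        mz_exp,
        nb_frg: frgs.len() as c_int,
        frgs: frgs.as_mut_ptr(),
    };

    // Without this line the Vec's destructor would run at the end of the
    // function and free the buffer that `peak.frgs` points to.
    mem::forget(frgs);
    peak
}
```

The same call is needed for every inner Vec, once per iteration of the outer loop.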


#3

Dear Comex,
thank you very much for the fast reply.

If I add mem::forget(frgs), I get the fragments. :slight_smile: Further, it works now with my Python code (see below).

But is using mem::forget() the preferred way? If I have to remove cpeaks and cfrg by hand, how would I do this best? I might have more complex data structures to share between Rust and Python.

Here is my Python code:

#!/usr/bin/python

import sys, ctypes
from ctypes import *

# prefix = {'win32': ''}.get(sys.platform, 'lib')
# extension = {'darwin': '.dylib', 'win32': '.dll'}.get(sys.platform, '.so')

lib = ctypes.cdll.LoadLibrary("./target/debug/libstandard_finder.so")

# the c struct 
class Standards(Structure):
	pass

class CFrg(Structure):
	_fields_ = [
		("mz_exp", c_float),
		("mz_diff", c_float),
		("intensity", c_float),
		("resolution", c_float),
		("noise", c_float)
		]

class CPeak(Structure):
	_fields_ = [
		# ("chemsc", c_char_p),
		("mz_exp", c_float),
		("mz_diff", c_float),
		("intensity", c_float),
		("resolution", c_float),
		("noise", c_float),
		("nb_frg", c_int),
		("frgs", POINTER(CFrg)),
		]

class Array(Structure):
	_fields_ = [("len", c_size_t), ("data", c_int)]

class Simple(Structure):
	_fields_ = [("one", c_float), ("two", c_double)]

### initialize the c functions
lib.standard_finder_new.restype = POINTER(Standards)
lib.standard_finder_read_csv.argtypes = (POINTER(Standards), c_char_p)
lib.standard_finder_get_standards.argtypes = (POINTER(Standards), c_char_p, POINTER(POINTER(CPeak)), POINTER(c_size_t))

### execute the c functions

# gen standards obj
obj = lib.standard_finder_new()

# read standards.csv and store in 'obj'
succ = lib.standard_finder_read_csv(obj, "Standards.csv")

# read peaks from standards obj
cpeaks = POINTER(CPeak)()
cpeaks_len = c_size_t()
lib.standard_finder_get_standards(obj, str("test.xml"), byref(cpeaks), byref(cpeaks_len))
for i in range(cpeaks_len.value):
	print ">>>", cpeaks[i].mz_exp, cpeaks[i].intensity
	if cpeaks[i].nb_frg > 0:
		print ">>> >>>", cpeaks[i].nb_frg
		for j in range(cpeaks[i].nb_frg):
			frg = cpeaks[i].frgs[j]
			print ">>> >>>", frg.mz_exp, frg.intensity


#4

I see.

The way to “undo” mem::forget with a Vec is to call Vec::from_raw_parts and let the resulting Vec go out of scope so the destructor runs, but it’s tricky (you need to get the capacity right) and possibly not the most elegant approach.
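A rough sketch of that round trip, assuming a plain Vec<f32> and hypothetical names (RawFloats, leak_floats, reclaim_floats). The point is that the pointer, the length and the capacity all have to be kept, because Vec::from_raw_parts needs exactly the values the original Vec had.

```rust
use std::mem;

/// Hypothetical carrier for the three raw parts of a leaked Vec<f32>.
#[repr(C)]
pub struct RawFloats {
    pub ptr: *mut f32,
    pub len: usize,
    pub cap: usize,
}

/// Leak the Vec and hand out its raw parts.
pub fn leak_floats(mut v: Vec<f32>) -> RawFloats {
    let raw = RawFloats {
        ptr: v.as_mut_ptr(),
        len: v.len(),
        cap: v.capacity(), // must be remembered, not guessed later
    };
    mem::forget(v);
    raw
}

/// Rebuild the Vec so that its destructor can free the buffer.
///
/// # Safety
/// `raw` must come from `leak_floats`, and its pointer must not be
/// used again after this call.
pub unsafe fn reclaim_floats(raw: RawFloats) -> Vec<f32> {
    unsafe { Vec::from_raw_parts(raw.ptr, raw.len, raw.cap) }
}
```

Dropping the Vec returned by reclaim_floats frees the buffer. With nested data, every inner buffer needs the same treatment before the outer one is reclaimed.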

One alternative approach is to maintain two data structures in parallel: the version that Python reads, which contains raw pointers, and a version that contains the original Rust objects of the buffers that need to be dropped, something like:

struct Buffers {
    peaks_buffer: Vec<CPeak>,
    frgs_buffers: Vec<Vec<CFrg>>,
}

Then you could store all your buffers in a Box<Buffers>, and call into_raw on the Box to (a) get a raw pointer which could be passed to Python as an ‘opaque pointer’ and (b) prevent it from being deallocated immediately. When your Python code was done with the data, it would call another Rust function with that pointer, which would call Box::from_raw to reconstitute it, then let the destructor deal with freeing all the subobjects. Really the same fundamental approach as reconstituting the Vecs directly, but all-in-one, so less work with more complex data structures.
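Something along these lines, as a sketch only: the struct layouts are simplified, the demo data is made up, and the function names (buffers_new, buffers_peaks, buffers_free) are hypothetical. In a real cdylib each extern "C" function would also carry #[no_mangle] (spelled #[unsafe(no_mangle)] in the 2024 edition) so that ctypes can find it by name.

```rust
use std::os::raw::c_float;

#[repr(C)]
pub struct CFrg {
    pub mz_exp: c_float,
    pub intensity: c_float,
}

#[repr(C)]
pub struct CPeak {
    pub mz_exp: c_float,
    pub nb_frg: usize,
    pub frgs: *const CFrg,
}

/// Owns every allocation that the raw pointers in `peaks_buffer` refer to.
pub struct Buffers {
    peaks_buffer: Vec<CPeak>,
    #[allow(dead_code)] // never read, kept only so the buffers stay alive
    frgs_buffers: Vec<Vec<CFrg>>,
}

/// Builds demo data (peak i gets i fragments) and returns an opaque pointer.
pub extern "C" fn buffers_new() -> *mut Buffers {
    let frgs_buffers: Vec<Vec<CFrg>> = (0..3usize)
        .map(|i| {
            (0..i)
                .map(|j| CFrg { mz_exp: j as c_float, intensity: 1.0 })
                .collect()
        })
        .collect();
    // Moving the outer Vec into the struct does not move the inner heap
    // buffers, so these pointers stay valid.
    let peaks_buffer: Vec<CPeak> = frgs_buffers
        .iter()
        .enumerate()
        .map(|(i, f)| CPeak { mz_exp: i as c_float, nb_frg: f.len(), frgs: f.as_ptr() })
        .collect();
    Box::into_raw(Box::new(Buffers { peaks_buffer, frgs_buffers }))
}

/// Returns the peak array and writes its length to `len`.
///
/// # Safety
/// `b` must come from `buffers_new`; `len` must be valid for writes.
pub unsafe extern "C" fn buffers_peaks(b: *const Buffers, len: *mut usize) -> *const CPeak {
    let buffers = unsafe { &*b };
    unsafe { *len = buffers.peaks_buffer.len() };
    buffers.peaks_buffer.as_ptr()
}

/// Reconstitutes the Box so that its destructor frees everything at once.
///
/// # Safety
/// `b` must come from `buffers_new` and must not be used afterwards.
pub unsafe extern "C" fn buffers_free(b: *mut Buffers) {
    if !b.is_null() {
        drop(unsafe { Box::from_raw(b) });
    }
}
```

On the Python side the Buffers pointer would be treated as opaque (for example as a c_void_p), and only the CPeak and CFrg layouts would be declared as ctypes Structures.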

(In some cases you could expose the same struct to Python that contains owned Rust objects, but be careful: the size and layout of, say, Vec is not guaranteed to be the same across Rust versions.)

Other approaches include:

  • designing the API so Python allocates all buffers in the first place;

  • using one of the arena crates for all allocation, details depending on the crate;

  • using C memory management for all allocation, so Rust can allocate and Python can free directly (probably bad design)
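For the first of these alternatives, a minimal sketch, assuming a hypothetical function fill_intensities and made-up data. The caller allocates the buffer and passes its capacity, Rust only writes into it, so nothing has to be freed on the Rust side. For a real cdylib the function would also need #[no_mangle] (#[unsafe(no_mangle)] in the 2024 edition).

```rust
use std::os::raw::c_float;
use std::slice;

/// Writes at most `capacity` values into `out` and returns how many
/// values were written.
///
/// # Safety
/// `out` must be null or point to at least `capacity` writable floats.
pub unsafe extern "C" fn fill_intensities(out: *mut c_float, capacity: usize) -> usize {
    if out.is_null() {
        return 0;
    }
    // View the caller's buffer as a slice for the duration of the call.
    let out = unsafe { slice::from_raw_parts_mut(out, capacity) };

    // Made-up data standing in for the real results.
    let source: [c_float; 3] = [10.0, 20.0, 30.0];

    let n = source.len().min(capacity);
    out[..n].copy_from_slice(&source[..n]);
    n
}
```

From ctypes the buffer could be created with something like (c_float * 16)() and passed together with its length. The caveat is that the caller has to know an upper bound for the size, or call twice (once to ask for the size, once to fill).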


#5

Dear Comex,

thank you very much for your detailed answer. I will use your suggestions and see how far I get.

Cheers,
Ronny


#6

I found a post in another thread where the suggestion is the same as yours: using Box::into_raw to pass the struct to Python (link to reddit).

I tried this myself, but I could not make it work. I wonder if I am missing something. Here is an example:

#[repr(C)]
pub struct Simple {
    one: c_float,
    two: c_double,
}

#[no_mangle]
pub extern "C" fn test(mut simple: *const Simple) {
    simple = Box::into_raw(Box::new(Simple {
        one: 99.0 as c_float,
        two: 33.0 as c_double,
    })) as *const Simple;
}

and Python:

class Simple(Structure):
	_fields_ = [
               ("one", c_float), 
               ("two", c_double)]

### initialize the c functions
lib = ctypes.cdll.LoadLibrary("./target/debug/libstandard_finder.so")
lib.test.argtypes = (POINTER(Simple),)
s = Simple()
lib.test(byref(s))
print ">>>", s.one, s.two

Running the code results in:

>>> 0.0 0.0

So, it seems that the values were not copied into the struct.


#7

From what I’ve seen in tutorials, you often have to pass the pointer back to another Rust function which will get the requested value. It’s kludgy, but it gets the job done.


#8

Do you mean something like this?

#[no_mangle]
pub extern "C" fn mk_simple() -> Box<Simple> {
    let s = Box::new(Simple {
        one: 99.0 as c_float,
        two: 33.0 as c_double,
    });
    s
}

#[no_mangle]
pub extern "C" fn test2(mut simple: *mut Simple) {
    simple = Box::into_raw(mk_simple());
}

If I run this with my python script, I get the same result:

>>> 0.0 0.0

:neutral_face:


#9

You have your types mixed up. ‘mut simple’ means the pointer-typed variable simple is itself mutable; your code then ignores the original pointer passed as a parameter and changes the local variable to point to the result of Box::into_raw, then returns without doing anything useful. It’s as if you took a parameter like ‘mut x: u32’ and set x to 6: it won’t affect the caller.

If you want to pass a pointer as an ‘out parameter’ to Python then you need to declare it as a pointer to pointer, *mut *mut Simple; or you could just have the pointer be an actual return value. Alternately you can have Python allocate the Simple buffer itself (which is what your current Python code does), in which case there’s no need for Box on the Rust side; just fill in the fields of the pointer-to-struct - though then the Rust destructor of Simple won’t be invoked, which matters if there are any owning types in it.
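The three options side by side, as a sketch: the function names (simple_new_out, simple_new, simple_fill, simple_free) are made up for illustration, and for a real cdylib each one would also need #[no_mangle] (#[unsafe(no_mangle)] in the 2024 edition).

```rust
use std::os::raw::{c_double, c_float};

#[repr(C)]
pub struct Simple {
    pub one: c_float,
    pub two: c_double,
}

/// Option 1: out parameter. The caller passes the address of its own
/// pointer variable, and Rust stores the new allocation's address there.
///
/// # Safety
/// `out` must be valid for writing one pointer.
pub unsafe extern "C" fn simple_new_out(out: *mut *mut Simple) {
    let boxed = Box::new(Simple { one: 99.0, two: 33.0 });
    unsafe { *out = Box::into_raw(boxed) };
}

/// Option 2: the pointer is the return value.
pub extern "C" fn simple_new() -> *mut Simple {
    Box::into_raw(Box::new(Simple { one: 99.0, two: 33.0 }))
}

/// Option 3: the caller allocated the struct; just fill in the fields.
///
/// # Safety
/// `simple` must point to a valid, writable Simple.
pub unsafe extern "C" fn simple_fill(simple: *mut Simple) {
    unsafe {
        (*simple).one = 99.0;
        (*simple).two = 33.0;
    }
}

/// Frees a Simple created by option 1 or 2 (never one from option 3).
///
/// # Safety
/// `simple` must come from `simple_new_out` or `simple_new`.
pub unsafe extern "C" fn simple_free(simple: *mut Simple) {
    if !simple.is_null() {
        drop(unsafe { Box::from_raw(simple) });
    }
}
```

With option 3 the existing Python code (s = Simple(); byref(s)) works unchanged. With options 1 and 2 Python receives a POINTER(Simple) and has to hand it back to simple_free when done.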


#10

@Ronaldho80 Here’s the tutorial I was thinking about. I’m guessing that this is the way that will work in 100% of cases, but that it may be possible to transfer data in structs more directly in most languages (like Python).


#11

Thanks comex and bitgrinder. The tutorial is quite comprehensive.

The tutorial and comex suggested using Vec<...> within a struct instead of a raw pointer. E.g.:

pub struct Simple {
   one: Vec<c_float>,
   two: Vec<Vec<c_double>>,
}

instead of

pub struct Simple {
   one: *mut c_float,
   two: *mut *mut c_double,
}

However, if I want to use nested vectors in the struct, I get a segfault:

pub struct Buffer {
   buffer: Vec<Simple>,
}
pub struct Simple {
   one: Vec<c_float>,
   two: Vec<c_double>,
}

I guess that the size of a Rust vector is not known to the C struct, and that’s why it gets messed up.

So, I tried using raw pointers:

#[repr(C)]
pub struct Simpler {
    eins: c_int,
}

#[repr(C)]
pub struct Simple {
    one: c_float,
    two: c_double,
    three: *mut Simpler,
}

#[repr(C)]
pub struct Buffer {
    simples: *mut Simple,
}

pub extern "C" fn test2(mut buffer: *mut Buffer) {
    let mut simple = vec![];
    unsafe {
        for i in 0..10 {
            let mut three = vec![];
            for j in 0..10 {
                three.push(Simpler { eins: j as c_int });
            }
            let mut s = Simple {
                one: 99.0 as c_float,
                two: 33.0 as c_double,
                // boxing does not work, because, it is deallocated
                // three: Box::into_raw(Box::new(three)) as *mut Simpler,
                // works:
                three: three.as_ptr() as *mut Simpler,
            };
            // forget three to stop de-allocation. However, Python needs to
            // de-allocate the struct. I am not sure, if it is able to do this.
            mem::forget(three);
            simple.push(s);
        }
        (*buffer).simples = simple.as_ptr() as *mut Simple;
    }
}

I found two problems here. The first one is that I need to use mem::forget to stop Rust from de-allocating the three vector. Here, I am not sure whether Python is really able to drop three later. How could one drop a “forgotten” struct later in Rust?

The second problem I found is that the first entry of the Buffer vector is messed up. All the others are fine. What could have caused this? Would you know a solution?

Thanks again for your help!


#12

I am interested in this as well. I just wanted to offer a suggestion: Python’s cffi module might be easier/more performant to use. There is a very good example of using Rust from Python here: https://github.com/jbaiter/python-rust-fst .


#13

Cool! Thx, this is a nice example!