FFI: Compiling release does not return from foreign (c++) function

Hi,

Well i do not know where t start looking for the issue and therefore I need some pointers. Situation is the following. I wrote this program that depends on some c++ function. This function receives a reference to a vector that it fills out. so when i cargo build or cargo run i see the content of that vector but if i do cargo build --release or i cargo run --release i get an empty vector. Why and how to fix this ?

My toml:

[package]
name = "tool"
version = "0.1.0"
edition = "2021"


[dependencies]
libc   = "*"
fxhash = "*"
serde  = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
rustop = "*"
regex  = "*"
itertools  = "*"
md5    = "*"
flate2 = "*"


[build-dependencies]
cc = { version = "*", features = ["parallel"] }

My build.sh:


const FILES: &[&str] = &[
    "src/k/c/candidates.cpp",
    "src/k/c/rescue.cpp",
    "src/k/c/tools.cpp",
    "src/k/c/abi.cpp",
];


const HEADERS: &[&str] = &[
    "src/k/c/structure.h",
    "src/k/c/kstring.h",
    "src/k/c/keq.h",
];

fn main() {
    for file in FILES {
        println!("cargo:rerun-if-changed={}", file);
    }
    for file in HEADERS {
        println!("cargo:rerun-if-changed={}", file);
    }
    println!("cargo:rustc-link-lib=z");
    println!("cargo:rustc-link-lib=m");
    println!("cargo:rustc-link-lib=stdc++");
    cc::Build::new()
        .define("COMPILATION_TIME_PLACE", "\"build.rs\"")
        .warnings(false)
        .extra_warnings(false)
        .files(FILES)
        .flag("-fPIC")
        .compile("k.a");

}

function in my tool.rs:

use crate::{Error, ErrorKind, Result};
use libc::{c_int, c_char};

extern {
    pub fn k (batch:*const c_char, result:*const c_char, sep: c_char, len: c_int, con: c_int) -> c_int;
}

pub(crate) fn k (&self, batch: &[u8]) ->Result<Vec<u8>>{
        let cval  = 12;
        let mut blen = batch.len() * cval;
        let mut result = vec![0i8; blen];


        let len = unsafe {
            k( batch.as_ptr() as *const i8, result.as_ptr()  as *const i8, '|' as i8, batch.len() as i32, cval as i32) as usize
        };

        result.resize(len as usize,0);
        println!("{:?}",result);
        
       // stuff ...

       Ok(result)
}

What am i not understanding / missing here ?

Thank you !

PS:
Sorry for that semi example but i do not where to start ...

You have at least one instance of UB: result.as_ptr() doesn't give you write access to the buffer, yet you are trying to write to it. It may be that the compiler optimizes away the write based on this assumption.

Code that works only without optimizations is a telltale sign of UB. You may also have UB inside the FFI function itself.

6 Likes

I am confused why you have two functions named k. I guess the Rust one is supposed to be mangled by the compiler into a function with different name, containing module path etc, but I would still suggest you choose less confusing naming.

If I understand your code correctly, you create a result: Vec<i8> on the Rust side, and pass its buffer to C++ by calling the function k. Thing is, result.as_ptr() takes the receiver as &self. The & reference in Rust means that the object is immutable (there is an exception for memory contained in UnsafeCell, but your example has none of it). This means that the compiler thinks that k can't modify result, and so it's free to reuse the previous empty vector.

Note that the C++ compilers are also very strict about proper const correctness, and you can get similar issues on the C++ side. Yes, you can const_cast a const T* to a T*, but the spec requires that the resulting pointer isn't used to mutate the referent, otherwise you get UB. The const_cast exists basically to support old or poorly written C/C++ code which doesn't properly declare function arguments const, even though it doesn't mutate them.

4 Likes

I generally do this like:

pub fn foo(...) {
  extern "C" {
    #[link(name = "foo")] // iirc
    fn native(...);
  }
  // presumably something to make it safe here
  native(...)
}
1 Like

Unfortunatelly for compilers the spec is not so strict.
It is only UB if the underlying object is const - in practical terms, when the underlying object can be in read-only memory.

3 Likes

Thank you all for your valuable posts. After reading them i realized how little attention have i been paying to UBs. My code was full of those. strange how in the debug mode all memory locations were correctly filled with values and all my tests went through. I managed to debug a lot but i am still struggling with how to best address setting global variables.

So in my src/k/c/structure.h i have :

extern int MaxSize;

then in my src/k/c/abi.cpp i have it declared:

#include "structure.h"

int MaxSize;

then in my rust i have a function that sets it:

use libc::{c_int, c_char};

#[no_mangle]
extern {
    pub static mut  MaxSize : c_int;
}

impl MyObj {
   pub(crate) fn set_vars (&self){
        unsafe { MaxSize = self.opts.size as i32  ;}
  }
}

After i call self.set_vars() the variable set and available. However after reading all your posts and further googling I am now under the impression that i am doing it wrong. To be more precise pub static mut MaxSize : c_int; is not ok/safe to do, right?. Is there a better solution to this ?

Thnx

PS
Please let me know if i should open another post for this issue

There is nothing inherently wrong with a static mut, technically. That definition does match the declaration in the header file.

Whether it's a good idea to use static mut at all is another question. Personally, I never found a good/valid use case for it and I think you should always put mutable state into a local and pass it as arguments.

2 Likes

That's quite typical for UB, it usually disappears when you disable optimizations. Not always, a data race is still a data race, an out of bounds write will still corrupt your memory in some ways, but most nasty cases of UB manifest as incorrectly applied compiler optimizations (because the compiler optimizes your code using a number of assumptions, which can be shortened to "your code has no UB").

That's the worst part about UB: it can manifest in any way possible, including "my code works as intended", and can change its form depending on arbitrary external changes, including the phase of the moon or changes to your program's whitespace (yes, the last one is very real, because it typically affects the compiler's randomness-based heuristics).

If you write unsafe code, you really should consider running your tests under sanitizers, including Miri (works only for pure-Rust code with some restrictions), Valgrind (whole program, but quite slow and catches only certain classes of UB, most importantly buffer overflows and use-after-free) and built-in sanitizers. It still won't guarantee your code is correct, but it's likely to catch many bugs.

I would strongly recommend against it. Mutable statics are very difficult to use correctly in Rust, due to complexities around Rust references. If you do go down this way, use mutable statics exclusively through raw pointers. Don't create references at any point. If you need to read or write it, use ptr::read and ptr::write.

In general, local variables should be strongly preferred. On the Rust side, you can also use ordinary statics together with a type with interior mutability, e.g.

static FOO: Mutex<u32> = Mutex::new(3);
static BAR: RefCell<bool> = RefCell::new(true);

Also consider OnceCell and LazyCell (stabilized in a soon-coming Rust release), as well as their sync variants.

Of course, these types cannot be directly exposed to C++ code.

The last option is to do all operations with a static on the side on one language (e.g. a mutable static available only to C++ code, or a static Mutex available only to Rust), and expose it to the other language exclusively as FFI functions. For most use cases the overhead should be acceptable, and full-program LTO should be able to eliminate it.

3 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.