C/Fortran FFI memory error

Hi all. I'm trying to create a FFI to an old Fortran script. It's the one used by scipy.optimize.nnls. I think the official webpage is here. Seeing that there's a C version, I tested it before.

// libc = "0.2"
use libc::{c_double, c_int};

use ndarray::{Array1, Array2, ShapeBuilder};

extern "C" {
    fn nnls_(
        a: *const c_double,
        mda: *const c_int,
        m: *const c_int,
        n: *const c_int,
        b: *const c_double,
        x: *mut c_double,
        r_norm: *mut c_double,
        w: *mut c_double,
        zz: *mut c_double,
        index_bn: *mut c_int,
        mode: *mut c_int,
        maxiter: *const c_int,
    );
}

pub fn nnls(a: &Array2<f64>, b: &Array1<f64>, max_iter: Option<isize>) -> (Array1<f64>, f64) {
    let (m, n) = a.dim();

    // This is calling a Fortran library so we must use 'f' order. Moreover, nnls_ modifies it
    let mut fortran_a = Array2::zeros(a.dim().f());
    fortran_a.assign(&a);

    // `b` is copied here because the inner Fortran code modifies it.
    let mut b = b.to_owned();

    let mut x = Array1::<f64>::zeros(n);
    let mut r_norm = 0.0;
    let mut w = vec![0.0; n];
    let mut zz = vec![0.0; m];
    let mut index = vec![0i32; n];
    let mut mode = 0i32;

    let m = m as i32;
    let n = n as i32;
    let max_iter = max_iter.unwrap_or(-1) as i32;

    unsafe {
        nnls_(
            fortran_a.as_mut_ptr(),
            &m,
            &m,
            &n,
            b.as_mut_ptr(),
            x.as_mut_ptr(),
            &mut r_norm,
            w.as_mut_ptr(),
            zz.as_mut_ptr(),
            index.as_mut_ptr(),
            &mut mode,
            &max_iter,
        );
    }
    match mode {
        2 => panic!("THE DIMENSIONS OF THE PROBLEM ARE BAD"),
        3 => panic!("ITERATION COUNT EXCEEDED. MORE THAN MAXITER ITERATIONS."),
        _ => { /* 1 and all other codes are ok */ }
    }
    (x, r_norm)
}

I'm using pointers even for simple i32 because that's what the C declaration is expecting. It does work. It gives the same results as scipy, it's as fast, etc. However, it crashes when I test it with several threads: Segmentation fault (core dumped) I tried swapping some 'm' for 'n', it creates different memory errors (I'm pretty sure I got them right anyway) Changing *const to *mut changes nothing.

fn main() {
    // ndarray = "0.15"
    // ndarray-rand = "0.14"
    use ndarray::{Array1, Array2};
    use ndarray_rand::rand_distr::Uniform;
    use ndarray_rand::RandomExt;

    let mut a = Array2::random((107, 145), Uniform::new(0.0, 1.0));
    a.row_mut(0).fill(1.0);
    let b = Array1::random(107, Uniform::new(0.0, 1.0));
    rayon::scope(|s| {
        s.spawn(|_| {
            for _ in 0..5 {
                let _ = nnls(&a, &b, None).0;
            }
        });
        s.spawn(|_| {
            for _ in 0..5 {
                let _ = nnls(&a, &b, None).0;
            }
        });
    });
}

At that point, I'm not sure what to test and I gladly accept your help.

Is the FORTRAN code thread safe?

The symptoms suggest that the underlying Fortran code is not thread safe. Verify whether it uses any global state. If it does, you will have to protect it with a mutex or similar.

I'm not sure about the Fortran version (I can only read simple lines). How would I know if it uses a global state?

As I wrote, I'm currently using the C version. I see several static variables. That makes it non thread safe, right?

Yes, statics are effectively globals and are definitely not thread-safe.

I tried using the Fortran90 version instead, which I suppose is thread-safe. I fixed the nnls signature

extern "C" {
    fn nnls_(
        a: *mut c_double,
        m: *const c_int,
        n: *const c_int,
        b: *mut c_double,
        x: *mut c_double,
        r_norm: *mut c_double,
        w: *mut c_double,
        index_bn: *mut c_int,
        mode: *mut c_int,
    );
}

pub fn nnls(a: &Array2<f64>, b: &Array1<f64>) -> (Array1<f64>, f64) {
    let (m, n) = a.dim();

    // This is calling a Fortran library so we must use 'f' order. Moreover, nnls_ modifies it
    let mut fortran_a = Array2::zeros(a.dim().f());
    fortran_a.assign(&a);

    // `b` is copied here because the inner Fortran code modifies it.
    let mut b = b.to_owned();

    let mut x = vec![0.0; n];
    let mut r_norm = 0.0;
    let mut w = vec![0.0; n];
    let mut index = vec![0i32; n];
    let mut mode = 0i32;

    let m = m as i32;
    let n = n as i32;

    unsafe {
        nnls_(
            fortran_a.as_mut_ptr(),
            &m,
            &n,
            b.as_mut_ptr(),
            x.as_mut_ptr(),
            &mut r_norm,
            w.as_mut_ptr(),
            index.as_mut_ptr(),
            &mut mode,
        );
    }

    match mode {
        2 => panic!("THE DIMENSIONS OF THE PROBLEM ARE BAD"),
        3 => panic!("ITERATION COUNT EXCEEDED. MORE THAN MAXITER ITERATIONS."),
        _ => { /* 1 and all other codes are ok */ }
    }
    (Array1::from(x), r_norm)
}

and built it with

cc::Build::new().file("src/ffi/nnls.f90").compile("nnls");
println!("cargo:rustc-link-lib=dylib=gfortran");

Now I'm stuck with

(signal: 11, SIGSEGV: invalid memory reference)

without any explanation or nothing. I'm unsure how I can even start debugging this. Any idea how to fix this?

If you have a debugger installed, try running a non-release version of your program under the debugger. The debugger should automatically pause when a segfault is triggered, which gives you a chance to print the backtrace and see how you got into that situation.

That's essentially what I did earlier this evening when troubleshooting a different segfault. From there I looked up the stack for code that belonged to the project and traced different arguments back though the program.

That's what I did. We can see this block of code almost of the start of the Fortran code

!                    INITIALIZE THE ARRAYS indx() AND X().

DO i = 1,n
  x(i) = zero          !!!CRASH!!!
  indx(i) = i
END DO

The debugger tells me

Stop reason: signal SIGSEGV: invalid address (fault address: 0x0)

and it is right... I check the address before the crash with this line

write (*,*) loc(x(1))

and it actually print 0. I don't see how this is possible. I also check the address in Rust before calling nnls_, it prints something like 0x7fc76c000cf0 How can this address in Rust becomes 0 in Fortran?

I've had to troubleshoot something similar in the past when I messed up the signature I wrote in my extern "C" block. In that case, I was passing a struct by-value from Delphi to Rust and the Delphi version had one field too few/many so the "next" argument contained the value of the last struct field. That meant instead of containing a "normal" integer I was seeing absolute garbage.

The way I eventually figured it out was by stepping through with a debugger and carefully watching the value of each register/variable as I made the FFI call (nnls_() in your case) to see how things change.

3 Likes

Assumed shape arrays in Fortran (e.g. real :: x(:)) are not interoperable with flat C arrays. The arrays would need to be assumed size to be interoperable (real :: x(*)). And the function should also probably be marked as bind(C) so that it has portable linkage.

Fortran 2018 adds ISO_Fortran_binding.h (for example, GCC's implementation) which makes assumed shape arrays interopable with C, but it requires filling out a descriptor struct. You'd need to find a way to import ISO_Fortran_binding.h for your target Fortran compiler into Rust using bindgen.

2 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.