Consistently getting "fatal runtime error: allocator memory exhausted"

I'm working on a data crunching application and I'm getting a "fatal runtime error: allocator memory exhausted" almost always at the same point. What is strange is that I don't think the application is allocating that much memory. Monitoring it through "top", it seems to have around 800 MB of memory allocated. The computer I'm working with has around 8 GB of free memory.

The code where it seems to break down is just a for_each over a vector, where I'm just allocating some strings of reasonable size (at most maybe 256 chars?). I literally have no idea how else to debug this. Can anyone help me?

Oh, I'm using the most recent nightly.

Do you have any unsafe code?

Try stracing (if you’re on Linux) and see what allocation requests are made.

There's some unsafe code through FFI, but the point where it stops working has no FFI calls. I'll take a look at the strace output; I've never used it before.

That may not matter - you can have unsafe code corrupt something that only surfaces in some other place. Not saying this is the cause, of course, but don't dismiss it because it's not around the failing callsite.

Fair enough. I just wouldn't expect it to cause an out of memory error.

When I execute the code through strace it seems to be dying at another point in the code with a SIGSEGV. Maybe this ends up helping in the end.

That's the beauty of UB/corruption - it can manifest itself any which way. It sounds like something's busted in your case. How extensive/involved is the unsafe code? Is it something you can paste/link here?

The interface is simple enough, but the code is extensive. I'm back at the good ol' print/flush debug cycle for now.

I'll update once I find something interesting for future reference

So, I reduced the amount of data in the system just to make things simpler. Right now it's dying with the following (using strace):

[pid 22697] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
[pid 22697] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---

Running through gdb I get

[Switching to Thread 0x7fa5bfbff700 (LWP 23474)]
__memcmp_avx2_movbe () at ../sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S:264

which seems like it might be an error with a memcpy that happens in the code? If that makes sense I'll paste some further code.

IMO, attempting to pin down wherever the segfault actually occurs is seldom helpful, because the UB that led to it has usually happened far earlier. (that said, there could be some value in checking the backtrace)

What I think would be more useful is to know the signatures of the ffi functions you are calling (if there aren't too many). This would let us help point out potential footguns you possibly should be looking for.

If you're inside gdb, can you paste the full backtrace? Also, you may be able to inspect the frames in the backtrace to see what values are passed down. The fact that it's crashing in low level memcmp is yet another sign that there's UB/corruption going on.

Ok, I have the following C interfaces:

struct TagStruct {
    char strTag[80];
    double val1 = 0.0;
    double val2 = 0.0;
    double val3 = 0.0;
};

void* config_data(Type t, const char* xml,
                  int nCi, const TagStruct* ci,
                  int nIn, const TagStruct* in,
                  int nOut, const TagStruct* out);

This function is called exactly once per Model and returns a void* handle to its internal data structures. It also memcpys the TagStruct arrays internally.

Its corresponding code in Rust is:

#[derive(Debug)]
#[repr(C)]
pub struct TagStruct {
    pub str_tag: *mut c_char,
    pub val1: c_double,
    pub val2: c_double,
    pub val3: c_double,
}
extern "C" {
    pub fn adicionar_modelo(
        t: Type,
        xml: *const c_char,
        nCi: c_int,
        ci: *const TagStruct,
        nIn: c_int,
        r#in: *const TagStruct, // `in` is a Rust keyword, hence the raw identifier
        nOut: c_int,
        out: *const TagStruct
    ) -> *mut c_void;
}

The following functions are called continually: set updates the internal data structures, step runs one step of the number crunching, and get copies the internal data structures to its inputs:

void step(void* ptr);

void set (void* ptr, const TagStruct* in);

void get(void* ptr, TagStruct* in, TagStruct* out);

And the corresponding Rust code:

extern "C" {
    pub fn set(
        ptr: *mut c_void,
        r#in: *const TagStruct // `in` is a Rust keyword, hence the raw identifier
    );

    pub fn get(
        ptr: *mut c_void,
        r#in: *mut TagStruct,
        out: *mut TagStruct
    );

    pub fn step(ptr: *mut c_void);
}

I should also say that I'm used to working with this code through Java's JNA/JNI and directly through C++, and these errors never happened there.

After the crash the only thing trace prints is

Tracepoint 1 at 0x7fa5c34620e0: file ../sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S, line 264.

These are not equivalent. str_tag should be [c_char; 80].

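It's common to think of C arrays as being like pointers, since they decay to pointers in expression context; but there is a subtle difference: an array field is stored inline in the struct, so char strTag[80] occupies 80 bytes of the struct itself, whereas a pointer field holds only an address.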
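
To make the mismatch concrete, here is a minimal sketch comparing the two layouts. `TagStructWithPointer` and `layout_sizes` are hypothetical names used only for this illustration; the field names follow the Rust struct posted earlier in the thread.

```rust
use std::mem::size_of;
use std::os::raw::{c_char, c_double};

/// Layout as originally posted: the tag field holds only an address.
/// (Hypothetical name, kept here purely for comparison.)
#[repr(C)]
pub struct TagStructWithPointer {
    pub str_tag: *mut c_char,
    pub val1: c_double,
    pub val2: c_double,
    pub val3: c_double,
}

/// Layout matching the C declaration: 80 bytes of tag stored inline,
/// followed by the three doubles.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct TagStruct {
    pub str_tag: [c_char; 80],
    pub val1: c_double,
    pub val2: c_double,
    pub val3: c_double,
}

/// Returns the sizes in bytes of (pointer-based, array-based) layouts.
/// (Hypothetical helper for this illustration.)
pub fn layout_sizes() -> (usize, usize) {
    (size_of::<TagStructWithPointer>(), size_of::<TagStruct>())
}
```

Because the C side indexes and memcpys arrays of TagStruct using its own element size, a Rust array whose elements are smaller gets read past its end, which is consistent with corruption showing up far from the FFI call.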

@ExpHP beat me to it and is right on the money - the FFI repr is not the same and this can easily explain all sorts of memory corruption :).

You may want to try out bindgen to generate FFI bindings for you (or at least play with it to see how it would translate some C interface to Rust).

Hah, yeah. Now that I see it written down it's really obvious. I'm going to try it out!

Last question:

I'm struggling to make a String into a [c_char; 80], can someone show me the way?

It is tricky; I think you have to do it yourself. I'm on my phone so sorry if this is terse, but try a utility like this.

// Rust equivalent of strncpy.
// bytes does not need a terminating nul, and the slices do not need to have matching length;
// it copies and nul-terminates the part that fits.
// Only valid for ASCII, because otherwise it may
// get cut off in the middle of a character.
fn copy_ascii_to_array(out: &mut [c_char], bytes: &[u8]) {
    use std::iter::once;
    let bytes = bytes.iter().chain(once(&0));
    for (&b, dest) in bytes.zip(out) {
        // May want to replace this with an Err
        assert!(b.is_ascii());
        *dest = b as c_char;
    }

    // In case bytes was too long to fit
    let last = out.last_mut().expect("zero-len c string is impossible!");
    *last = 0;
}
// After initializing arr as [0; 80], and where
// s is an &str
copy_ascii_to_array(&mut arr, s.as_bytes());

Everything is working now. Thank you everyone! (There's a small error in your code, but it's easy to fix.)
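
The post above does not say which error was meant. One candidate, when compiling the helper as posted, is that `bytes.zip(out)` moves the `&mut` slice into the iterator, so the later `out.last_mut()` is rejected as a use after move. A sketch of a version that compiles under that assumption, borrowing with `iter_mut()` instead:

```rust
use std::os::raw::c_char;

/// Copies `bytes` into `out`, truncating if it does not fit, and makes sure
/// the final slot of `out` holds a nul.
/// Panics if `out` is empty or if `bytes` contains non-ASCII data.
pub fn copy_ascii_to_array(out: &mut [c_char], bytes: &[u8]) {
    assert!(!out.is_empty(), "zero-len c string is impossible!");

    // Append a nul so short inputs are terminated right after their last byte.
    let terminated = bytes.iter().chain(std::iter::once(&0u8));

    // `iter_mut()` borrows `out` rather than moving it into `zip`,
    // so `out` is still usable after the loop.
    for (&b, dest) in terminated.zip(out.iter_mut()) {
        // May want to replace this with an Err.
        assert!(b.is_ascii());
        *dest = b as c_char;
    }

    // In case `bytes` was too long to fit.
    let last = out.len() - 1;
    out[last] = 0;
}
```

With `arr` initialised as `[0; 80]` and `s` an `&str`, the call is `copy_ascii_to_array(&mut arr, s.as_bytes());`.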
