ndarray-parallel using Rayon: process "Killed" with no traceback. Help

I have a 3D ndarray that is 6,500,000 x 2 x 108, and the process seemingly dies when trying to select 5 different 2D arrays along Axis 2. This is being run in parallel on 8 cores, and it does not happen on much smaller datasets. This shouldn't be that hard in the scheme of things: 6.5M is a lot, but not THAT crazy. The only error when it dies is 'Killed'. Any ideas? Thanks!

What does your array store? How much memory do you have available on the system?
Assuming each element takes 8 bytes (e.g. f64), your array is already around 11 GB (about 10.5 GiB) of data.
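
A quick way to check that estimate, assuming 8-byte elements such as f64 (footprint_bytes is a hypothetical helper, not part of ndarray):

```rust
/// Bytes needed for a dense array of `shape`, assuming each element
/// occupies `elem_size` bytes (8 for f64, i64, u64).
fn footprint_bytes(shape: &[u64], elem_size: u64) -> u64 {
    shape.iter().product::<u64>() * elem_size
}

fn main() {
    let bytes = footprint_bytes(&[6_500_000, 2, 108], 8);
    let gib = bytes as f64 / (1u64 << 30) as f64;
    // prints "11232000000 bytes = 10.46 GiB"
    println!("{bytes} bytes = {gib:.2} GiB");
}
```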

I'm not too familiar with ndarray itself, but viewing (or copying) five 2D subarrays with dimensions 2 x 108 should not require a lot of memory (about 216 elements' worth each).

Can you share the problematic code or a minimal version that behaves as you said?

Also, if running in parallel, do you create the array multiple times? Depending on the parallelization, you might end up allocating it many more times than you think.

That's what you'll see if Linux's out-of-memory killer terminated the process. If it did, I think there should be an entry about it in /var/log/messages (or in the kernel log, which dmesg shows).

Try putting an internal limit on the amount of memory your Rust program can allocate. That way, if you hit the limit, you get an allocation-failure abort from Rust itself rather than a KILL from the OS.
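
The cap crate provides this ready-made. Below is a std-only sketch of the same idea, so it runs without dependencies: a global allocator that wraps System, counts live bytes, and refuses allocations past a limit. CappedAlloc, set_limit and allocated are hypothetical names chosen here, not the cap crate's API, and the limits are placeholders. When the allocator refuses, an ordinary Vec::push aborts with "memory allocation of N bytes failed", while Vec::try_reserve returns an Err you can handle.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Forwards to the system allocator but refuses any allocation that would
/// push the running total of live bytes above `limit`.
struct CappedAlloc {
    allocated: AtomicUsize,
    limit: AtomicUsize,
}

impl CappedAlloc {
    const fn new(limit: usize) -> Self {
        CappedAlloc { allocated: AtomicUsize::new(0), limit: AtomicUsize::new(limit) }
    }
    fn set_limit(&self, limit: usize) {
        self.limit.store(limit, Ordering::SeqCst);
    }
    /// Bytes currently allocated through this allocator.
    fn allocated(&self) -> usize {
        self.allocated.load(Ordering::SeqCst)
    }
}

unsafe impl GlobalAlloc for CappedAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let size = layout.size();
        // Reserve the bytes first so concurrent threads cannot overshoot the cap.
        let before = self.allocated.fetch_add(size, Ordering::SeqCst);
        if before.saturating_add(size) > self.limit.load(Ordering::SeqCst) {
            self.allocated.fetch_sub(size, Ordering::SeqCst);
            return std::ptr::null_mut(); // reported by Rust as allocation failure
        }
        let ptr = unsafe { System.alloc(layout) };
        if ptr.is_null() {
            self.allocated.fetch_sub(size, Ordering::SeqCst);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        self.allocated.fetch_sub(layout.size(), Ordering::SeqCst);
    }
}

#[global_allocator]
static ALLOCATOR: CappedAlloc = CappedAlloc::new(usize::MAX);

fn main() {
    // Hypothetical cap of 1 GiB; pick something below the machine's free RAM.
    ALLOCATOR.set_limit(1 << 30);
    let mut data: Vec<f64> = Vec::new();
    // 200,000,000 f64s is 1.6 GB, above the cap, so this is refused.
    match data.try_reserve(200_000_000) {
        Ok(()) => println!("reserved"),
        Err(e) => println!("refused: {e}"),
    }
    println!("currently allocated: {} bytes", ALLOCATOR.allocated());
}
```

Printing ALLOCATOR.allocated() from your own logging is one way to watch usage while the parallel section runs.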

What's the ndarray operation?

If I understand you correctly, each "job" will copy out (I assume so, because you said select) approximately 6,500,000 x 2 x 5 elements. First of all, it would be best not to use any copying operations at all and to work on the data in place. Rayon can sometimes start many more jobs than there are available threads, which means there could be upwards of hundreds of jobs in flight at once, and I could see that putting pressure on memory. Try to limit the fanout of the rayon operation to just one job per CPU.
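
Rough arithmetic for that scenario, assuming 8-byte elements and assuming every live job holds one copy produced by select. The job counts other than 8 (one per core) are hypothetical illustrations, not measurements; select_copy_bytes is a made-up helper.

```rust
/// Bytes copied by one select of `k` indices along Axis 2 of an
/// n x m x _ array, assuming `elem_size`-byte elements.
fn select_copy_bytes(n: u64, m: u64, k: u64, elem_size: u64) -> u64 {
    n * m * k * elem_size
}

fn main() {
    let per_job = select_copy_bytes(6_500_000, 2, 5, 8); // 520,000,000 bytes
    // 8 = one job per core; 32 and 100 are hypothetical "too many jobs" cases.
    for jobs_alive in [8u64, 32, 100] {
        let total = per_job * jobs_alive;
        println!("{:>3} live copies -> {:.1} GB", jobs_alive, total as f64 / 1e9);
    }
}
```

On top of the roughly 11 GB for the full array, even one copy per core adds several GB, which is why working on views in place matters here.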

Thanks for the suggestion, I will try this. Is there any way to watch memory usage while the rayon code is running? Thanks!

The cap allocator crate has methods to query current memory usage, so you could combine that with your own logging.

For monitoring overall memory usage I use Instruments from Xcode (I develop on macOS), and it has a memory usage profiler that's compatible with Rust.

My sense is that the memory issues are because I am sharing some memory across threads. I have been confused about this for some time. Rust advertises that it is thread-safe, and rayon gets sold as plug-and-play, but don't I still need to wrap an array shared between threads in a Mutex/Arc so that the threads accessing/selecting single arrays from a common array don't kill each other? And if so, can that be done with rayon, or should it be done with Rust's basic thread::spawn?

Linux sends SIGKILL when the program, as a whole, uses too much memory. That's not the same thing as a segfault caused by memory unsafety.

Rust's safety guarantees are specific to memory safety, but not a guarantee against all bugs. For example:

let mut vec = Vec::new();
loop {
   vec.push(1);
}

This is 100% safe by Rust's safety definition. It will eat up all your memory and either get the program killed, or make your computer unusable.

So you should watch out for how many tasks you queue in rayon. The number of tasks rayon accepts is not limited to the number of cores. If you tell rayon to run 6 million tasks, it will queue up 6 million tasks at the same time. You're probably adding all of them at once, and each task uses some memory, so the sum across all tasks exceeds the available RAM.
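
A minimal std-only sketch of bounded fanout: the data is split into exactly one chunk per available core, so only that many tasks ever exist, and each task borrows its chunk instead of copying it. chunked_sums and the summing work are hypothetical placeholders for the real per-task functions. In rayon itself, the analogous knobs are par_chunks with a large chunk size or with_min_len on an indexed parallel iterator; this sketch avoids the dependency on purpose.

```rust
use std::thread;

/// Sums `data` in `workers` chunks, one scoped thread per chunk, so at most
/// `workers` tasks exist at once and no element is copied.
fn chunked_sums(data: &[f64], workers: usize) -> Vec<f64> {
    let workers = workers.max(1);
    // Ceiling division so every element lands in some chunk.
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<f64>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(8);
    let data: Vec<f64> = (0..6_500_000).map(|i| i as f64).collect();
    let partials = chunked_sums(&data, workers);
    let total: f64 = partials.iter().sum();
    println!("{} chunks, total = {total}", partials.len());
}
```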

That makes sense. Much appreciated. I have one follow-up question because I am fairly new to rayon.

If you are running a program where block 'a' produces a single 2D array called BIG and block 'b' uses rayon to access random 1D arrays from BIG, is that thread-safe if the compiler doesn't object, or would I need a Mutex/Arc on BIG?

Thanks Man!!

If you're not using unsafe {} and it compiles, it's thread-safe.

Thread-safe in the sense that you won't get data races or use-after-free errors. The compiler will complain that things aren't Sync when a Mutex is necessary, or that they don't live long enough when an Arc is necessary.

Do you have any recommendations for how to keep rayon from over-tasking? I have 15 functions that run in the thread, but they are all essential. Maybe a different library?

don't i still need to wrap an array shared between threads in a mutex/arc so that the threads that are accessing/selecting single arrays from a common array don't kill each other?

Reading the same data on multiple threads is always allowed and does not require locking. Writing to disjoint parts of the same array on multiple threads can also be done safely without locking. The case where you absolutely need a lock is if multiple threads may write to the same element of a mutable array.
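
A minimal std-only sketch of both lock-free cases: every thread reads the shared input through a plain & reference, and each thread writes only its own disjoint chunk of the output, handed out by chunks_mut. scale_into and the scaling work are hypothetical; rayon's par_chunks_mut follows the same disjoint-chunk idea.

```rust
use std::thread;

/// Writes input[i] * factor into output[i] using `workers` threads.
/// Threads share `input` read-only, and each one owns a disjoint
/// &mut chunk of `output`, so no Mutex is needed.
fn scale_into(input: &[f64], output: &mut [f64], factor: f64, workers: usize) {
    assert_eq!(input.len(), output.len(), "input and output must match in length");
    let workers = workers.max(1);
    let chunk_len = ((output.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        // chunks_mut yields non-overlapping &mut sub-slices, one per thread.
        for (i, out_chunk) in output.chunks_mut(chunk_len).enumerate() {
            let start = i * chunk_len;
            s.spawn(move || {
                for (j, slot) in out_chunk.iter_mut().enumerate() {
                    *slot = input[start + j] * factor;
                }
            });
        }
    });
}

fn main() {
    let input: Vec<f64> = (0..10).map(|i| i as f64).collect();
    let mut output = vec![0.0; input.len()];
    scale_into(&input, &mut output, 2.0, 3);
    println!("{output:?}");
}
```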

If you are running a program where block 'a' produces a single 3D array called BIG and block 'b' uses rayon to, in a multi-threaded loop, access and assign 3D arrays from BIG using array.select(Axis(2), &[x,y,z]), is that thread-safe if the compiler doesn't object, or would I need a Mutex/Arc on BIG?

To safely write to an object you are also accessing from elsewhere, you must make sure that the accesses are to different parts of the object, and to make it pass the compiler, you need to somehow convince it that you really are accessing different parts of the object.

E.g. with an array, you would need to use something like split_at_mut to produce non-overlapping sub-slices, at which point you can access each half in parallel. Just using indexes is not enough, because the compiler can't tell that the indexes are different.
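
A minimal sketch of split_at_mut with scoped threads (fill_halves is a hypothetical name). Each closure receives its own non-overlapping half, which is exactly the proof of disjointness the compiler needs.

```rust
use std::thread;

/// Fills data[..mid] with 1 and data[mid..] with 2 on two threads.
/// split_at_mut returns two non-overlapping &mut halves, which lets the
/// borrow checker prove the threads never touch the same element.
fn fill_halves(data: &mut [u32], mid: usize) {
    let (left, right) = data.split_at_mut(mid);
    thread::scope(|s| {
        s.spawn(move || left.iter_mut().for_each(|x| *x = 1));
        s.spawn(move || right.iter_mut().for_each(|x| *x = 2));
    });
    // Capturing `data` itself mutably in both closures, even to write
    // different indices, would be rejected at compile time.
}

fn main() {
    let mut data = vec![0u32; 10];
    fill_halves(&mut data, 5);
    println!("{data:?}"); // [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
}
```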

If you do go forward with your approach, wrapping your array (or any shared data) in an Arc/Mutex combination and working over it via rayon functions like map_with might be of aid to you.
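
A std-only sketch of the Arc plus Mutex combination for results that several threads must write into. It uses plain thread::spawn rather than rayon's map_with, so it is an analogy, not the rayon API; collect_squares and the squaring are placeholder work.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Each of `workers` threads computes one value and appends it to a shared Vec.
/// The Mutex serializes the writes; the Arc keeps the Vec alive for every thread.
fn collect_squares(workers: u64) -> Vec<u64> {
    let results = Arc::new(Mutex::new(Vec::new()));
    let handles: Vec<_> = (0..workers)
        .map(|i| {
            let results = Arc::clone(&results);
            thread::spawn(move || {
                let value = i * i; // stand-in for real per-thread work
                results.lock().unwrap().push(value);
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // Every thread's clone is dropped once it finishes, so we own the Vec again.
    let mut out = Arc::try_unwrap(results).unwrap().into_inner().unwrap();
    out.sort(); // push order depends on thread scheduling
    out
}

fn main() {
    println!("{:?}", collect_squares(8));
}
```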

I am not writing to an object I am accessing elsewhere; I am only reading it simultaneously. So, for example:

// each select() returns a new, owned array (a copy of the chosen lanes)
let thread_1 = BIG.select(Axis(2), &[1, 3, 5]);
let thread_2 = BIG.select(Axis(2), &[4, 0, 6]);
let thread_3 = BIG.select(Axis(2), &[1, 7, 30]);

BIG is shared, but the code is only reading it, not writing. All of this compiles just fine, but is this still a problem regardless of whether it compiles?

If you only need immutable access, you do not need a mutex. An Arc alone will let you share it immutably.
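
A std-only sketch of immutable sharing through Arc alone, mirroring the three selections in the example. It assumes BIG is stored as a flat, row-major Vec<f64> of shape (n, m, p); select_last_axis is a hypothetical hand-rolled analogue of select(Axis(2), ...), not ndarray's implementation. The Arc is needed here only because thread::spawn requires 'static data.

```rust
use std::sync::Arc;
use std::thread;

/// Copies lanes `idx` along the last axis of a row-major (n, m, p) array
/// stored flat in `big`; element (i, j, k) lives at (i * m + j) * p + k.
fn select_last_axis(big: &[f64], (n, m, p): (usize, usize, usize), idx: &[usize]) -> Vec<f64> {
    let mut out = Vec::with_capacity(n * m * idx.len());
    for row in 0..n * m {
        for &k in idx {
            out.push(big[row * p + k]);
        }
    }
    out
}

fn main() {
    let shape = (4, 2, 108); // small stand-in for 6,500,000 x 2 x 108
    let big: Arc<Vec<f64>> = Arc::new((0..shape.0 * shape.1 * shape.2).map(|i| i as f64).collect());
    let index_sets = vec![vec![1, 3, 5], vec![4, 0, 6], vec![1, 7, 30]];
    let handles: Vec<_> = index_sets
        .into_iter()
        .map(|idx| {
            let big = Arc::clone(&big); // clones the pointer, not the data
            thread::spawn(move || select_last_axis(&big, shape, &idx))
        })
        .collect();
    for h in handles {
        println!("{} elements selected", h.join().unwrap().len());
    }
}
```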

I'd be interested in your take on this code, because even after adding an Arc, the code is 'Killed' by Linux right after it goes into the rayon loop (results.axis_iter_mut(.......)


Just so you know, you don't always need an Arc. If the old code compiled without an Arc, then it is safe. Arc is only needed if the lifetime can't be statically known. I can't remember the last time I needed an Arc. Most of the time a Mutex or just an immutable reference is enough, especially with a parallel iterator, because all the parallel processing is done in a single scope.
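
A std-only sketch of the scoped case: thread::scope only returns after every thread it spawned has finished, so a plain &[f64] borrow is enough and no Arc is involved. It assumes the data is a flat, row-major Vec<f64> whose last axis has length p; lane_sum and lane_sums_scoped are hypothetical helpers.

```rust
use std::thread;

/// Sums lane `k` along the last axis of a flat row-major array with last-axis length `p`.
fn lane_sum(big: &[f64], p: usize, k: usize) -> f64 {
    big.chunks(p).map(|row| row[k]).sum()
}

/// Computes several lane sums in parallel. The scope guarantees all threads
/// finish before it returns, so borrowing `big` is enough: no Arc needed.
fn lane_sums_scoped(big: &[f64], p: usize, lanes: &[usize]) -> Vec<f64> {
    thread::scope(|s| {
        let handles: Vec<_> = lanes
            .iter()
            .map(|&k| s.spawn(move || lane_sum(big, p, k)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let p = 108;
    let big: Vec<f64> = (0..1_000 * p).map(|i| (i % p) as f64).collect();
    println!("{:?}", lane_sums_scoped(&big, p, &[1, 3, 5])); // [1000.0, 3000.0, 5000.0]
}
```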