ndarray-parallel using Rayon: process killed with no traceback. Help

I have a 3D ndarray of shape 6,500,000 x 2 x 108, and the process seemingly dies when I try to select 5 different 2D arrays along Axis 2. This runs in parallel on 8 cores. It does not happen on much smaller datasets, but this shouldn't be that hard in the scheme of things: 6.5M is a lot, but not THAT crazy. The only error when it dies is 'Killed'. Any ideas? Thanks!

What does your array store? How much memory do you have available on the system?
Assuming each element takes 8 bytes, your array is already around 10 GB of data.
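
To make that estimate concrete, here is a small sketch of the arithmetic. The helper name `dense_bytes` is made up for this example, and the 8-byte element size is the same assumption as above (for instance `f64`); the real element type was not stated.

```rust
/// Bytes needed for a densely stored array with the given shape.
/// Returns None if the product overflows u64.
/// (`dense_bytes` is a hypothetical helper, not part of ndarray.)
fn dense_bytes(shape: &[u64], elem_size: u64) -> Option<u64> {
    shape
        .iter()
        .try_fold(elem_size, |acc, &dim| acc.checked_mul(dim))
}

/// Footprint of the full 6,500,000 x 2 x 108 array, assuming 8-byte elements.
fn full_array_bytes() -> Option<u64> {
    dense_bytes(&[6_500_000, 2, 108], 8)
}

/// Footprint of one 6,500,000 x 2 slice taken along Axis 2, same assumption.
fn one_axis2_slice_bytes() -> Option<u64> {
    dense_bytes(&[6_500_000, 2], 8)
}
```

Under that assumption the full array comes to 11,232,000,000 bytes (about 11.2 GB, or roughly 10.5 GiB), and a single copied slice along Axis 2 is 104,000,000 bytes.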

I'm not too familiar with ndarray itself, but viewing (or copying) 5 2D subarrays with dimensions 2 x 108 should not require a lot of memory (each one is about 216 times the memory footprint of a single element).

Can you share the problematic code or a minimal version that behaves as you said?

Also, if running in parallel, do you create the array multiple times? Depending on the parallelization, you might end up allocating it many more times than you think.


That’s what you’ll see if Linux’s out-of-memory killer terminated the process. If it did, I think there should be an entry in /var/log/messages about it.


Try putting an internal limit on the amount of memory Rust can use. That way, if you hit the limit, you'll get an abort from Rust itself rather than a KILL from the OS.
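
One way to sketch such a limit with only the standard library is a global allocator that wraps the system allocator and refuses requests past a cap. The names `CappedAlloc`, `ALLOC` and `LIMIT_BYTES`, and the 1 GiB value, are placeholders chosen for this example; pick a cap that fits your machine.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical cap for this sketch: 1 GiB.
const LIMIT_BYTES: usize = 1 << 30;

/// Wraps the system allocator and tracks how many bytes are live.
struct CappedAlloc {
    in_use: AtomicUsize,
}

unsafe impl GlobalAlloc for CappedAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let size = layout.size();
        // Atomically reserve `size` bytes, but only if that stays under the cap.
        let reserved = self
            .in_use
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |cur| {
                cur.checked_add(size).filter(|&next| next <= LIMIT_BYTES)
            });
        if reserved.is_err() {
            // Over the cap: report failure instead of asking the OS.
            return std::ptr::null_mut();
        }
        let ptr = unsafe { System.alloc(layout) };
        if ptr.is_null() {
            // The system allocator failed; give the reservation back.
            self.in_use.fetch_sub(size, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        self.in_use.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

#[global_allocator]
static ALLOC: CappedAlloc = CappedAlloc {
    in_use: AtomicUsize::new(0),
};
```

When `alloc` returns a null pointer, standard collections such as `Vec` call the allocation error handler, which aborts the process with a message naming the failed allocation size. That at least tells you how large the offending request was.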

What's the ndarray operation?

If I understand you correctly, each "job" will try to copy out (I assume copy, because you said select) approximately 6,500,000 x 2 x 5 elements. First of all, it would be best not to use any copying operations at all, and to work on the data in place. Rayon parallelism can sometimes start a lot more jobs than there are available threads, which means there could be hundreds of jobs running in parallel, and I could see that putting pressure on memory usage. Try to limit the fanout of the Rayon operation to just one job per CPU.
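
As an illustration of both suggestions, here is a sketch that uses std scoped threads instead of Rayon and a flat row-major slice instead of an ndarray, so it is not the original poster's code. The function names, the `picks` parameter and the summing step are hypothetical stand-ins for whatever the real job computes. Each job borrows one contiguous block of rows and allocates only a tiny accumulator.

```rust
use std::thread;

/// Inner dimensions from the question: each outer row holds 2 x 108 values.
const AXIS2_LEN: usize = 108;
const ROW_LEN: usize = 2 * AXIS2_LEN;

/// One job per available CPU, falling back to 1 if that cannot be queried.
fn default_jobs() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

/// Sums the values at the chosen Axis 2 indices, reading `data` in place.
/// `data` is a flat row-major buffer of shape rows x 2 x 108.
fn sum_selected(data: &[f64], picks: &[usize], jobs: usize) -> Vec<f64> {
    assert!(data.len() % ROW_LEN == 0, "buffer is not a whole number of rows");
    assert!(picks.iter().all(|&k| k < AXIS2_LEN), "pick out of range");
    let jobs = jobs.max(1);
    let rows = data.len() / ROW_LEN;
    // One contiguous block of rows per job (rounded up, at least one row).
    let rows_per_job = ((rows + jobs - 1) / jobs).max(1);

    let partials: Vec<Vec<f64>> = thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(rows_per_job * ROW_LEN)
            .map(|block| {
                s.spawn(move || {
                    // The only per-job allocation: one f64 per pick.
                    let mut acc = vec![0.0; picks.len()];
                    for row in block.chunks_exact(ROW_LEN) {
                        for (slot, &k) in acc.iter_mut().zip(picks) {
                            // Entries (0, k) and (1, k) of this row.
                            *slot += row[k] + row[AXIS2_LEN + k];
                        }
                    }
                    acc
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    // Combine the small per-job results.
    let mut total = vec![0.0; picks.len()];
    for part in partials {
        for (t, p) in total.iter_mut().zip(part) {
            *t += p;
        }
    }
    total
}
```

Calling `sum_selected(&data, &picks, default_jobs())` spawns at most one thread per CPU, and peak extra memory is a few small vectors rather than a copy of each selected slice.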