Seeking the best-performing approach to serializing vectors

Hi everyone, I'm using Python's numpy module to do some scientific calculations. One problem that bothers me is that I need to do a lot of serialization and deserialization, and the native functions provided by numpy are too slow. Since numpy has a C API, I wonder whether it is possible to import the data into a Rust ndarray and have the serialization done by Rust.

Does ndarray provide a serialization method? Alternatively, could we convert the data into a 2D vector (holding only f32 values), export the underlying heap memory directly, and import it again when we need it?

I need to read and write millions of values within milliseconds, and unfortunately they are stored in scattered locations, so I can't call the batch read method that the framework provides natively. Any help would be greatly appreciated!

You may be able to find a suitable crate on crates.io, but serde is the usual recommendation for serialization and deserialization.

[dependencies]
serde = "1.0.137"

If you want to do calculations on an ndarray, you can try the ndarray crate:

[dependencies]
ndarray = "0.15.0"

example: ndarray examples

@steffahn @alice Maybe Frank and Alice have a better way.

You can define the following function:

fn f32_to_u8_array(arr: &[f32]) -> &[u8] {
    let len = std::mem::size_of::<f32>() * arr.len();
    let ptr = arr.as_ptr();
    unsafe {
        std::slice::from_raw_parts(ptr.cast(), len)
    }
}

Then you can simply write the resulting &[u8] slice to a file. To go in the other direction, you can create a Vec<f32> of the right length, then use the following function to get an &mut [u8] that you can read the data into.

fn f32_to_u8_array_mut(arr: &mut [f32]) -> &mut [u8] {
    let len = std::mem::size_of::<f32>() * arr.len();
    let ptr = arr.as_mut_ptr();
    unsafe {
        std::slice::from_raw_parts_mut(ptr.cast(), len)
    }
}
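
As a rough illustration of how the two functions above might be used together, here is a minimal round trip through a file using only the standard library. The helper names save_f32 and load_f32 and the temporary file name are hypothetical, not part of the original suggestion. It also assumes the reader knows how many values to expect, and that the file is read on a machine with the same endianness, since the bytes are written in native byte order.

```rust
use std::fs::File;
use std::io::{self, Read, Write};
use std::path::Path;

fn f32_to_u8_array(arr: &[f32]) -> &[u8] {
    let len = std::mem::size_of::<f32>() * arr.len();
    // SAFETY: u8 has alignment 1 and every byte of an f32 is initialized,
    // so viewing the same memory as bytes is valid for `len` bytes.
    unsafe { std::slice::from_raw_parts(arr.as_ptr().cast(), len) }
}

fn f32_to_u8_array_mut(arr: &mut [f32]) -> &mut [u8] {
    let len = std::mem::size_of::<f32>() * arr.len();
    // SAFETY: as above; any bit pattern written through the bytes is a valid f32.
    unsafe { std::slice::from_raw_parts_mut(arr.as_mut_ptr().cast(), len) }
}

/// Hypothetical helper: write the raw bytes of `data` to `path`.
fn save_f32(path: &Path, data: &[f32]) -> io::Result<()> {
    let mut file = File::create(path)?;
    file.write_all(f32_to_u8_array(data))
}

/// Hypothetical helper: read exactly `count` f32 values back from `path`.
fn load_f32(path: &Path, count: usize) -> io::Result<Vec<f32>> {
    // Allocate the destination first, then let the file fill its bytes.
    let mut data = vec![0.0f32; count];
    let mut file = File::open(path)?;
    file.read_exact(f32_to_u8_array_mut(&mut data))?;
    Ok(data)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("f32_roundtrip_example.bin");
    let original = vec![1.0f32, -2.5, 3.25];
    save_f32(&path, &original)?;
    let restored = load_f32(&path, original.len())?;
    assert_eq!(original, restored);
    std::fs::remove_file(&path)?;
    Ok(())
}
```

No per-element conversion happens here: the write and the read each operate on the vector's own memory.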

(These functions are carefully designed to only convert &[f32] -> &[u8] and not the other way. The other direction is not as simple because u8 has a lower alignment than f32.)
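
To illustrate the alignment point: if you only have a &[u8] (for example, a slice into a larger buffer), one safe option is to copy the bytes into a fresh Vec<f32> instead of reinterpreting them in place. The sketch below uses only the standard library; the helper name bytes_to_f32_vec is hypothetical, and it assumes the bytes were produced in native byte order.

```rust
/// Hypothetical helper: decode native-endian bytes into f32 values by copying.
/// Returns None if the length is not a multiple of size_of::<f32>().
fn bytes_to_f32_vec(bytes: &[u8]) -> Option<Vec<f32>> {
    const SIZE: usize = std::mem::size_of::<f32>();
    if bytes.len() % SIZE != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(SIZE)
            .map(|chunk| {
                // Copy each chunk into a properly typed array; the source
                // slice may start at any address, so no alignment is assumed.
                let mut buf = [0u8; SIZE];
                buf.copy_from_slice(chunk);
                f32::from_ne_bytes(buf)
            })
            .collect(),
    )
}

fn main() {
    let values = [1.0f32, 2.0];
    // Put one padding byte in front so the payload starts at an odd offset.
    let mut padded = vec![0u8];
    for v in values.iter() {
        padded.extend_from_slice(&v.to_ne_bytes());
    }
    let decoded = bytes_to_f32_vec(&padded[1..]).expect("length is a multiple of 4");
    assert_eq!(decoded, values);
}
```

This costs one copy, which is the price of not caring where the byte slice starts in memory.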

Thanks! This looks like exactly what I need. One more question: because of the memory layout, it looks like 2D vectors must first be copied into 1D vectors. I guess we can't avoid those copies?

Usually 2d vectors are stored as a single long 1d vector with length n * m, so you should be able to pass the backing memory into my functions. What format do you have the arrays in?
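
For illustration, here is a minimal sketch of that layout, assuming row-major order. The Matrix type and its methods are hypothetical and exist only to show the indexing; they are not taken from any crate.

```rust
/// Hypothetical n x m matrix stored as one contiguous, row-major Vec.
struct Matrix {
    rows: usize,
    cols: usize,
    data: Vec<f32>,
}

impl Matrix {
    fn zeros(rows: usize, cols: usize) -> Self {
        Matrix { rows, cols, data: vec![0.0; rows * cols] }
    }

    /// Element (r, c) lives at flat index r * cols + c.
    fn get(&self, r: usize, c: usize) -> f32 {
        assert!(r < self.rows && c < self.cols);
        self.data[r * self.cols + c]
    }

    fn set(&mut self, r: usize, c: usize, value: f32) {
        assert!(r < self.rows && c < self.cols);
        self.data[r * self.cols + c] = value;
    }

    /// The whole matrix as a single slice, with no copying.
    fn as_flat(&self) -> &[f32] {
        &self.data
    }
}

fn main() {
    let mut m = Matrix::zeros(2, 3);
    m.set(1, 2, 7.0);
    assert_eq!(m.get(1, 2), 7.0);
    // Row 1, column 2 of a 2 x 3 matrix is flat index 1 * 3 + 2 = 5.
    assert_eq!(m.as_flat()[5], 7.0);
}
```

Because as_flat borrows the existing storage, the flat slice can be handed to a byte-conversion function without an intermediate copy.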

Try using bytemuck to avoid the unsafe: it will panic if the data is misaligned, and fail to compile if you incorrectly assume a layout, e.g. [[u8; 256]] vs [&[u8; 256]].

My data is usually 2D ndarray arrays of f64 or u32, which can also be thought of as Vec<Vec<T>>, since they are easily converted to vectors. Sorry for my lack of knowledge about how Rust arranges memory. It seems to me that nested 2D vectors are not stored contiguously in heap memory, which makes them hard to export without first converting them to one dimension.

In the ndarray crate, the ArrayBase type has as_slice and as_slice_mut methods that give you a single array with all of the data. You can pass it to my methods.
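
If the data really does arrive as a nested Vec<Vec<f64>>, each inner Vec is its own heap allocation, so one copy into a contiguous buffer is needed before the bytes can be written in a single call. Here is a minimal standard-library sketch; the helper name flatten_rows is hypothetical, and it assumes every row should have the same length.

```rust
/// Hypothetical helper: copy equally sized rows into one contiguous Vec.
/// Returns (rows, cols, flat data), or None if the rows differ in length.
fn flatten_rows(rows: &[Vec<f64>]) -> Option<(usize, usize, Vec<f64>)> {
    let cols = rows.first().map_or(0, |row| row.len());
    if rows.iter().any(|row| row.len() != cols) {
        return None;
    }
    // Reserve once so the copy does not reallocate.
    let mut flat = Vec::with_capacity(rows.len() * cols);
    for row in rows {
        flat.extend_from_slice(row);
    }
    Some((rows.len(), cols, flat))
}

fn main() {
    let nested = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    let (rows, cols, flat) = flatten_rows(&nested).expect("rows have equal length");
    assert_eq!((rows, cols), (2, 2));
    assert_eq!(flat, vec![1.0, 2.0, 3.0, 4.0]);
}
```

Keeping the data in a flat buffer from the start avoids this copy altogether.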

Not sure what you mean here. They can easily be converted to one-dimensional vectors via Array::into_raw_vec, but I can't find any method that easily converts them to nested vectors.