Sound conversion from `Vec<[num_complex::Complex64; 4]>` to `ndarray::Array2<num_complex::Complex64>` without copying

Hi! The title is a bit gnarly, so I'll make things simpler here.

I have a struct Jones([num_complex::Complex64; 4]) and a function that returns Vec<Jones>. I'd like to pass this to a Python caller via pyo3 as a numpy::PyArray2<numpy::c64>, where the second dimension is always length 4 (in other words, each element of the [Complex64; 4] becomes a column in a 2D array). numpy::c64 can be treated as num_complex::Complex64 (if they're already not the same), which in memory is essentially just two f64s next to each other.

To my knowledge, the easiest and cheapest way to get there safely is via ndarray with the following:

// `jones` has type `Vec<Jones>`
let jones: Array2<numpy::c64> = ndarray::Array1::from_iter(jones.into_iter().flat_map(|j| {
            [
                numpy::c64::from(j[0]),
                numpy::c64::from(j[1]),
                numpy::c64::from(j[2]),
                numpy::c64::from(j[3]),
            ]
        }))
        .into_shape((az_rad.len(), 4))
        .unwrap();
        jones.into_pyarray(py) // `py` is one of the function args and isn't important.

Now, I'd like to check that this is a zero-cost abstraction. I've tried using cargo asm, but... I can't read the assembly. It is likely, though, that some panic code is emitted for the unwrap, and I would be OK with that if the rest of the code doesn't copy.

An unsafe way to do the same thing is:

        let jones: Array2<numpy::c64> = unsafe {
            let ptr = jones.as_mut_ptr() as *mut numpy::c64;
            let len = jones.len();
            let cap = jones.capacity();
            std::mem::forget(jones);
            let v = Vec::from_raw_parts(ptr, len * 4, cap * 4);
            ndarray::Array2::from_shape_vec((v.len() / 4, 4), v).unwrap_unchecked()
        };
        Ok(jones.into_pyarray(py))

This is essentially what'd I'd do in C (if I was writing in C), but I'm not actually sure how sound this is. Can I assume that v has a length and capacity 4x the original jones vector? And, does this code avoid a useless copy that the above safe Rust code might be doing?

Thanks in advance.

First of all, if you have a newtype wrapper, you must mark it as #[repr(transparent)] in order to perform any sort of transmuting/reinterpretation, otherwise the layout is the default (and unspecified) #[repr(Rust)], transmuting of which is usually UB.

Next, arrays and slices have the same alignment requirements as their elements, therefore so does the underlying (implicit) slice contained in a Vec. Therefore, if you have a Vec<[T; N]> or a &[[T; N]], you are allowed to convert it to a Vec<T> of length N times the original length.

However, your code contains another soundness issue and is not completely idiomatic.

First, the soundness issue: as_mut_ptr() takes a &mut self and the returned pointer has the same provenance as the mutable reference to self itself. Therefore, accessing the pointed vector through anything but the returned raw pointer is UB due to the exclusive mutability requirement. So you should swap the order of querying the length and the capacity and the conversion to a raw pointer.

Second, you use mem::forget(), which is dangerous, because it performs the forgetting after you are done with the conversion. This is problematic, because if anything during the conversion panics, the vector won't be forgotten and it will be dropped. If something already assumed ownership of the data, this will lead to a double-drop. You should be using ManuallyDrop instead.

All in all, a better and simpler way to perform this conversion unsafely would be like this:

use std::mem::ManuallyDrop;
use ndarray::Array2;

type C64 = num_complex::Complex<f64>;
const N: usize = 4;

#[repr(transparent)]
struct Jones([C64; N]);

fn main() {
    let jones: Vec<Jones> = Vec::new();
    let mut jones = ManuallyDrop::new(jones);
    let old_len = jones.len();
    let new_len = old_len * N;
    let new_cap = jones.capacity() * N;
    let new_ptr = jones.as_mut_ptr() as *mut C64;
    // SAFETY: new_cap * N == old_cap, align_of::<C64>() == align_of::<Jones>()
    let flat = unsafe { Vec::from_raw_parts(new_ptr, new_len, new_cap) };
    let new_jones = Array2::from_shape_vec((old_len, N), flat).unwrap();
}
3 Likes

In the past, the from_raw_parts method would require that the type has both the exact same size and alignment, which does not hold in your code, since the size of the two types are not the same. However, recently, this requirement was relaxed to the following:

  • T needs to have the same alignment as what ptr was allocated with. (T having a less strict alignment is not sufficient, the alignment really needs to be equal to satisfy the dealloc requirement that memory must be allocated and deallocated with the same layout.)
  • The size of T times the capacity (ie. the allocated size in bytes) needs to be the same size as the pointer was allocated with. (Because similar to alignment, dealloc must be called with the same layout size.)

So the new requirement is that the allocation size in bytes must remain the same. Your multiplication by four ensures that this is true.

1 Like

Thank you both for the input! It sounds like I can make the unsafe code work with some tweaks. This is great for avoiding a useless copy, but I'm still curious about the safe (i.e. not unsafe) code; does this copy, and how can I tell if it does more generally?

There is not a definitive, portable, robust, and future-proof answer to "does it copy", because it's an optimization, and it might change in the future and across platforms/compiler settings. The Rust compiler, hand-in-hand with LLVM, is surprisingly good at re-using allocations and eliding buffer copies, so you can expect the copy to be optimized away, but you can't rely on it for correctness.

You either look for a loop that reads from one location and writes to another, or for a call to memcpy()/memmove().

1 Like

Thanks once again. I'll mark your original post as the solution.

You can also do this without unsafe code if you use the bytemuck crate:

// Cargo.toml:
// 
// [dependencies]
// bytemuck = { version = "*", features = ["derive", "extern_crate_alloc"] }
// ndarray = "*"
// num-complex = {version = "*", features = ["bytemuck"] }

use bytemuck::NoUninit;
use ndarray::Array2;
use num_complex::Complex64;

#[derive(Clone, Copy, NoUninit)]
#[repr(transparent)]
pub struct Jones([Complex64; 4]);

pub fn jones_to_ndarray(jones: Vec<Jones>) -> Array2<Complex64> {
	let len = jones.len();
	let vec = bytemuck::cast_vec(jones);
	ndarray::Array2::from_shape_vec((len, 4), vec).unwrap()
}

The only downside is that you need Jones to implement Copy. I think in the future, when project-safe-transmute lands, this won't be a requirement anymore. For now if you really need to avoid it you can first convert a Vec<Jones> to a Vec<[Complex64; 4]> by using into_iter + map + collect. i.e. remove the derives and replace the let vec = bytemuck::cast_vec(jones) with:

let vec = jones.into_iter().map(|jone| jone.0).collect();
let vec = bytemuck::cast_vec(vec);

This won't reallocate thanks to optimizations in the stdlib, but there's the probability of LLVM not optimizing out an empty loop.

3 Likes

Hey, this is great! Thanks for pointing it out -- I did try to use bytemuck before making posting this post but I must've been missing the feature for the num-complex package. I can also see in a toy crate that the cargo asm of your bytemuck code matches that of H2CO3's, without using unsafe!

Fortunately for me, I've made Jones Copy (not really sure when/when not to make something Copy, but it's at least made coding easier). It's actually also generic over f32 and f64, but I need to add #[repr(transparent)]. I regret not posting a question like this sooner; I learn so much!

You make a type Copy when it doesn't matter how many copies of values of that type are around. Typically, this means "plain old data" (such as numbers, bytes, arrays of numbers, simple wrapper structs around them, etc.)

Sometimes, a type can technically be Copy (i.e. that would be allowed by the compiler), but it would be a logic error to copy it. For example, types containing owning raw pointers fall into this category, and so do types that are passed around in order to mark something as having unique ownership. Another example is when copying can be confusing due to mutable state; the standard Range type, which is an Iterator, doesn't implement Copy for this exact reason.

3 Likes

Thanks, I was aware of most of this (although additional perspective on Range is interesting), but I guess I was mostly wondering about when to make something copy with regards to a type's size. In my case, Jones<f64> (i.e. a newtype of [Complex64; 4]) is 64 bytes. Naively I'd guess that it's on the performance boundary of what's appropriate to copy freely vs. what's better as a pointer dereference. I guess benchmarking is the only really to obtain an answer but I've got other low-hanging fruit to deal with. For additional context I routinely allocate many gigabytes of these in my code.

(Feel free to ignore the "question" if you have better things to do!)

Copies will be elided by the compiler if they can be. If they can't, then you won't fix that by not making a type Copy. Write code to reflect semantics and intention first instead of worrying about trivial copies.

2 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.