Question about the efficiency of vec![] on num_complex::Complex

I find that vec![Complex::<i16>::new(0,0); n] is much less efficient than vec![0_i16; n*2] (by 50%).

Since num_complex::Complex<T> is identical to [T; 2] in memory, I’m curious about the reason.

Currently, I have to resort to a hack: I initialize a vec![0_i16; n*2], take its pointer, and then use Vec::from_raw_parts to do the conversion.

Is there any formal way to do this that achieves better performance (the same as the vec![0_i16; n*2] method)?

Did you build in Release or Debug mode? My guess would be that in Debug mode the “new” call is not inlined, so there are n function calls and associated overhead to populate all the default values. Also, you could try creating a “Default” static value for Complex and using that as the initializer value. I would imagine that should get performance close to the n*2 thing you did.

EDIT: Is “Complex” a Copy type? If it isn’t, then it will be cloned “n” times. Depending on the Clone impl, that could be expensive. Check if Complex is Copy. If it isn’t, it probably should be.

I built in release mode and Complex impls Copy.

I list my test code as follows:

use num_complex::Complex;

fn main() {
    let nch=2048-512;
    let nchunk=80000;
    let buf_size=nchunk*nch;

    let buff= if true //set to be true will be much faster
    {
        let mut temp_buf=vec![1_i16; buf_size*2];
        let ptr=temp_buf.as_mut_ptr();
        std::mem::forget(temp_buf);
        unsafe{Vec::from_raw_parts(ptr as *mut Complex<i16>, buf_size, buf_size)}
    }
    else{
        vec![Complex::<i16>::new(0,0); buf_size]
    };
}
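For anyone copying the workaround, here is a slightly tidier sketch of the same idea. It is only a sketch under assumptions: `Complex` below is a local stand-in that I assume matches the `#[repr(C)]` two-field layout of num_complex::Complex, and `zeroed_complex_vec` is a made-up helper name, not something from the crate. It uses `ManuallyDrop` instead of `mem::forget` and carries the real capacity over rather than assuming it.

```rust
use std::mem::ManuallyDrop;

/// Local stand-in for `num_complex::Complex<T>` (assumption: same `#[repr(C)]`
/// layout with two fields of type `T`), so the sketch compiles without the crate.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Complex<T> {
    pub re: T,
    pub im: T,
}

/// Hypothetical helper: allocate `2 * n` zeroed `i16`s, then reinterpret the
/// same buffer as `n` complex values.
pub fn zeroed_complex_vec(n: usize) -> Vec<Complex<i16>> {
    // A zero fill of a primitive integer can use the zeroed-allocation path.
    let flat = vec![0_i16; n * 2];

    // ManuallyDrop keeps `flat` from freeing the buffer we are handing over.
    let mut flat = ManuallyDrop::new(flat);
    let (ptr, len, cap) = (flat.as_mut_ptr(), flat.len(), flat.capacity());

    // The byte size of the allocation must stay the same after the cast,
    // which requires an even number of i16 slots.
    assert_eq!(cap % 2, 0, "capacity must be even to reinterpret as pairs");

    // Safety: Complex<i16> is repr(C) with two i16 fields, so it has size 4
    // and the same alignment as i16. The buffer came from a Vec<i16>, every
    // element is initialized (zero), and `cap / 2` elements of size 4 cover
    // exactly the bytes that were allocated.
    unsafe { Vec::from_raw_parts(ptr as *mut Complex<i16>, len / 2, cap / 2) }
}
```

Calling `zeroed_complex_vec(buf_size)` would replace the `if true` branch above. The cast is only sound as long as the layout assumption holds, so it is worth double checking against the crate version you actually use.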

This seems like a question for the “internals” (https://internals.rust-lang.org/latest) forum. This does seem like a pickle. I agree with you that there is no good reason for the latter to be twice as slow. Seems like some optimization is being prevented from occurring based on the way that Complex is implemented.

I noticed that your i16 code was initializing to a different value than the complex code. Out of curiosity, I made it the same, and then the fast code got about five times faster. Presumably when filling with primitive zeros the compiler recognizes that and does some clever optimization such as calling bzero(3). Too bad it doesn’t seem able to make the same leap with a struct that is isomorphic to 0_u32. The Complex::new function isn’t doing anything tricky that I can see.

About the different values between the i16 and Complex code: yes, you are right, it was a typo.

Actually, for me the running time is 0.08 s for the i16 method and about 0.3 s for the Complex::new method, a factor of about 3-4, which agrees with what you observed.
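In case someone wants to reproduce numbers like these, a minimal timing sketch could look like the following. The `Complex` struct is a local stand-in (assumed to mirror num_complex::Complex), and `timed` and `compare` are hypothetical helper names. Both fills use zero, so the comparison is like for like.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Local stand-in for `num_complex::Complex<T>` (assumption: same layout).
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Complex<T> {
    pub re: T,
    pub im: T,
}

/// Run `f` once and return its result together with the elapsed wall time.
pub fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

/// Time a flat zero fill of `2 * n` i16s against a zero fill of `n` complex
/// values. Returns (flat time, complex time).
pub fn compare(n: usize) -> (Duration, Duration) {
    let (flat, t_flat) = timed(|| vec![0_i16; n * 2]);
    let (cplx, t_cplx) = timed(|| vec![Complex { re: 0_i16, im: 0_i16 }; n]);

    // black_box discourages the optimizer from discarding the buffers.
    black_box(&flat);
    black_box(&cplx);

    (t_flat, t_cplx)
}
```

Build with `--release` and call `compare(buf_size)` a few times. A single run is noisy, and a zeroed allocation may not touch the pages at all until they are first written, so treat the numbers as rough.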

I went ahead and started an internals thread for this: https://internals.rust-lang.org/t/vec-lack-of-optimization-for-zeroed-types/8303

Looking at the assembly generated for several cases:

For the case of 0_i16, the optimizer recognizes that only zeros are written and calls __rust_alloc_zeroed, with no further action. For any other integer, there is a loop, which is much slower (but still faster than with Complex).

With Complex, the loop is not inlined; instead, the code calls Vec::extend_with.

I wonder if there could be a marker trait that indicates that it is safe to use all zeroes for the size of the type as default values. Then, vec! could use the fact that the type impls that trait to optimize and use __rust_alloc_zeroed? For example, something along the lines of:

unsafe trait Zeroable {}
unsafe impl Zeroable for Foo {}
let v = vec::<Foo>::new( 100000 ); // hypothetical API: would allocate a vector of 100,000 Foo's, all zeroed using __rust_alloc_zeroed and so would be basically instantaneous

This trait could be auto-derived based on the “zeroableness” of the components of the struct (similar to how Send and Sync are auto-derived). A type could also be declared !Zeroable to override the auto-derive. It could also allow manually implementing Zeroable for a type that didn’t auto-derive it but is safe to zero.
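To make the idea a bit more concrete, here is a rough user-level sketch of what I have in mind. Everything in it is hypothetical: the `Zeroable` trait, the `zeroed_vec` function, and the local `Complex` stand-in are names made up for illustration, not anything the standard library or num_complex provides. It goes through the public `std::alloc::alloc_zeroed` function rather than the internal `__rust_alloc_zeroed` symbol.

```rust
use std::alloc::{alloc_zeroed, handle_alloc_error, Layout};
use std::mem::size_of;

/// Hypothetical marker trait: implementors promise that an all-zero bit
/// pattern is a valid value of the type.
pub unsafe trait Zeroable: Sized {}

unsafe impl Zeroable for i16 {}

/// Local stand-in for `num_complex::Complex<T>` (assumption: same layout).
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Complex<T> {
    pub re: T,
    pub im: T,
}

// A struct made only of zeroable fields is itself zeroable.
unsafe impl<T: Zeroable> Zeroable for Complex<T> {}

/// Hypothetical constructor: a Vec of `n` zeroed elements obtained straight
/// from a zeroed allocation, with no per-element loop.
pub fn zeroed_vec<T: Zeroable>(n: usize) -> Vec<T> {
    // This sketch does not handle zero-sized types.
    assert!(size_of::<T>() != 0, "zero-sized types are not supported here");
    if n == 0 {
        return Vec::new();
    }

    let layout = Layout::array::<T>(n).expect("capacity overflow");

    // Safety: the layout has non-zero size because n > 0 and T is not
    // zero-sized.
    let ptr = unsafe { alloc_zeroed(layout) };
    if ptr.is_null() {
        handle_alloc_error(layout);
    }

    // Safety: the pointer comes from the global allocator with the layout of
    // an array of n T's, which matches what a Vec<T> of capacity n frees, and
    // all n elements are zero bytes, which T: Zeroable declares valid.
    unsafe { Vec::from_raw_parts(ptr as *mut T, n, n) }
}
```

With something like this, `let v: Vec<Complex<i16>> = zeroed_vec(100000);` would skip the fill loop entirely. The burden is on whoever writes `unsafe impl Zeroable` to make sure zero really is a valid value for the type.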

EDIT: So apparently, this is already a thing, it just hasn’t stabilized yet: https://internals.rust-lang.org/t/vec-lack-of-optimization-for-zeroed-types/8303/3?u=gbutler