Performance comparison between kvec and C


#1

I had a go at testing some raw allocation performance on an i5-7300 against the kvec library (https://github.com/attractivechaos/klib).

Got some interesting results:

C array, preallocated: 0.400 sec
C array, dynamic: 1.073 sec
C vector, dynamic (kv_push): 1.077 sec
C vector, dynamic (kv_a): 0.505 sec
C++ vector, preallocated: 0.402 sec
C++ vector, dynamic: 2.070 sec

Rust, jemalloc:
dynamic: 1.23 sec
pre-allocated: 0.631 sec

Rust, system allocator:
dynamic: 1.27 sec
pre-allocated: 0.585 sec

Some initial conclusions:
Preallocation is the way to go, no matter what the language; but even with dynamic arrays, Rust comfortably outperforms C++ and gets within about 15% of C with the kvec library.
The system allocator seems fine for one-off allocations where there isn't much reallocation going on.


#2

This might be a daft question, but did you run the Rust, C and C++ code with any optimisations enabled?


#3

Can you post the test code? It’s surprising that Rust is slower than klib
here.


#4

Sure. The C/C++ code is the one from the author's blog (https://attractivechaos.wordpress.com/2008/09/19/c-array-vs-c-vector), built in VS2015 release mode.

The Rust version is a trivial loop, optimised and benchmarked:

use test::Bencher; // needs #![feature(test)] and `extern crate test` at the crate root

#[bench]
fn preallocated(b: &mut Bencher) {
    b.iter(|| {
        for _j in 0..10 {
            let mut v = Vec::with_capacity(20000000);
            for i in 0..20000000 {
                v.push(i);
            }
        }
    });
}

#[bench]
fn dynamic(b: &mut Bencher) {
    b.iter(|| {
        for _j in 0..10 {
            let mut v = Vec::new();
            for i in 0..20000000 {
                v.push(i);
            }
        }
    });
}

I had a quick look on godbolt at the assembly. Rust doesn't unroll the loops (-O3 -target-cpu=native), whereas MSVC does a 4x unroll. Would be curious to know if there's an option to unroll the loop to match?


#5

Unrolling annotations are still missing. In the meantime you can add a compilation switch like:

-C llvm-args='-unroll-threshold=1000'


#6

Oddly this didn't make a difference to the generated code on 1.24. Maybe someone else gets a different result on godbolt?


#7

I know this isn’t exactly what you’re testing, but if you want to see some cool assembly, try this instead:

#[bench]
fn dynamic(b: &mut Bencher) {
    b.iter(|| {
        for _j in 0..10 {
            let mut v = Vec::new();
            v.extend(0..20000000);
        }
    });
}

#8

I have to admit that was quite cool. I wonder why in this case it utilises the larger ymm registers and unrolls, but in the simpler original case it does neither. Is this a possible area of optimisation in the compiler?


#9

It likely uses the vector registers because the loop is unrolled. Like inlining for functions, unrolling is the “gateway” optimization for loops.

extend() has a specialization for a TrustedLen iterator - it ensures the vec has just enough capacity to store the entire iterator, and then has a simple raw ptr based loop that copies elements into it.

The push based loop, on the other hand, ends up with a lot more code in the loop body, including at least one guaranteed-opaque function: RawVec::double. This function is explicitly marked #[inline(never)]. Compilers will typically not optimize around these because they can no longer reason about their effects.

