Performance comparison: calling C/C++ (unsafe functions) vs. Rust-wrapped functions

pili2026 · June 11, 2019, 10:41am

I asked about the performance of Python vs. Rust before, but now I want to ask about calling C/C++ directly vs. going through a Rust wrapper.
The question again concerns the Cassandra libraries: cassandra-rs (the Rust wrapper) and cassandra-sys-rs (unsafe calls into the C/C++ driver).

I built the web application for the query twice, once with cassandra-rs and once with cassandra-sys-rs.
The logic of the two versions is basically the same, but when I compare them with cargo run --release, the cassandra-sys-rs version is much faster than the cassandra-rs version.

Both versions query the same amount of data (one day), but the search time with cassandra-rs is 1.079947, while with cassandra-sys-rs it is 0.134054.

At first I developed only with cassandra-sys-rs, but in order to use a singleton I switched to cassandra-rs, which is why I'm asking about the performance. Both are based on the C++ Cassandra driver, so why is the performance so different?

Dushistov · June 11, 2019, 1:05pm

What about perf/callgrind? Who is the top CPU user in cassandra-rs?

DoumanAsh · June 11, 2019, 2:50pm

Most likely the Rust wrappers have extra abstractions that introduce overhead (talk about zero-cost abstractions :D), but to answer your question you'd need to use a profiler, preferably on an optimized build.

cuviper · June 11, 2019, 6:03pm

It might just be a case where some #[inline] attributes are needed to collapse simple wrappers in cassandra-rs, but the profiler should tell the actual story.
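
To illustrate what such a wrapper looks like, here is a minimal sketch. The names raw_add and add are hypothetical and are not taken from cassandra-rs; the "raw" function is written in Rust only so the example compiles without the C++ driver.

```rust
// Hypothetical stand-in for a raw binding like those exposed by a *-sys crate.
// Defined in Rust here purely so the sketch is self-contained.
pub unsafe extern "C" fn raw_add(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

/// Thin safe wrapper around the raw call.
/// A non-generic function in another crate may not be inlined into the
/// caller unless it is marked #[inline] or LTO is enabled.
#[inline]
pub fn add(a: i32, b: i32) -> i32 {
    // SAFETY: raw_add has no preconditions on its integer arguments.
    unsafe { raw_add(a, b) }
}
```

With the attribute in place, the compiler is allowed to collapse add into its caller even across a crate boundary. Whether that matters for the timings in this thread is something only the profiler can confirm.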

DoumanAsh · June 11, 2019, 7:18pm

LTO should fix missing inlining across crates. The function call itself rarely has a big impact, so I doubt it is the main problem.

pili2026 · June 12, 2019, 6:19am

perf result:

The result looks pretty average...
And callgrind hasn't been able to display its output yet; I'm still testing.

pili2026 · June 12, 2019, 6:27am

Because I hadn't heard of LTO before, I searched and found a link on Stack Overflow.
Is the way to use it to add lto = true to Cargo.toml? Or is there another way?

cbiffle · June 12, 2019, 6:33am

That perf output looks like a lot of short-lived allocations in the Rust layer, likely for managing data structures involving a BTreeMap and HashMap. If the Rust code is doing a bunch of book-keeping like this, it could explain the performance difference you're seeing.

pili2026 · June 12, 2019, 7:25am

My program does use a lot of HashMaps to assemble the query data. However, the HashMaps are necessary, so I don't know how to optimize this.

DoumanAsh · June 12, 2019, 7:48am

Yes, this is correct, and lto is enabled by default for release builds

As for your output, as @cbiffle mentioned, using collections will affect your performance.
So are they used in your code or in the code of the cassandra wrapper?
If it is your code, one way to optimize it would be to not create these collections each time, but instead store them somewhere for re-use (clearing a collection does not de-allocate its memory unless you do that manually), so you'll start saving time on allocations.
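
A rough sketch of that re-use idea, assuming the map can live in a long-lived struct. RowAssembler, its field, and the u32 keys are all hypothetical and only stand in for whatever the real query code assembles.

```rust
use std::collections::HashMap;

/// Hypothetical assembler that keeps its scratch map between queries.
pub struct RowAssembler {
    scratch: HashMap<u32, u64>,
}

impl RowAssembler {
    pub fn new() -> Self {
        RowAssembler { scratch: HashMap::new() }
    }

    /// Counts how often each key occurs and returns the number of distinct keys.
    pub fn assemble(&mut self, keys: &[u32]) -> usize {
        // clear() removes the entries but keeps the allocated memory,
        // so later calls can fill the map without allocating again
        // (as long as they do not exceed the existing capacity).
        self.scratch.clear();
        for &k in keys {
            *self.scratch.entry(k).or_insert(0) += 1;
        }
        self.scratch.len()
    }

    /// Current capacity of the scratch map, exposed for inspection.
    pub fn capacity(&self) -> usize {
        self.scratch.capacity()
    }
}
```

Creating one RowAssembler up front and calling assemble for every query avoids building a fresh HashMap each time. If the keys or values are themselves owned Strings, those still allocate, so this only removes the cost of the map's own storage.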

cuviper · June 12, 2019, 4:30pm

It could be that the cassandra-rs wrapper is contributing a lot of those allocations too. At a quick glance, I see a lot of owned strings and CString::new. It's hard to avoid that though, because it has to make sure there's a NUL (b'\0') terminator, which a Rust borrowed &str won't usually have.
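
For illustration, a small sketch of the two conversions using only the standard library. The function names to_c_owned and to_c_borrowed are made up for this example and are not part of cassandra-rs.

```rust
use std::ffi::{CStr, CString};

/// Owned conversion: allocates a new buffer and appends the NUL terminator.
/// Fails if the input contains an interior NUL byte.
pub fn to_c_owned(s: &str) -> CString {
    CString::new(s).expect("input must not contain interior NUL bytes")
}

/// Borrowed conversion: no allocation, but the caller has to provide
/// bytes that already end in a NUL (for example a b"...\0" literal).
pub fn to_c_borrowed(bytes_with_nul: &[u8]) -> &CStr {
    CStr::from_bytes_with_nul(bytes_with_nul)
        .expect("input must be NUL-terminated with no interior NUL")
}
```

The borrowed form only helps when the string is known ahead of time, such as a fixed query text. For strings built at runtime, a wrapper generally has little choice but to pay for the copy.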

No, lto is not enabled by default for any profile:
https://doc.rust-lang.org/cargo/reference/manifest.html#the-profile-sections

Or in the source, Profile::default_release() only changes name and opt_level from the global Profile::default(), with lto: Lto::Bool(false).
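
So it has to be switched on explicitly. A minimal sketch of the relevant Cargo.toml section, assuming it goes in the manifest of the final binary crate (or the workspace root):

```toml
# Opt in to link-time optimization for `cargo build --release` / `cargo run --release`.
[profile.release]
lto = true   # full ("fat") LTO; lto = "thin" is a cheaper alternative
```

Expect longer release builds with this enabled, and measure before and after, since it is not guaranteed to make this particular program faster.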

DoumanAsh · June 12, 2019, 7:18pm

cuviper wrote: "No, lto is not enabled by default for any profile"

That's actually sad.
I'm already tired of the cargo team changing things all the time.
Well, I guess it is good that I enable lto in my Cargo.toml.

steveklabnik · June 12, 2019, 8:57pm

That’s not a change; it’s never been on by default, because it significantly increases compile time and doesn’t always make things faster.

DoumanAsh · June 13, 2019, 4:16am

Is that so? OK, my bad, but for some reason I remember it being the default at some point.

HadrienG · June 13, 2019, 6:22am

IIRC, we have two kinds of LTO, "thin" and "full". The "full" one is expensive and was never enabled by default, but I think the "thin" one might have been enabled at some point as it is much lighter-weight (though less capable).

system · September 11, 2019, 6:31am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
