Performance comparison: calling C/C++ directly (unsafe functions) vs. Rust-wrapped functions

I asked about the performance of Python vs. Rust before; now I want to ask about calling C/C++ directly vs. going through a Rust wrapper.
The question again involves the Cassandra client libraries: cassandra-rs (the Rust wrapper) and cassandra-sys-rs (unsafe calls into the C/C++ driver).

I built the query web application twice, once with cassandra-rs and once with cassandra-sys-rs.
The logical architecture of the two is basically the same, but comparing them with cargo run --release, the cassandra-sys-rs version is much faster than the cassandra-rs version.

Both versions query the same amount of data (one day), but the search time with cassandra-rs is 1.079947, while with cassandra-sys-rs it is 0.134054, roughly 8x faster.

At first I developed only with cassandra-sys-rs, but I switched to cassandra-rs in order to use a singleton, which is why I'm asking about performance. Since both are built on the same C++ Cassandra driver, why is the performance so different?

What do perf/callgrind show? Which function is the top CPU consumer in the cassandra-rs version?

Most likely the Rust wrapper adds extra abstractions that introduce overhead (so much for zero-cost abstractions :D), but to really answer your question you'd need to use a profiler, preferably on an optimized build.

It might just be a case where some #[inline] attributes are needed to collapse simple wrappers in cassandra-rs, but the profiler should tell the actual story.
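A minimal sketch of the kind of thin wrapper being discussed, assuming a raw function that stands in for a C driver call. The names raw_row_count and Rows are hypothetical, not part of cassandra-rs or cassandra-sys-rs. Marking the safe method #[inline] makes its body available for inlining into callers in other crates without relying on LTO; whether that matters here is something only the profiler can confirm.

```rust
/// Hypothetical stand-in for a raw FFI call into a C driver. In a real
/// -sys crate this would be an `extern "C"` declaration instead.
unsafe fn raw_row_count(result: *const u64) -> u64 {
    // SAFETY: the caller guarantees `result` points to a valid, aligned u64.
    unsafe { *result }
}

/// Hypothetical thin safe wrapper, similar in spirit to a wrapper crate.
pub struct Rows {
    // Owned storage standing in for a driver-owned result object.
    count: Box<u64>,
}

impl Rows {
    pub fn new(count: u64) -> Self {
        Rows { count: Box::new(count) }
    }

    /// `#[inline]` exposes this body to downstream crates for inlining,
    /// so the safe call can compile down to the raw call.
    #[inline]
    pub fn count(&self) -> u64 {
        // SAFETY: `self.count` is a valid, initialized, aligned u64 owned by `self`.
        unsafe { raw_row_count(&*self.count) }
    }
}
```

Usage is simply Rows::new(42).count(), which returns 42; the wrapper adds no work beyond the raw call once inlined.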

LTO should take care of missing inlining across crates. The function call itself rarely has a big impact, so I doubt it is the main problem.

perf result:

(perf screenshot not preserved)

Nothing in the result stands out much…
I haven't been able to get callgrind to display anything yet; still testing.

I hadn't heard of LTO before, so I looked it up and found a Stack Overflow answer about it.
Is the way to use it to add lto = true to Cargo.toml, or is there another way?

That perf output looks like a lot of short-lived allocations in the Rust layer, likely for managing data structures involving a BTreeMap and HashMap. If the Rust code is doing a bunch of book-keeping like this, it could explain the performance difference you’re seeing.

My program does use a lot of HashMaps to assemble the query data. But I need them, so I don't know how to optimize that…

Yes, this is correct, and lto is enabled by default for release builds

As for your output, as @cbiffle mentioned, using collections will affect your performance.
So are they used in your code or in the cassandra wrapper's code?
If it is your code, one way to optimize it would be to not create these collections each time, but instead keep them somewhere for reuse (clearing a collection does not deallocate its memory unless you explicitly shrink it), so you stop paying for those allocations on every query.
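
A minimal sketch of that reuse idea, assuming each query's rows can be reduced to (key, value) pairs of integers. The QueryAssembler type and its methods are hypothetical. HashMap::clear drops the entries but keeps the allocated table, so after the first query the map itself no longer needs to allocate; owned String keys or values would of course still allocate per entry.

```rust
use std::collections::HashMap;

/// Hypothetical helper that keeps one map alive across queries.
pub struct QueryAssembler {
    // Reused between queries; `clear()` keeps its allocated capacity.
    by_key: HashMap<u64, u64>,
}

impl QueryAssembler {
    pub fn new() -> Self {
        QueryAssembler { by_key: HashMap::new() }
    }

    /// Sums values per key for one query's rows, reusing the same map.
    /// Returns the number of distinct keys seen in this query.
    pub fn assemble(&mut self, rows: &[(u64, u64)]) -> usize {
        self.by_key.clear(); // drops old entries, keeps the table allocation
        for &(key, value) in rows {
            *self.by_key.entry(key).or_insert(0) += value;
        }
        self.by_key.len()
    }

    pub fn total_for(&self, key: u64) -> Option<u64> {
        self.by_key.get(&key).copied()
    }

    pub fn capacity(&self) -> usize {
        self.by_key.capacity()
    }
}
```

Creating one QueryAssembler up front (for example, alongside the session) and calling assemble per query means the table only grows when a query needs more room than any earlier one did.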

It could be that the cassandra-rs wrapper is contributing a lot of those allocations too. At a quick glance, I see a lot of owned strings and CString::new calls. That's hard to avoid, though, because the wrapper has to make sure there's a NUL (b'\0') terminator, which a borrowed Rust &str usually won't have.
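
To illustrate why the NUL terminator forces a copy, here is a small sketch using only std::ffi. The helper names to_c_owned and borrow_c are hypothetical, not cassandra-rs functions. CString::new copies the &str into a fresh heap buffer, appends the terminator, and fails on interior NUL bytes, while bytes that already end in NUL can be borrowed as a &CStr with no allocation.

```rust
use std::ffi::{CStr, CString, NulError};

/// What a wrapper must do with an arbitrary borrowed &str before handing it
/// to C: copy it into a new heap allocation with a trailing NUL.
pub fn to_c_owned(s: &str) -> Result<CString, NulError> {
    CString::new(s) // allocates; errors if `s` contains an interior NUL
}

/// If the bytes already end in a single NUL (e.g. a constant query string),
/// they can be viewed as a &CStr without allocating or copying.
pub fn borrow_c(bytes: &[u8]) -> Option<&CStr> {
    CStr::from_bytes_with_nul(bytes).ok()
}
```

The trade-off is that the borrowed path only works when the caller already holds NUL-terminated bytes, which is rarely true for strings coming from ordinary Rust code.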

No, lto is not enabled by default for any profile:
https://doc.rust-lang.org/cargo/reference/manifest.html#the-profile-sections

Or in the source, Profile::default_release() only changes name and opt_level from the global Profile::default(), with lto: Lto::Bool(false).

(Quoting: "No, lto is not enabled by default for any profile.")

That's actually sad.
I'm already tired of the Cargo team changing things all the time.
Well, I guess it's good that I enable lto in my Cargo.toml anyway.

That’s not a change; it’s never been on by default, because it significantly increases compile time and doesn’t always make things faster.

Is that so? OK, my bad, but for some reason I remember it being the default at some point :thinking:

IIRC, we have two kinds of LTO, “thin” and “full”. The “full” one is expensive and was never enabled by default, but I think the “thin” one might have been enabled at some point as it is much lighter-weight (though less capable).