Why is this Rust code slower than C#?

Personally this bugs me:

struct Person {
    occupation: String,
    id: isize
}

Apparently persons are stored efficiently as just an id; they have no names as Strings, for example. Which is very odd. How do we know who they are?

But then their occupations, all 40 million of them, get stored as strings.

I'd be thinking of having a vector of "Occupation" structs that contained a string name for the occupation and an id to refer to it with. Or likely a hashmap.

Then a Person struct would contain the occupation id instead of an occupation string.

I think the database guys call this "normalization".

2 Likes

Again - this isn't meant to be an actual program, just something to get a point across.
In reality, there is more data to Person, and the occupation string (as well as the rest of the struct) is input from a file and normalizing it would take unnecessary time - the occupation strings are only used in the group_by_occupation function.

I appreciate that. It's OK by me.

But still I wonder. The original claim, the point, was that the C# was twice as fast. But so far I have always found it to be slower. Except when using the original code with the full 40 million of everything, which is written so badly that both Rust and C# hang my machine for ages or crash out.

Just curious, have you benchmarked that? :joy: Presumably allocating millions of small strings takes non-negligible time.

1 Like

I tweaked the timing code a bit:

    let now = Instant::now();
    let result1 = group_by_occupation_using_refcell(&people);
    println!("Time for RefCell implementation: {}", now.elapsed().as_secs_f64());
    drop(result1);
    println!("            Including drop time: {}", now.elapsed().as_secs_f64());

    let now = Instant::now();
    let result2 = group_by_occupation_using_indices(&people);
    println!("Time for index implementation: {}", now.elapsed().as_secs_f64());
    drop(result2);
    println!("          Including drop time: {}", now.elapsed().as_secs_f64());

For me that gives

Time for RefCell implementation: 43.8881539
            Including drop time: 57.8763071
Time for index implementation: 19.5661466
          Including drop time: 20.8624543

I think that does a good job of emphasizing that this is basically just a memory & (de)allocation benchmark -- look at the massive difference in drop time -- especially since it also hit a peak working set of over 9GB.

Similarly, here's another small tweak:

    let mut occupations: Vec<OccupationGroup> = Vec::with_capacity(people.len());

    // map to index instead of ref
    let mut occupation_map: HashMap<&str, usize> = HashMap::with_capacity(people.len());

That small change speeds things up by about ⅓:

Time for index implementation: 12.7122766
          Including drop time: 13.9051792

So it's not obvious to me that any numbers from this code are particularly meaningful for extrapolation.

I'm particularly confused by your result that index is slower than RefCell, since index wins handily every time for me. (1.44.0 Windows, FWIW.) How much RAM do you have?

3 Likes

Worth taking a look at: if these strings could be replaced with string slices borrowed from the input, there could be a dramatic performance difference.

1 Like

The real test case with the real dataset had C# running at ~200 seconds while Rust took around ~300.
Not exactly twice as fast, but definitely slower.
This example conveys the same spirit I had there, and I definitely have stuff I can try out now.

That was because I didn't drop the previous result - already addressed.

Two remarks on Rust style / idioms that I didn’t see anyone address or fix here:

  • Don’t use &Vec<...> but use &[...] instead. The former coerces into the latter anyways so it’s more general, and also the latter has one less indirection.
  • Consider writing elided lifetime parameters explicitly, i.e. OccupationGroup<'_> instead of just OccupationGroup. That way it is clear to the reader of the code that the type contains references and what lifetimes are involved.

Together, these would change the declaration of group_by_occupation to look like this:

fn group_by_occupation(people: &[Person]) -> Vec<OccupationGroup<'_>> {
    /* ... */
}
2 Likes

Maybe you can get a speedup by replacing this line:

        occupations[occupation_index].people.push(person);

by

        unsafe {
           occupations.get_unchecked_mut(occupation_index).people.push(person);
        }

It avoids the bounds check on every iteration. Using "get_unchecked_mut" is safe in this case because "occupation_index" cannot go out of bounds by construction.

1 Like

I think we should be very hesitant to suggest the use of "unsafe" to anyone as a way to boost the performance of their code.

In my experience so far (after only one year, I'm still a Rust newbie), the use of "unsafe" has not been needed even to match the speed of C in intensive computations over arrays. Even though it was suggested by some on the road to a good solution during discussions here.

In this particular example, with a huge amount of memory allocations and string parsing, etc, going on I cannot imagine removing a few array bounds checks with the use of "unsafe" would even show up in the noise.

Have you tried it?

13 Likes

I tend to suggest the use of unsafe if I have a safe variant in mind, but it would require rewriting the code more extensively; basically as a proof of concept to see whether the rewrite is feasible.

Sorry, I can't make any sense of what you have written there.

Why would you suggest the use of "unsafe"? Especially in the case of the problem in this thread, which is just a regular application-level software problem.

If I have a safe alternative in mind and I know the unsafe version would not be UB, but getting the safe version to the same performance characteristics is a bit more complicated, I'd just locally test whether the unsafe version improves performance. That tells me whether rewriting to the safe, but more complex, solution is worth it, because I know the unsafe version can't be bested.

Running exactly the code from the OP still gives me index as being faster:

Time for RefCell implementation: 44.3789481
Time for index implementation: 26.7187189

So when my run for the code has index being 0.507 Np faster but your numbers (in the OP) have it being 0.154 Np slower, there's something else going on here.

Hmm..

My experience with optimizing Rust code to meet the performance of C, after lengthy discussions on this forum, is that the proposed "unsafe" solutions were slower than the final safe solution.

So I don't accept the idea that "the unsafe version can't be bested."

Perhaps you do.

Again, given the discussions here I have read over the last year, it seems to be very hard to tell what is UB. At least for those who don't have a lot of time to study the problems.

In short, I think it is unnecessary and even dangerous to suggest "unsafe" until it is proven there is no other way to do it.

1 Like

The OP started this thread with

Surely that's enough information to avoid advising the use of unsafe, with which even many Rustaceans with multiple years of experience occasionally create UB. Even the original authors of std did not escape; the original 1.0 version of RwLock had UB.

7 Likes

While I agree, and probably won't use unsafe in real-world applications, I think it is interesting to see the performance benefit.

Yes of course.

But when you can boost the performance of what you have by a factor of 10 or so without using "unsafe", which I am sure is possible, then I would be inclined to put thoughts of "unsafe" on the back burner.

Save that for something that really needs it. If you ever need it.

But do realize that any measured benefit is valid only if you manage to avoid UB. Using unsafe for code optimization is a very advanced technique that should be attempted only long after you've reached the state where you find Rust's lifetimes easy and no longer find yourself "fighting the borrow checker".

unsafe does not give you freedom to write code that violates the LLVM optimizer's constraints; it only gives you the power to augment the correctness-checking of the Rust compiler's front-end with your own superior understanding of complex code.

Given that you are a beginner at Rust, no matter what your experience in other languages, any use of unsafe for code optimization is almost certainly a case of "premature optimization". If you look through @ZiCog's many posts over the last year, you'll find quite a few where he documented that the Rust compiler could optimize code better than C and C++ compilers could, all without using unsafe. IMO, at this stage in your Rust learning, the likelihood that your experience will be different is very small.

3 Likes