Is there a performance hit for using a u32 on an x64 chip?

I'm definitely showing my systems programming ignorance here with this question. What I'm wondering is if there is a performance hit when using a u32 (or any x32 int, etc) when the chip is x86_64? It would seem like the OS or chip would have to do additional work.

For x86_64 there almost certainly is not a performance hit. If you have multiple u32s packed in an array, slice or struct, there many be a performance improvement due to better cache locality (since twice as many u32s fit in each cache line as u64s).

2 Likes

Does this caching come for free, or is it something you have to explicitly ask for?

The caching is automatically provided by the CPU chipset, and may not work if you bough your CPU before 1990. But even without CPU cache, loading millions of u64s from memory would takes double amount of time than loading millions of u32s.

2 Likes

You may also gain a performance benefit from vectorising and SIMD.

For example, SSE gives you 128-bit registers for SIMD operations... This means the processor could load 4x u32s into a register and execute an operation on all 4 integers in one instruction. In comparison if you used u64 it could only apply the operation to 2 integers at a time. Later SIMD instruction sets (SSE2, SSE3, AVX-512, etc.) can work with larger numbers of values.

The compiler will automatically apply these sorts of optimisations when it believes they're valid and would improve performance.

Certainly I have seen significant slow downs when operating on large arrays of 64 bit data. Most likely attributable to cache pressure.

1 Like

Thank y'all for all the information. It's really good stuff.

1 Like

As usual, don't forget to measure the performance of whatever you are doing yourself. Better not to take anyone's word for it.

Perhaps there is a performance hit. Perhaps in your application the impact of that is negligible and taking the hit is more convenient.

I'm sure we could make micro-benchmarks to demonstrate that either is faster than the other in different circumstances.

There is a performance hit when u32 is used for array indexing inside loops. That's because overflow behavior of u32 is different than overflow of usize, and the compiler has to pointlessly emulate that.

3 Likes

Could you give an example for that? Some small toy examples I wrote don't support that claim?

4 Likes

Thanks, that sounds like an interesting read :slight_smile:

Half a year ago some guys and I were having fun pitting C against Rust in creating solutions to the Project Euler problem number 256 : Tatami-Free Rooms problem : https://projecteuler.net/problem=256

I ended up with a Rust solution that matched the performance of C. It could be built to use 32 or 64 bit arithmetic.

32 bit math was faster on smaller problems. 64 bit bit was required for bigger problems.

My Rust code and the 32 bit and 64 bit solutions are here https://github.com/ZiCog/tatami-rust if you want to check it out and play with it.

Along with instructions as to how to build the same Rust code so that it uses either 32 or 64 bit math. Basically I generate code containing types and constants form a build.rs file. If anyone knows a better way to do that it would be great to hear.

This is not the first time I have found 32 bit math to be faster.

1 Like

For the most part, the compiler's actually free to upgrade your integer types to whatever's best, at least for local usage.

When you tell rustc to use u32, it will almost certainly be a 32 bit integer when stored in (observable) memory. But when it's loaded into registers to actually do work with, the optimizer can pick any way of doing the arithmetic that gives the same result.

If adding two u32, the compiler is free to choose between a 32 bit add (that preserves the upper 32 bits of the 64 bits of scratch space) or a 64 bit add (which doesn't) with a 32 bit store (which discards the upper 32 bits), depending on which it expects will be faster.

Compilers are really good at optimizing your code, especially when it's mostly math. They're almost certainly better at microoptimising your code than you are. Write the code that you want to read and expresses intent first, then you can adjust it if profiling shows that it's necessary.

As a general rule, though, smaller numbers are typically easier to work with because the processor is capable of doing more with its limited memory budget.

If you're curious, here's a good video overview of what awesome things compilers do every compile:

2 Likes

The only trouble I have with using the most appropriate integer type for "mostly maths" is when a result of the mathematics is going to be used as indexes into an array at some point. For example using u16 as an index requires explicit casts. So I can end up using usize everywhere even if that doesn't best express the intent, simply because it's more readable overall.

I would use the compact storage representation, together with a short macro to do the usize conversion at the point of indexing. See my earlier post on this.

2 Likes

Just as a reminder, the Euler folks respectfully request that solutions to problems > 100 should not be posted online.

2 Likes

That's interesting. I'm guessing most people would store their solutions in version control, so are they referring to anywhere online (e.g. GitHub) or just public forums?

Their wording is very broad: Solutions are not to be posted online where they can be found by others. Compliance is generally excellent. It's a real badge of honor to say that you've solved all the Euler problems, and obviously that wouldn't be the case if you could just Google the answers.

Ah, oops, Had I read that before I would have respected it their wishes. But I had not read that. In fact I have not read any of the Project Euler pages much. This was just a piece of code that was under discussion for many weeks on another forum https://www.raspberrypi.org/forums/viewtopic.php?t=240287&p=1563671 where people were comparing implementations in various programming languages.

I don't claim any badge of honor for that solution. It was just piece of code that looked interesting to recast from C to Rust as an exercise in becoming familiar with Rust. And to demonstrate to myself that Rust would perform as well as C.

To be honest I don't fully understand how it works.

What to do? I guess I should take down my repo there. But the code is all over the place in various forms now.

Or in the spirit of Euler I might just say "meh, here is a solution, without any rigorous proof".