Why would someone use `usize` over `u32`?

Why would someone use usize over u32 (or u64)? In other words, if I'm on a 32-bit architecture, is there a performance improvement from using usize instead of u32? If not, it feels like defining a fixed-size variable like u32 (or u64) is objectively better than defining a "mystery-sized" variable. That is, my code could be working fine on a 64-bit architecture, then I switch over to a 32-bit machine and all of a sudden I'm getting weird number wrappings, panics, or overflows. Is there such a performance improvement?

The types serve different purposes; it's not that one is superior to the other. You should choose the type that suits your use case.

Again, they are meant to be used for different purposes. If you are asking whether there is runtime overhead for casting between the two, no: when both have the same width, the cast should be a no-op and optimized away in the generated machine code.

The fixed-size types are used to store numeric values; you choose the correct size for the specific problem domain. One example is defining a wire protocol for data exchange; another is file offsets, which are 64 bits on most file systems.

The usize/isize types are meant for memory-access operations (array indexing, pointer arithmetic, type punning with pointers, FFI interop, etc.). If you use one like an ordinary number, you are probably misusing it.

No. If you are 100% certain that your code will only run on some 32-bit architecture, then usize is equivalent to u32, except that they are still different types. The equivalence is obvious if you look at the implementation of usize in the standard library: on 32-bit platforms, the "actual type" of usize is indeed u32.

However, a major goal of Rust (and other high-level programming languages) is to "write once, work everywhere". For example, it would be awkward to maintain separate versions of the Rust standard library for 16-bit, 32-bit, and 64-bit platforms. Instead, we use isize and usize to abstract over the pointer width difference and make our code portable.

usize/isize are designed to be able to address every byte of memory in your machine. Hence they change size depending on the width of the memory addresses on your machine: 32 bits on 32-bit machines, 64 bits on 64-bit machines. This is useful, indeed essential, for pointers and array indexes. You will not get mysterious overflows, wrapping, or panics while working with valid memory addresses.
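
To make the "changes size with the address width" point concrete, here is a minimal sketch using only `std::mem::size_of` and `usize::BITS`. The helper names (`usize_bits`, `usize_matches_pointer`, `fixed_sizes`) are made up for this post.

```rust
use std::mem::size_of;

/// Width of `usize` in bits on the target this code was compiled for.
fn usize_bits() -> u32 {
    usize::BITS
}

/// True when `usize` and `isize` are exactly as wide as a pointer on this target.
fn usize_matches_pointer() -> bool {
    size_of::<usize>() == size_of::<*const u8>() && size_of::<isize>() == size_of::<usize>()
}

/// Sizes in bytes of `u32` and `u64`; these are the same on every target.
fn fixed_sizes() -> (usize, usize) {
    (size_of::<u32>(), size_of::<u64>())
}
```

Calling `usize_bits()` should give 64 on a 64-bit target and 32 on a 32-bit target, while `fixed_sizes()` is `(4, 8)` everywhere.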

u32/u64 and others are for storing integer data of different sizes. Typically integer size is chosen to cover the number range in use and allow minimising memory usage if need be.

Code of the form `foo[blah as usize]` gets very annoying very quickly. The way I do stuff is: all serialization uses u32/u64, but all in-memory indexes are usize.
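
A rough sketch of that split, assuming a made-up wire format (a little-endian u32 length followed by the payload). `encode` and `decode` are hypothetical names, not from any library; the only conversions between u32 and usize happen at the serialization boundary.

```rust
use std::convert::{TryFrom, TryInto};

/// Hypothetical wire format: a little-endian u32 length, then that many payload bytes.
fn encode(payload: &[u8]) -> Option<Vec<u8>> {
    // The length goes on the wire as a fixed-width u32, so reject payloads that do not fit.
    let len = u32::try_from(payload.len()).ok()?;
    let mut out = Vec::with_capacity(4 + payload.len());
    out.extend_from_slice(&len.to_le_bytes());
    out.extend_from_slice(payload);
    Some(out)
}

/// Returns the payload slice, or None if the input is truncated or the length does not fit.
fn decode(bytes: &[u8]) -> Option<&[u8]> {
    let header: [u8; 4] = bytes.get(..4)?.try_into().ok()?;
    // Convert to usize once, at the boundary; everything after this indexes with usize.
    let len = usize::try_from(u32::from_le_bytes(header)).ok()?;
    let end = 4usize.checked_add(len)?;
    bytes.get(4..end)
}
```

The bytes on the wire look the same whatever the pointer width, and the in-memory code never needs an `as usize`.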

There is a benefit to using 64-bit indexing on some 64-bit architectures (including x86-64), because it matches the CPU's native addressing modes. Indexing with 32-bit integers would need to handle integer overflow at 32 bits, and 64-bit addressing modes don't do that, so they can't be used without extra instructions.

OTOH there's no difference between using usize and integers of the same size on the same machine.

The point is that usize will always be big enough to address memory. That is why usize is not locked to a fixed number of bits.

You only need 32 bits on a 32-bit system.

In short: usize/isize are for correctness, not performance - they are here so that we have one type for "just numbers" (the data itself) and another for "indices/offsets" (the addresses of this data).

In a perfect world, you would use u32, u64 etc every time you know the bound on the numbers, and usize if it's an index without a bound, i.e. "whatever fits in memory".

However, the standard library collections lack support for indexing with u32, and so we are stuck with the choice of either using usize for all indexes or having to jump through hoops (cast indexes to usize or implement a wrapper around standard collections).

My own experience is that well-organized code ends up having something like such wrappers — not as an explicit "U32IndexedCollection" but as something meaningful to the application — or in other ways does the indexing in a small set of functions, so that the few conversions that remain aren't a big bother. Of course, this might be more or less so for different applications.
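
For illustration only, such a wrapper might look like the sketch below. `NodeNames` is a hypothetical application type, and the single u32-to-usize conversion lives inside its `Index` impl. The sketch assumes usize is at least 32 bits on the targets in question.

```rust
use std::convert::TryFrom;
use std::ops::Index;

/// Hypothetical application-level collection whose IDs are stored as u32.
struct NodeNames {
    names: Vec<String>,
}

impl NodeNames {
    /// Adds a name and returns its u32 ID, or None once u32 IDs run out.
    fn push(&mut self, name: &str) -> Option<u32> {
        let id = u32::try_from(self.names.len()).ok()?;
        self.names.push(name.to_owned());
        Some(id)
    }
}

impl Index<u32> for NodeNames {
    type Output = str;

    fn index(&self, id: u32) -> &str {
        // The only u32 -> usize conversion lives here.
        // Assumption: usize is at least 32 bits; on a 16-bit target
        // this expect would panic for IDs that do not fit.
        let i = usize::try_from(id).expect("id does not fit in usize");
        &self.names[i]
    }
}
```

Callers then write `names[id]` with a u32 `id` and never see a cast.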

I don't know if this is "well-organized" code:

    rg "[a-zA-Z]\[.*\]" | wc -l
    565

on a ~25K LOC codebase; so around 2% of the lines involve indexing.

Extraneous `as` casts are tedious. Except for FFI / serialization, I tend to use usize by default.

The main thing I mean to point at with “well-organized” is — as I work on the code and it becomes closer to being complete, well-documented, free of kludges, etc, I have found that I make changes that incidentally result in fewer separate indexing expressions, and so (if the index is of another type) fewer conversions needed. So, given that observation, I take frequently needed conversions as a clue that there might be some missing abstraction.

I use usize as C.uintptr_t.
usize guarantees that its width is greater than or equal to that of u32.

No; usize may be 16 bits.

Of course, many programs cannot be usefully run on a 16-bit architecture, but if you are writing a Rust library that doesn't make particular assumptions about the platform (data structures, algorithms, etc.) then it's good to also avoid assuming that a u32 value will fit in usize.

(This is why there is impl From<u16> for usize but not impl From<u32> for usize.)
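
A small sketch of what that means in practice. The helper names `index_from_u16` and `index_from_u32` are made up here; `usize::from` and `usize::try_from` are the standard conversions.

```rust
use std::convert::TryFrom;

/// Lossless on every target, because usize is at least 16 bits wide.
fn index_from_u16(n: u16) -> usize {
    usize::from(n)
}

/// Fallible, because a u32 might not fit in usize on a 16-bit target.
fn index_from_u32(n: u32) -> Option<usize> {
    usize::try_from(n).ok()
}

// This line would not compile, since there is no `impl From<u32> for usize`:
// let i: usize = usize::from(7u32);
```

On 32-bit and 64-bit targets `index_from_u32` always returns `Some`, but the type system still makes you acknowledge the 16-bit case.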

This question might be loosely related to a recent issue for the official Rust book:

While the authors think that the difference does not matter, I would really like to know if the byte size of usize and isize is determined by the CPU or OS. I hesitated to ask in this forum, as I wanted to generate not too much noise with my Rust beginner questions, but as a related topic is discussed just now, maybe someone can clarify my confusion.

[EDIT]

I just read all the explanations above, and my feeling is that all of them make sense and explain the topic well. Still, I wonder if a Rust book is available that explains topics like this one well and correctly. If I am going to continue using Rust, I think I should buy a good book. Choosing one is unfortunately not easy: there are so many Rust books available, and the typical 5-star Amazon rating is not really helpful.

It's confusing...

Processors can be run in various "modes". The same 64 bit ARM processor could operate in a 32 bit or 64 bit mode. The same 64 bit Intel x86 can run in a 32 bit mode. Older 32 bit Intel processors could run in a 16 bit mode.

As a result, if you buy a Raspberry Pi computer you can install a 32-bit or 64-bit operating system. Likewise, you can get old 32-bit versions of Windows or Linux to run on 64-bit Intel machines.

In these cases the width of usize is determined by the target the program is compiled for, which normally follows the operating system in use, regardless of what the CPU can actually do.

So yes, it does matter: if I say I have a 64-bit ARM or Intel machine, that does not mean I'm using a 64-bit OS, and hence does not say whether usize is 32 or 64 bits for me.
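
If you want to see which case you are in, a sketch like this reports the pointer width the program was built for, using the standard `target_pointer_width` cfg. The function name `pointer_width` is just for illustration.

```rust
/// Reports the pointer width of the target this code was compiled for.
/// This is fixed at compile time and says nothing about what the CPU could do in another mode.
fn pointer_width() -> u32 {
    if cfg!(target_pointer_width = "64") {
        64
    } else if cfg!(target_pointer_width = "32") {
        32
    } else {
        // Rust currently also has 16-bit targets.
        16
    }
}
```

Built for a target such as `i686-unknown-linux-gnu`, this should report 32 even when the binary runs on a 64-bit CPU, and it agrees with `usize::BITS`.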

On the other hand it does not matter. Until you actually run out of memory why would you care what the width of usize is? At that point you have a different problem.

Thanks for your detailed answer.

I just asked GPT-4 a few related questions -- I still have to study its reply carefully, but it seems to be not that bad:

I have to disagree with the chat bot when it says about usize/isize:

  • Advantages: Automatically adjusts to the size that's most efficient for a given architecture.

The whole point of usize/isize is that they let you deal with the entire memory address space of your machine, and know that this will be so on all architectures you run on, when indexing arrays, getting the size of objects, creating raw pointers, etc. They are not there to be the most efficient for your machine.

Indeed, quite often usize/isize is not the most efficient choice.

Consider doing some intensive work on large arrays of integers: using integers smaller than your current usize/isize means you will have smaller arrays that use less memory, hence make better use of cache memory, hence potentially speed up your work significantly.

Also if your code can be vectorised by the compiler, using SIMD, AVX, whatever instructions, you will be able to crush through more numbers faster if they are smaller than usize/isize.
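
As a back-of-the-envelope sketch of the memory side of this (it does not measure cache or SIMD effects), the hypothetical helpers below compare element storage for different integer widths and keep the data compact while widening only the accumulator.

```rust
use std::mem::size_of;

/// Bytes of element storage needed for `n` values of type `T`.
fn payload_bytes<T>(n: usize) -> usize {
    n * size_of::<T>()
}

/// Sums samples stored compactly as u32, widening to u64 only for the accumulator.
fn sum_compact(samples: &[u32]) -> u64 {
    samples.iter().map(|&s| u64::from(s)).sum()
}
```

For the same element count, `payload_bytes::<u32>` is half of `payload_bytes::<u64>`, so twice as many values fit in a cache line.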

The chat bot basically had a very long way of saying "it depends"...

Thank you very much for your corrections. Actually, over the last few decades I have read statements on the Internet a few times saying that using the CPU's native types can be faster than using smaller data types. For some CPU types that might be true. So the statement of GPT-4 is understandable, even if it is not fully correct. I was a bit disappointed that GPT-4 did not mention cache efficiency on its own, but then I had not asked explicitly about that topic. Of course, for performance it can matter how many data values fit in the cache, so 4-byte types often perform better than 8-byte types when large collections like Rust vectors have to be processed.

I'm glad to hear us humans can still make a contribution :)
