Why would someone use `usize` over `u32`?

feelingsonice · January 13, 2024, 5:55am

Why would someone use usize over u32 (or u64)? In other words, if I'm on a 32 bit architecture, is there a performance improvement for me to use usize instead of u32? Because if not, it feels like defining a fixed size variable like u32 (or u64) is objectively better than defining a "mystery-sized" variable. For example, my code could be working fine on a 64 bit architecture, then I switch over to a 32 bit machine and all of a sudden I'm getting weird number wrapping, panics, or overflows. Is there such a performance improvement?

nerditation · January 13, 2024, 6:24am

the types serve different purposes; it's not that one is superior to the other. you should choose the type that suits your use case.

again, they are supposed to be used for different purposes. if you mean whether there's runtime overhead for casting between the two: no, when both have the same width the cast should be a no-op and be optimized away in the generated machine code

the types with fixed size are used to store numeric values; you choose the correct size for the specific problem domain. one example is defining a wire protocol for data exchange. another is file offsets, which are 64 bits on most file systems.

the usize/isize types are meant for memory access related operations (array indexing, pointer arithmetic, type punning with pointers, ffi interop, etc). if you use them like ordinary numbers, you are probably misusing them.
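A minimal sketch of that split, assuming a position that was stored as a fixed-size `u32` (say, read from a file) and then used to index a slice. The function name `get_by_stored_index` is made up for illustration.

```rust
use std::convert::TryFrom;

/// Looks up the element at a position stored as a fixed-size `u32`.
/// Returns `None` if the position is out of range.
fn get_by_stored_index(items: &[&'static str], stored: u32) -> Option<&'static str> {
    // Slices are indexed by `usize`, so the `u32` has to be converted first.
    // `try_from` fails instead of truncating if `usize` is narrower than 32 bits.
    let index = usize::try_from(stored).ok()?;
    items.get(index).copied()
}

fn main() {
    let items = ["a", "b", "c"];
    // `len()` returns `usize` because it counts things that live in memory.
    let len: usize = items.len();
    println!("len = {}", len);
    println!("{:?}", get_by_stored_index(&items, 1));
    println!("{:?}", get_by_stored_index(&items, 7));
}
```

The stored value keeps its fixed width, and the conversion to `usize` happens only at the point where memory is actually indexed.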

IcicleF · January 13, 2024, 11:02am

No. If you are 100%-certain that your code will only run on some 32-bit architecture, then usize is equivalent to u32, except that they are still different types. The equivalence is obvious if you look at the implementation of usize in the standard library: on 32-bit platforms, the "actual type" of usize is indeed u32.
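As a rough illustration of "equivalent, but still different types": the sketch below compiles on any target, and the commented-out line is rejected by the compiler even where the two types have the same width. `takes_usize` is a placeholder name.

```rust
use std::mem::size_of;

/// Placeholder for any API that expects a pointer-sized integer.
fn takes_usize(n: usize) -> usize {
    n
}

fn main() {
    let fixed: u32 = 42;
    // takes_usize(fixed); // does not compile: expected `usize`, found `u32`
    let n = takes_usize(fixed as usize); // an explicit conversion is always required
    println!("{}", n);

    // The widths may match (on a 32-bit target) or differ (on a 64-bit one).
    println!("usize: {} bytes, u32: {} bytes", size_of::<usize>(), size_of::<u32>());
    // `usize` always has the same size as a thin pointer.
    println!("pointer-sized: {}", size_of::<usize>() == size_of::<*const u8>());
}
```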

However, a major goal of Rust (and other high-level programming languages) is to "write once, work everywhere". For example, it would be awkward to maintain separate versions of the Rust standard library for 16-bit, 32-bit, and 64-bit platforms. Instead, we use isize and usize to abstract over the pointer width difference and make our code portable.

ZiCog · January 13, 2024, 11:26am

usize/isize are designed to be able to address every byte of memory in your machine. Hence they change size depending on the width of the memory addresses on your machine. 32 bits on 32 bit machines, 64 bits on 64 bit machines. This is useful/essential for pointers and array indexes. You will not get mysterious overflows, wrappings, panics while working with valid memory addresses.

u32/u64 and others are for storing integer data of different sizes. Typically integer size is chosen to cover the number range in use and allow minimising memory usage if need be.

anon80458984 · January 13, 2024, 11:49am

Code of the form foo[blah as usize] gets very annoying very quickly. The way I do stuff is: all serialization is u32 or u64, but all in-memory indexes are usize.
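One way that convention might look in practice is sketched below. The length-prefixed format and the names `encode` and `decode` are assumptions made up for this example, not anything from the post. The length is a `u32` on the wire and a `usize` in memory, with checked conversions at the boundary.

```rust
use std::convert::TryFrom;

/// Encodes a byte string as a little-endian `u32` length followed by the bytes.
/// Returns `None` if the length does not fit in the 32-bit length field.
fn encode(payload: &[u8]) -> Option<Vec<u8>> {
    let len = u32::try_from(payload.len()).ok()?; // usize -> u32, checked
    let mut out = Vec::with_capacity(4 + payload.len());
    out.extend_from_slice(&len.to_le_bytes());
    out.extend_from_slice(payload);
    Some(out)
}

/// Decodes the format written by `encode`, returning the payload bytes.
fn decode(bytes: &[u8]) -> Option<&[u8]> {
    let header = <[u8; 4]>::try_from(bytes.get(..4)?).ok()?;
    let len = usize::try_from(u32::from_le_bytes(header)).ok()?; // u32 -> usize, checked
    bytes.get(4..)?.get(..len)
}

fn main() {
    let encoded = encode(b"hello").expect("payload fits in a u32 length");
    println!("{:?}", encoded);
    println!("{:?}", decode(&encoded));
}
```

Because the serialized width is fixed, the bytes are the same whether the writer was a 32-bit or a 64-bit build.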

kornel · January 13, 2024, 12:22pm

There is a benefit of using 64-bit indexing on some 64-bit architectures (including x86-64), because it matches CPU's native addressing modes. Indexing with 32-bits would need to handle integer overflow in 32-bits, and 64-bit addressing modes don't do that, so they can't be used without extra instructions.

gist.github.com

https://gist.github.com/rygorous/e0f055bfb74e3d5f0af20690759de5a7

gistfile1.txt

Why do compilers even bother with exploiting undefinedness signed overflow? And what are those
mysterious cases where it helps?

A lot of people (myself included) are against transforms that aggressively exploit undefined behavior, but
I think it's useful to know what compiler writers are accomplishing by this.

TL;DR: C doesn't work very well if int!=register width, but (for backwards compat) int is 32-bit on all
major 64-bit targets, and this causes quite hairy problems for code generation and optimization in some
fairly common cases. The signed overflow UB exploitation is an attempt to work around this.

OTOH there's no difference between using usize and integers of the same size on the same machine.
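The overflow point can be illustrated at the source level, although this sketch does not show the generated machine code. It only shows that "add in 32 bits, then widen" and "widen, then add in 64 bits" are different computations, so a compiler cannot silently swap one for the other. The function names are invented for the example, and `wrapping_add` is used to make the 32-bit wrap-around explicit.

```rust
/// Increments in 32 bits, then widens: the wrap-around at `u32::MAX` is preserved.
fn add_then_widen(i: u32) -> u64 {
    u64::from(i.wrapping_add(1))
}

/// Widens first, then increments in 64 bits: no wrap-around can happen here.
fn widen_then_add(i: u32) -> u64 {
    u64::from(i) + 1
}

fn main() {
    // For ordinary values the two agree.
    println!("{} {}", add_then_widen(7), widen_then_add(7));
    // At the 32-bit boundary they do not.
    println!("{} {}", add_then_widen(u32::MAX), widen_then_add(u32::MAX));
}
```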

stonerfish · January 13, 2024, 12:24pm

The point is that usize will always be big enough to access the memory. That is why usize is not locked to a fixed number of bits.

You only need 32 bits on a 32 bit system.

Cerber-Ursi · January 13, 2024, 1:46pm

In short: usize/isize are for correctness, not performance - they are here so that we have one type for "just numbers" (the data itself) and another for "indices/offsets" (the addresses of this data).

tczajka · January 13, 2024, 7:43pm

In a perfect world, you would use u32, u64 etc every time you know the bound on the numbers, and usize if it's an index without a bound, i.e. "whatever fits in memory".

However, the standard library collections lack support for indexing with u32, and so we are stuck with the choice of either using usize for all indexes or having to jump through hoops (cast indexes to usize or implement a wrapper around standard collections).

kpreid · January 13, 2024, 11:22pm

My own experience is that well-organized code ends up having something like such wrappers — not as an explicit "U32IndexedCollection" but as something meaningful to the application — or in other ways does the indexing in a small set of functions, so that the few conversions that remain aren't a big bother. Of course, this might be more or less so for different applications.
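A sketch of what such an application-level wrapper could look like, assuming a hypothetical graph-like program that refers to nodes by compact 32-bit ids. `NodeId` and `Nodes` are invented names, and this is only one possible shape.

```rust
use std::convert::TryFrom;
use std::ops::Index;

/// Hypothetical application-level handle: a compact 32-bit id for a node.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NodeId(u32);

/// Hypothetical collection that hands out `NodeId`s and hides the `usize` conversion.
#[derive(Default)]
struct Nodes {
    names: Vec<String>,
}

impl Nodes {
    /// Adds a node and returns its id. Panics if the new id does not fit in a `u32`.
    fn push(&mut self, name: &str) -> NodeId {
        let id = u32::try_from(self.names.len()).expect("too many nodes for a u32 id");
        self.names.push(name.to_string());
        NodeId(id)
    }
}

impl Index<NodeId> for Nodes {
    type Output = str;

    fn index(&self, id: NodeId) -> &str {
        // The only u32 -> usize conversion lives here, in one place.
        let i = usize::try_from(id.0).expect("id does not fit in usize");
        &self.names[i]
    }
}

fn main() {
    let mut nodes = Nodes::default();
    let root = nodes.push("root");
    let leaf = nodes.push("leaf");
    println!("{:?} -> {}", root, &nodes[root]);
    println!("{:?} -> {}", leaf, &nodes[leaf]);
}
```

Callers write `nodes[id]` and never see a cast, which matches the observation that conversions tend to collect in a small set of functions.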

anon80458984 · January 13, 2024, 11:29pm

I don't know if this is "well-organized" code:

 rg "[a-zA-Z]\[.*\]" | wc -l
 565

on a ~25K LOC codebase; so around 2% of the lines involve indexing.

Extraneous `as` casts are tedious. Except for FFI / serialization, I tend to use usize by default.

kpreid · January 14, 2024, 1:39am

The main thing I mean to point at with “well-organized” is — as I work on the code and it becomes closer to being complete, well-documented, free of kludges, etc, I have found that I make changes that incidentally result in fewer separate indexing expressions, and so (if the index is of another type) fewer conversions needed. So, given that observation, I take frequently needed conversions as a clue that there might be some missing abstraction.

huiminghao · January 14, 2024, 4:28am

I use usize as C.uintptr_t.
usize makes sure its width is greater than or equal to that of u32.

kpreid · January 14, 2024, 4:48am

No; usize may be 16 bits.

Of course, many programs cannot be usefully run on a 16-bit architecture, but if you are writing a Rust library that doesn't make particular assumptions about the platform (data structures, algorithms, etc.) then it's good to also avoid assuming that a u32 value will fit in usize.

(This is why there is impl From<u16> for usize but not impl From<u32> for usize.)
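The difference between the two conversions can be seen in a short sketch. `widen` and `to_index` are names invented for the example. The infallible `From` conversion exists only for `u16` and narrower, while `u32` has to go through the fallible `TryFrom`.

```rust
use std::convert::TryFrom;

/// Always succeeds: `usize` is guaranteed to be at least 16 bits wide.
fn widen(small: u16) -> usize {
    usize::from(small)
}

/// May fail: on a 16-bit target a `u32` value might not fit in `usize`.
fn to_index(big: u32) -> Option<usize> {
    // usize::from(big) // does not compile: `From<u32>` is not implemented for `usize`
    usize::try_from(big).ok()
}

fn main() {
    println!("{}", widen(65_535));
    match to_index(70_000) {
        Some(i) => println!("fits: {}", i),
        None => println!("does not fit in usize on this target"),
    }
}
```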

StefanSalewski · January 14, 2024, 10:06am

This question might be loosely related to a recent issue for the official Rust book:

github.com/rust-lang/book

Chapter 3, is "architecture" referring to CPU or OS?

opened 10:52AM - 10 Dec 23 UTC

closed 03:08PM - 08 Jan 24 UTC

StefanSalewski

When reading that section some weeks ago, I marked it as it was not quite clear:…

> Additionally, the isize and usize types depend on the architecture of the computer your program is running on, which is denoted in the table as “arch”: 64 bits if you’re on a 64-bit architecture and 32 bits if you’re on a 32-bit architecture.

As I am still unsure, I just asked GPT-4: Is here, and generally, "architecture" referring to the hardware (CPU) or to the operating system? As we can run a 32-bit Windows on a 64-bit CPU. See, and please clarify in the book: https://chat.openai.com/share/3671d87f-458e-405b-a97f-87a91574cd95

While the authors think that the difference does not matter, I would really like to know if the byte size of usize and isize is determined by the CPU or OS. I hesitated to ask in this forum, as I wanted to generate not too much noise with my Rust beginner questions, but as a related topic is discussed just now, maybe someone can clarify my confusion.

[EDIT]

I just read all the explanations above, and my feeling is that all of them make sense and explain the topic well. Still, I wonder if a Rust book is available which explains topics like this one well and correctly. Because I think that if I continue using Rust, I should buy a good book. Choosing a good book is unfortunately not easy: there are so many Rust books available, and the typical 5-star Amazon rating is not really helpful.

ZiCog · January 14, 2024, 10:49am

It's confusing...

Processors can be run in various "modes". The same 64 bit ARM processor could operate in a 32 bit or 64 bit mode. The same 64 bit Intel x86 can run in a 32 bit mode. Older 32 bit Intel processors could run in a 16 bit mode.

As a result, if you buy a Raspberry Pi computer you can install a 32 bit or 64 bit operating system. Likewise, you can get old 32 bit versions of Windows or Linux to run on 64 bit Intel machines.

In these cases the width of usize is determined by the operating system in use regardless of what the CPU can actually do.
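For a Rust program specifically, the width is fixed when the program is compiled, by the target it is built for (for example `i686-unknown-linux-gnu` versus `x86_64-unknown-linux-gnu`). A 32-bit build keeps a 32-bit usize even when it runs on a 64-bit OS. A small sketch to check what a given build uses, where `pointer_width_bits` is a name made up for the example:

```rust
/// Returns the pointer width, in bits, of the target this code was compiled for.
fn pointer_width_bits() -> u32 {
    if cfg!(target_pointer_width = "64") {
        64
    } else if cfg!(target_pointer_width = "32") {
        32
    } else {
        16
    }
}

fn main() {
    println!("compiled for a {}-bit target", pointer_width_bits());
    // `usize::BITS` reports the same thing as a constant.
    println!("usize::BITS = {}", usize::BITS);
}
```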

So yes, it does matter, if I say I have 64 bit ARM or Intel machine that does not mean I'm using a 64 bit OS, and hence does not say if usize is 32 or 64 bit for me.

On the other hand it does not matter. Until you actually run out of memory why would you care what the width of usize is? At that point you have a different problem.

StefanSalewski · January 14, 2024, 11:03am

Thanks for your detailed answer.

I just asked GPT-4 a few related questions -- I still have to study its reply carefully, but it seems to be not that bad:

ZiCog · January 14, 2024, 11:20am

I have to disagree with the chat bot when it says about usize/isize:

Advantages: Automatically adjusts to the size that's most efficient for a given architecture.

The whole point of usize/isize is that they enable one to deal with the entire memory address space of your machine, and to know that this will be so on all architectures you run on: when indexing arrays, getting the size of objects, creating raw pointers, etc. They are not there to be the most efficient for your machine.

Indeed using usize/isize may not be the most efficient quite often.

Consider doing some intensive work on large arrays of integers: using smaller integers than your current usize/isize means you will have smaller arrays, using less memory, hence making better use of cache memory, hence potentially speeding up your work significantly.
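The memory side of that argument is easy to check. The sketch below only measures how many bytes the elements occupy. It makes no claim about actual speed, which would need a benchmark on the machine in question. `payload_bytes` is a name invented for the example.

```rust
use std::mem::size_of_val;

/// Number of bytes occupied by the elements of a slice.
fn payload_bytes<T>(values: &[T]) -> usize {
    size_of_val(values)
}

fn main() {
    let narrow: Vec<u32> = (0..1_000).collect();
    let wide: Vec<u64> = (0..1_000).collect();
    // Same number of elements, half the bytes to pull through the cache.
    println!("u32: {} bytes", payload_bytes(&narrow));
    println!("u64: {} bytes", payload_bytes(&wide));
}
```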

Also if your code can be vectorised by the compiler, using SIMD, AVX, whatever instructions, you will be able to crush through more numbers faster if they are smaller than usize/isize.

The chat bot basically had a very long way of saying "it depends"...

StefanSalewski · January 14, 2024, 11:35am

Thank you very much for your corrections. Actually, over the last few decades I have read statements on the Internet a few times saying that using the CPU's native types can be faster than using smaller data types. For some CPU types that might be true. So the statement of GPT-4 is understandable, even if it is not fully correct. I was a bit disappointed that GPT-4 did not mention cache efficiency on its own. But then, I had not asked explicitly about that topic. Of course, for performance it can be important how many data values fit in the cache, so 4 byte types often provide better performance than 8 byte types when large collections like Rust vectors have to be processed.

ZiCog · January 14, 2024, 11:48am

I'm glad to hear us humans can still make a contribution
