Should I use small datatypes when I know my numbers are small?

If I have a bunch of integers — indices or counts or whatever — that I know will always fit into a u8, should I use a u8? Or should I prefer usize, or some larger size like u32?

It's really more of a gut feeling thing than a hard rule. Here's my heuristic:

  • If they're indices into a slice, you should probably use usize, unless the unnecessarily large size causes memory or performance issues. Smaller numbers have to be widened to pointer width anyway to index into memory (zero-extended for unsigned types, sign-extended for signed ones), and if you already have usizes you don't need as many noisy conversions (i as usize) when you use them that way.
  • If they're not indices, but represent numbers without a hard limit that will in practice never be "big", you should probably just use u32 (or i32 for signed) unless the size causes memory or performance issues. On most modern architectures, 32-bit integers are at least as fast as 64-bit integers (and some operations, such as division, can be faster), but it is not generally true that 16- and 8-bit integers are faster still; often they just use the same registers and instructions but ignore the higher bytes, so there's not necessarily any benefit to going smaller except the size itself. If integer size is a performance concern, you should definitely profile your code so you'll know whether it matters.
  • If they are numbers with a hard, compile-time limit that is "small", feel free to use whatever type is big enough (but maybe give it a type alias so it's easy to change in the future).
    • Addendum: it's also fine to impose a hard limit on the number deliberately, in order to make the storage more efficient.
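To make the last two bullets concrete, here is a minimal sketch. Everything in it is hypothetical and made up for illustration (a game with at most 8 players, and the names PlayerCount, MAX_PLAYERS, player_count and total_moves): the hard-limited number gets its own alias, the unbounded-but-small count is just a u32, and input is converted into the small type with a checked conversion rather than a truncating cast.

```rust
use std::convert::TryFrom;

// Hypothetical hard, compile-time limit: at most 8 players.
// The alias means switching away from u8 later is a one-line change.
type PlayerCount = u8;

const MAX_PLAYERS: PlayerCount = 8;

// A count with no hard limit that is never "big" in practice: plain u32.
fn total_moves(moves_per_player: &[u32]) -> u32 {
    moves_per_player.iter().sum()
}

// Convert an arbitrary usize into the small type. Values that don't fit in
// a u8, or that exceed the game's limit, are rejected instead of being
// silently truncated the way `n as u8` would.
fn player_count(n: usize) -> Option<PlayerCount> {
    let n = PlayerCount::try_from(n).ok()?;
    if n <= MAX_PLAYERS {
        Some(n)
    } else {
        None
    }
}
```

For example, player_count(4) gives Some(4), while both player_count(9) and player_count(300) give None.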

A) The primary reason to go for small-size items is storage efficiency. That only matters when you have a lot of them, or when they pack together with other data into a struct that is significantly more cache-efficient than it would be with larger-size items. Be aware that on some architectures, access to sub-word-size fields, particularly writes, may be less efficient than access to an item sized to the CPU data path.
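A quick way to see the storage argument is std::mem::size_of. The record types and field names below are hypothetical; #[repr(C)] is used only so the layout (and therefore the size) is guaranteed rather than left to the compiler.

```rust
use std::mem::size_of;

// Four small counters stored as u8: 4 bytes per record.
#[allow(dead_code)]
#[repr(C)]
struct SmallCounters {
    hits: u8,
    misses: u8,
    retries: u8,
    errors: u8,
}

// The same four counters stored as u32: 16 bytes per record.
#[allow(dead_code)]
#[repr(C)]
struct WideCounters {
    hits: u32,
    misses: u32,
    retries: u32,
    errors: u32,
}

// Bytes needed to store `n` records of type `T` contiguously (as in a Vec),
// ignoring the Vec's own fixed-size header.
fn storage_bytes<T>(n: usize) -> usize {
    n * size_of::<T>()
}
```

With a million records that is 4 MB versus 16 MB, which is where the difference starts to matter for cache behaviour; for a handful of values it is irrelevant.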

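B) If your data is used for indices into slices, your code will be cleaner if you store the indices as usize, since that is the type slices are indexed with (isize can be handy for signed offsets, but it still needs a conversion before indexing). So unless A) applies, choose usize for indices.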
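The "cleaner code" point is easiest to see side by side. In this sketch (function names are hypothetical), the u8 version needs a widening cast at every indexing site, while the usize version indexes directly:

```rust
// Indices stored as u8 to save space: every use needs an explicit
// widening cast before it can index the slice.
fn sum_selected_u8(values: &[i32], picks: &[u8]) -> i32 {
    picks.iter().map(|&i| values[i as usize]).sum()
}

// Indices stored as usize: no conversion at the use site.
fn sum_selected_usize(values: &[i32], picks: &[usize]) -> i32 {
    picks.iter().map(|&i| values[i]).sum()
}
```

For a u8, usize::from(i) is a lossless alternative to the cast, but it is just as much noise at each call site.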

C) For other items, if you size them to the CPU data-path width, you will usually get good performance and compact code. Other sizes may encode less compactly in the instruction stream, or may entail hardware slowdowns in some cases. In particular, size data used with atomic operations to the CPU data-path width.
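For the atomics case, a minimal sketch of a shared counter using AtomicUsize, which is pointer-sized and on common 32- and 64-bit targets matches the native word width. The function name count_in_parallel is hypothetical. Relaxed ordering is enough for a pure counter here because join() makes every thread's increments visible before the final load.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Increment a shared word-sized atomic counter from several threads.
fn count_in_parallel(threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    // Joining establishes happens-before, so the load sees every increment.
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}
```

The standard library also offers AtomicU8, AtomicU16 and AtomicU32 on platforms that support them; the advice above is simply to default to the word-sized type unless storage size is the point.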