Would using u128 be more efficient than using byte array of the same size in Rust if we only compare for equality?

I'm working with a bioinformatics application that handles many IDs. These IDs are well-defined and always shorter than 16 characters, so they can easily fit into a u128. Currently, I'm using String to represent these IDs, but I'm getting tired of constantly cloning them or carefully following Rust's borrowing rules. It's becoming appealing to use a data type that implements Copy, like integers, which would simplify things significantly.

An array such as [u8; 16] also implements Copy, making the code more flexible. If I suddenly encounter an unusually long ID that doesn't fit, I could simply reserve a larger array. But would using u128 be significantly more efficient than a byte array?

I'd like to hear from someone with deeper knowledge of how Rust handles these data structures internally. I'm considering creating a new short_id crate specifically for managing short, string-like IDs, complete with convenient methods for conversion to and from various representations.

Short version: currently, technically yes. A u128 can be passed via registers, so it can be handled more efficiently in functions that don’t take too many arguments.

See this Compiler Explorer example, where the second version must first fetch the value into registers, which has significantly higher latency than anything else involved.

In practice though, you’re unlikely to see any actual difference, as:

  • you will generally run out of registers very quickly when each ID takes two 64-bit registers, since you typically only get six registers for parameters before they spill to the stack
  • you’re very likely to be passing ids from values in cache (likely on the stack even) anyway
  • you have to pay the cost of loading the id into the register in the first place anyway

In order to avoid exposing this detail, though, I would recommend using a newtype like:

#[repr(transparent)]
#[derive(Copy, Clone, PartialEq, Eq)]
pub struct Id(u128);

where the repr(transparent) ensures the value can be passed in registers (in general, it ensures you can safely transmute between references to the newtype and the contained value).
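A minimal sketch of how such a newtype might be used in practice. The `Id::new` constructor, the zero-padding scheme, and the extra derives (`Hash`, `Debug`) are my assumptions for illustration, not part of the suggestion above:

```rust
// Sketch: pack an ASCII ID of at most 16 bytes into a Copy newtype.
#[repr(transparent)]
#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug)]
pub struct Id(u128);

impl Id {
    /// Zero-pads the bytes of `s` into the u128. Panics if the ID is
    /// longer than 16 bytes (the question says they never are).
    pub fn new(s: &str) -> Id {
        assert!(s.len() <= 16, "ID too long");
        let mut bytes = [0u8; 16];
        bytes[..s.len()].copy_from_slice(s.as_bytes());
        Id(u128::from_le_bytes(bytes))
    }
}

fn main() {
    let a = Id::new("chr1:12345");
    let b = Id::new("chr1:12345");
    let c = Id::new("chr2:99");
    assert_eq!(a, b); // same string, same ID
    assert_ne!(a, c);
    println!("equal: {}", a == b);
}
```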

8 Likes

I wonder if such a crate actually exists, and whether it even makes sense. Note that in addition to u128 there is also __m128. And depending on whether or not your IDs usually differ in the first 8 bytes, u128 may be faster or slower than __m128!

For “general use”, u128 is probably fast enough; working out whether the SSE or general-purpose-register approach is faster is overkill for 99.99% of apps.

One potential advantage of using u128 is that it likely[1] has a higher alignment than [u8; 16]. This may make it more efficient to load, store, and compare than a [u8; 16] which has to be assumed to be possibly spread across two cache lines or pages.

However, increased alignment comes with its own disadvantage: A struct containing u128 and another field of lower size (e.g. a bool) will occupy more memory than it would if it contained [u8; 16]. Even disregarding total memory usage, this can reduce performance because it means more data transfer between the CPU and RAM is required. So, it’s worth trying, but you do need to try it and measure the performance of the change.
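A quick way to see this effect. The struct names here are made up for the demo, and the exact sizes depend on the target's alignment for u128:

```rust
use std::mem::{align_of, size_of};

// Hypothetical structs pairing an ID with a one-byte flag.
struct WithU128 { id: u128, flag: bool }
struct WithBytes { id: [u8; 16], flag: bool }

fn main() {
    // u128 typically has 8- or 16-byte alignment, so `flag` forces
    // padding; [u8; 16] has alignment 1, so the struct stays small.
    println!("WithU128:  size={} align={}", size_of::<WithU128>(), align_of::<WithU128>());
    println!("WithBytes: size={} align={}", size_of::<WithBytes>(), align_of::<WithBytes>());
    assert!(size_of::<WithU128>() > size_of::<WithBytes>());
}
```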


Also, note that you can get the same alignment benefits by defining your own aligned type rather than using u128:

#[repr(align(16))]
#[derive(Copy, ...)]
pub struct Id([u8; 16]);

The compiler is fully capable of, based on this definition, figuring out that it can use operations wider than per-byte to do things like comparing the value. The only thing this doesn’t do is ask for the function call ABI that a u128 might have, which might be more efficient (but this only matters for non-inlined functions).
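For instance, a self-contained version of that type (the derives spelled out here, and the NUL-padded test values, are my assumptions filling in the elided parts):

```rust
use std::mem::{align_of, size_of};

#[repr(align(16))]
#[derive(Copy, Clone, PartialEq, Eq)]
pub struct Id([u8; 16]);

fn main() {
    // Same size as u128, and repr(align(16)) guarantees the alignment.
    assert_eq!(size_of::<Id>(), 16);
    assert_eq!(align_of::<Id>(), 16);

    // A 10-byte ID padded with NULs out to 16 bytes.
    let a = Id(*b"ENSG000001\0\0\0\0\0\0");
    let b = Id(*b"ENSG000001\0\0\0\0\0\0");
    assert!(a == b);
    println!("ok");
}
```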


  1. The alignment of primitive integer types is determined by the platform, so technically, u128 can have any of the possible alignments for its size: 1, 2, 4, 8, or 16. ↩︎

6 Likes

if we only compare for equality?

As of Stop generating `alloca`s & `memcmp` for simple short array equality by scottmcm · Pull Request #85828 · rust-lang/rust · GitHub we actually emit an i128 comparison to LLVM when you compare a [u8; 16] :slightly_smiling_face:

Demo: https://rust.godbolt.org/z/1aG5rYcv4

As you can see there, the only difference is in the alignment, as people have already mentioned.
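A small self-contained version of that demo. The function name is made up; the point is that on current compilers the derived array equality lowers to a single wide comparison rather than a byte loop:

```rust
// Equality of two fixed-size byte arrays; since the PR linked above,
// rustc emits an i128 comparison to LLVM for this rather than memcmp.
pub fn ids_equal(a: &[u8; 16], b: &[u8; 16]) -> bool {
    a == b
}

fn main() {
    assert!(ids_equal(b"0123456789abcdef", b"0123456789abcdef"));
    assert!(!ids_equal(b"0123456789abcdef", b"0123456789abcdeX"));
    println!("ok");
}
```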

13 Likes

I would benchmark using ArrayString and hope LLVM will efficiently pass that via registers (godbolt may help).

1 Like

Just out of curiosity, why was i128 chosen over u128?

2 Likes

LLVM does not distinguish signedness in its types -- they only have i128, no u128.

(It deals in signedness in the operations instead, such as sdiv vs udiv.)

6 Likes

May I ask how many IDs “many” is? I honestly would think there won’t be a measurable difference.

Coming from String to either of these types, you:
a) reduce the memory footprint by at least 75%, which is important for caching;
b) move from heap to stack memory, which brings allocation time to zero.
This is especially true when you need to copy/move your IDs.

I believe this will reduce access time to the IDs by a factor of 10 at least.
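For reference, the header sizes alone on a typical 64-bit target (the heap allocation behind the String comes on top of this, so "at least 75%" above is plausible for short IDs):

```rust
use std::mem::size_of;

fn main() {
    // A String is a (pointer, length, capacity) triple -- 24 bytes on
    // 64-bit targets -- plus a separate heap allocation for the bytes.
    // A [u8; 16] ID is 16 bytes inline, with no allocation at all.
    println!("String header: {} bytes", size_of::<String>());
    println!("inline ID:     {} bytes", size_of::<[u8; 16]>());
}
```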

I am working on a mathematical app which uses u128 not as a number but, like you, as data storage, with a lot of bit-shift operations.

Knowing Rust, you will likely be able to process more than a hundred million of these per second per thread, so the surrounding code will become the bottleneck.

I would go with the option that fits your data needs better and is easier to handle, which sounds like the byte array, since you can easily expand it if your IDs grow.

I also recommend black-boxing your storage/IDs in a struct, so that you can replace the internal data representation without changing the code that uses it.

You mentioned that ids could sometimes be wider than 16 bytes. If that’s the case then I’d recommend using something along the lines of smol_str as it will stack-allocate anything less than 24 bytes. If the id is larger then it is automatically heap allocated. I commonly use smol_str for ids.

2 Likes

It is not a performance question; I want something that is easy to copy, for ergonomics.

You can get a hacky Copy arbitrary value with Box::leak(), but the "proper" way to do this is an "atom" or "interning" approach; at its simplest this is just:

use std::collections::HashMap;

struct Atoms {
    existing: HashMap<String, u32>,
    next: u32,
}

With this you do need to be careful that you only ever use one "domain" of atoms at a time, and that you can always access it when needed, which can get "interesting" if you're trying to do it in a serde deserializer, for example (a thread-local is probably the easiest option there). And of course you lose any ability to tell what the value actually is when debugging.
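To make the sketch concrete, here is a runnable version with a hypothetical `intern` method (name and signature are my assumptions; the domain and cleanup caveats still apply):

```rust
use std::collections::HashMap;

/// Minimal interner sketch: maps each distinct string to a small Copy key.
struct Atoms {
    existing: HashMap<String, u32>,
    next: u32,
}

impl Atoms {
    /// Returns the existing key for `s`, or allocates the next one.
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&id) = self.existing.get(s) {
            return id;
        }
        let id = self.next;
        self.next += 1;
        self.existing.insert(s.to_owned(), id);
        id
    }
}

fn main() {
    let mut atoms = Atoms { existing: HashMap::new(), next: 0 };
    let a = atoms.intern("GENE0001");
    let b = atoms.intern("GENE0002");
    let a2 = atoms.intern("GENE0001"); // same string -> same key
    assert_eq!(a, a2);
    assert_ne!(a, b);
    println!("a={} b={} a2={}", a, b, a2);
}
```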

This would work, but

  • With a constant stream of new tokens, this effectively becomes a memory leak.
  • If we restart the application, the same strings may get different keys. This is a problem if the values are persisted somewhere outside the program.

Yep, you also need to consider those details. The definition of Copy means that you can't do anything when you drop it, so you need to handle any cleanup you want externally no matter what.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.