I'm working with a bioinformatics application that handles many IDs. These IDs are well-defined and always shorter than 16 characters, so they can easily fit into a u128. Currently, I'm using String to represent these IDs, but I'm getting tired of constantly cloning them or carefully following Rust's borrowing rules. It's becoming appealing to use a data type that implements Copy, like integers, which would simplify things significantly.
An array such as [u8; 16] also implements Copy, making the code more flexible. If I suddenly encounter an unusually long ID that doesn't fit, I could simply reserve a larger array. But would using u128 be significantly more efficient than a byte array?
I'd like to hear from someone with deeper knowledge of how Rust handles these data structures internally. I'm considering creating a new short_id crate specifically for managing short, string-like IDs, complete with convenient methods for conversion to and from various representations.
Short version: currently, technically yes. A u128 can be passed via registers, so it can be handled more efficiently by functions that don't take too many arguments.
See this Compiler Explorer example, where the second version must first fetch the value into registers, and that has significantly higher latency than anything else shown.
In practice though, you’re unlikely to see any actual difference, as:
you will generally run out of registers very quickly when each ID takes two 64-bit registers, since you typically only get six registers for parameters before they spill to the stack.
you're very likely to be passing IDs from values already in cache (likely even on the stack) anyway
you have to pay the cost of loading the ID into a register in the first place anyway
In order to avoid exposing this detail, though, I would recommend using a newtype like:
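A minimal sketch of such a newtype (the name `ShortId` and the packing helper are my assumptions, not part of the original post):

```rust
/// A Copy-able ID backed by a u128.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
#[repr(transparent)]
pub struct ShortId(u128);

impl ShortId {
    /// Pack up to 16 bytes into the integer, zero-padded on the right.
    pub fn from_bytes(bytes: &[u8]) -> Self {
        assert!(bytes.len() <= 16, "ID longer than 16 bytes");
        let mut buf = [0u8; 16];
        buf[..bytes.len()].copy_from_slice(bytes);
        ShortId(u128::from_le_bytes(buf))
    }
}
```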
where the repr(transparent) ensures the value can be passed in registers (more generally, it ensures you can safely transmute between references to the newtype and the contained value)
I wonder whether such a crate actually exists, and whether it even makes sense. Note that in addition to u128 there is also __m128. And depending on whether your IDs usually differ in the first 8 bytes or not, u128 may be faster or slower than __m128!
For "general use", u128 is probably fast enough; trying to work out whether the SSE or the general-purpose-register approach is faster is overkill for 99.99% of apps.
One potential advantage of using u128 is that it likely[1] has a higher alignment than [u8; 16]. This may make it more efficient to load, store, and compare than a [u8; 16], which must be assumed to possibly span two cache lines or pages.
However, increased alignment comes with its own disadvantage: A struct containing u128 and another field of lower size (e.g. a bool) will occupy more memory than it would if it contained [u8; 16]. Even disregarding total memory usage, this can reduce performance because it means more data transfer between the CPU and RAM is required. So, it’s worth trying, but you do need to try it and measure the performance of the change.
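As a concrete illustration of that padding cost (the struct and field names here are made up for the example):

```rust
// On a typical 64-bit target, u128 forces the whole struct up to the
// u128's alignment, so the trailing bool costs several padding bytes.
struct WithU128 {
    id: u128,
    flag: bool,
}

// [u8; 16] has alignment 1, so the bool adds exactly one byte.
struct WithBytes {
    id: [u8; 16],
    flag: bool,
}
```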
Also, note that you can get the same alignment benefits by defining your own aligned type rather than using u128:
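Something along these lines (the name `AlignedId` is my assumption):

```rust
/// A byte array with the same 16-byte alignment a u128 would give it.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
#[repr(align(16))]
pub struct AlignedId(pub [u8; 16]);
```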
The compiler is fully capable, based on this definition, of figuring out that it can use operations wider than per-byte for things like comparing the value. The only thing this doesn't do is request the function-call ABI that a u128 might have, which might be more efficient (but this only matters for non-inlined functions).
May I ask how many "many IDs" is? I honestly would expect there won't be a measurable difference.
Coming from String to either of these types, you:
a) reduce the memory footprint by at least 75%, which is important for caching;
b) move from heap to stack memory, which brings allocation time to zero.
This is especially true when you need to copy/move your IDs.
I believe this will reduce access time to the IDs by a factor of 10 at least.
I am working on a mathematical app which, like you, uses u128 not as a number but as data storage, with a lot of bit-shift operations.
Knowing Rust, you will likely be able to process more than a hundred million IDs per second per thread, so the surrounding code will become the bottleneck.
I would go with the option that fits your data needs better and is easier to handle, which sounds like the byte array, since you can easily expand it if your IDs grow.
I also recommend blackboxing your storage/IDs in a struct, so that you can replace the internal data representation without changing the code that uses it.
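A sketch of that kind of blackboxing (the type name and methods are assumptions): callers never see the internal `[u8; 16]`, so it could later be swapped for a `u128` or a wider array without touching any call sites.

```rust
/// Opaque, Copy-able ID; the internal representation is private.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct Id([u8; 16]);

impl Id {
    /// Build from a NUL-free string of at most 16 bytes, zero-padded.
    pub fn new(s: &str) -> Option<Self> {
        let bytes = s.as_bytes();
        if bytes.len() > 16 {
            return None;
        }
        let mut buf = [0u8; 16];
        buf[..bytes.len()].copy_from_slice(bytes);
        Some(Id(buf))
    }

    /// View as a string, stopping at the zero padding.
    pub fn as_str(&self) -> &str {
        let len = self.0.iter().position(|&b| b == 0).unwrap_or(16);
        // Safe to unwrap: `new` only accepts valid UTF-8 input, and a
        // zero byte in UTF-8 always falls on a character boundary.
        std::str::from_utf8(&self.0[..len]).unwrap()
    }
}
```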
You mentioned that IDs could sometimes be wider than 16 bytes. If that's the case, then I'd recommend something along the lines of smol_str, as it stack-allocates anything shorter than 24 bytes. Larger IDs are automatically heap-allocated. I commonly use smol_str for IDs.
You can get a hacky Copy-able arbitrary value with Box::leak(), but the "proper" way to do this is an "atom" or "interning" approach; at its simplest this is just:
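A minimal interning sketch (the names `Atom` and `Interner` are my assumptions): each distinct string is stored once, and callers pass around a small `Copy` handle instead of the string itself.

```rust
use std::collections::HashMap;

/// Small Copy-able handle; just an index into the interner's table.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct Atom(u32);

#[derive(Default)]
pub struct Interner {
    strings: Vec<String>,
    lookup: HashMap<String, Atom>,
}

impl Interner {
    /// Return the existing atom for `s`, or store `s` and mint a new one.
    pub fn intern(&mut self, s: &str) -> Atom {
        if let Some(&atom) = self.lookup.get(s) {
            return atom;
        }
        let atom = Atom(self.strings.len() as u32);
        self.strings.push(s.to_owned());
        self.lookup.insert(s.to_owned(), atom);
        atom
    }

    /// Look the string back up from its atom.
    pub fn resolve(&self, atom: Atom) -> &str {
        &self.strings[atom.0 as usize]
    }
}
```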
With this you do need to be careful that you only ever use one "domain" of atoms at once and that you can always access it when needed, which can get "interesting" if you're trying to do it in a serde deserializer, for example (a thread-local is probably the easiest option). And of course you lose the ability to tell what a value actually is when debugging.
Yep, you also need to consider those details. Copy types can't implement Drop, so nothing can run when a value goes away; any cleanup you want has to be handled externally no matter what.