Equivalent of __builtin_ia32_addcarryx_u64 in Rust (addition with carry)

There isn't much information about __builtin_ia32_addcarryx_u64; the only thing I could find is this: clang: lib/Headers/adxintrin.h Source File

It looks like __builtin_ia32_addcarryx_u64 returns the carry of the addition of two u64 numbers as a u8. While I could implement this in Rust, it would be nice to have the fast version of it.

It's available here:

Is there something equivalent for aarch64? I tried looking into core::arch::aarch64 and found nothing about carry.

u64::overflowing_add does the computation you want and is cross-platform; it seems to be lowered to an LLVM intrinsic (?).

I don't know aarch64 that well. In the long run, you'll probably want the portable carrying_add instead, but that hasn't been stabilized yet.

It does not let you pass a carry in before the addition happens, though, right?

Indeed not, I didn't realize you needed the carry in as well.
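For what it's worth, a carry-in can still be layered on top of u64::overflowing_add by chaining two calls. The following is only a minimal sketch: the helper name `adc` is made up for illustration and is not a standard-library function, and whether the compiler turns this into a single add-with-carry instruction depends on the target and optimization level.

```rust
/// Computes `a + b + carry_in` and returns `(sum, carry_out)`.
/// NOTE: `adc` is a hypothetical helper name, not part of std.
pub fn adc(a: u64, b: u64, carry_in: bool) -> (u64, bool) {
    // First add the two operands and record whether that wrapped.
    let (partial, c1) = a.overflowing_add(b);
    // Then add the incoming carry (0 or 1) and record whether that wrapped.
    let (sum, c2) = partial.overflowing_add(carry_in as u64);
    // The true result is at most 2^65 - 1, so at most one of the two
    // additions can overflow; OR-ing the flags gives the carry out.
    (sum, c1 | c2)
}
```

Calling `adc(u64::MAX, 0, true)` wraps to a sum of 0 with the carry out set.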

Some ISAs support this (e.g., x86 and its semi-clones), some do not (e.g., RISC-V). The underlying mathematical operation was trivial to realize in the sequential unpipelined ALUs of the early 1960s, where it was used to realize multi-precision add/subtract on implementations with very narrow (4-bit, 8-bit, etc.) adders.

However, materializing this operation in modern super-scalar ALU implementations has a significant gate/energy/delay/die-size cost.

Addendum: It's not the lowest-bit carry-in circuit that is the problem; it's that the instruction presumes that there is a Carry flag that recorded the carry-out result of a prior instruction, which is then used as the carry-in to the following Add-with-carry instruction. The extra logic, delay, and pipeline interlocks required to make the output of the first instruction available immediately as input to the following instruction complicate the implementation considerably. That's why RISC-V has no such flags, which keeps the minimum number of gate delays required to realize an Add instruction quite small.

This will definitely be the right answer eventually. That'll (eventually) give iadd_carry in cranelift, for example, and similarly the appropriate chain of things in LLVM or whatever.

You can copy its implementation for now, or just do the easy thing of doing the work in a wider type.
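As a rough illustration of the "wider type" route, the sketch below does the addition in u128 and splits the result back into a 64-bit sum and a carry. The names `adc_wide` and `add_limbs`, and the choice of four little-endian limbs, are assumptions made for this example rather than anything from std.

```rust
/// Computes `a + b + carry_in` in u128 and splits the result.
/// NOTE: `adc_wide` is a hypothetical helper name, not part of std.
pub fn adc_wide(a: u64, b: u64, carry_in: bool) -> (u64, bool) {
    // The widened sum is at most 2^65 - 1, so it cannot overflow u128.
    let wide = a as u128 + b as u128 + carry_in as u128;
    // Low 64 bits are the sum; any bit above them is the carry out.
    (wide as u64, (wide >> 64) != 0)
}

/// Adds two 256-bit numbers stored as four little-endian u64 limbs,
/// threading the carry from each limb into the next.
/// NOTE: `add_limbs` and the limb layout are illustrative assumptions.
pub fn add_limbs(a: &[u64; 4], b: &[u64; 4]) -> ([u64; 4], bool) {
    let mut out = [0u64; 4];
    let mut carry = false;
    for i in 0..4 {
        let (sum, c) = adc_wide(a[i], b[i], carry);
        out[i] = sum;
        carry = c;
    }
    (out, carry)
}
```

The multi-limb loop is where a carry-in actually matters: each limb's carry out becomes the next limb's carry in, which is the chain that add-with-carry instructions exist to serve.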
