Equivalent of __builtin_ia32_addcarryx_u64 in Rust (addition with carry)

trenildo · November 18, 2021, 7:55pm

There's not much information about __builtin_ia32_addcarryx_u64; the only thing I could find is this: clang: lib/Headers/adxintrin.h Source File

It looks like __builtin_ia32_addcarryx_u64 returns the carry of the addition of two u64 numbers as a u8. While I could implement this in Rust, it would be nice to have the fast version of it.

cuviper · November 18, 2021, 8:05pm

It's available here:
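
For readers following along, here is a minimal sketch, assuming the intrinsics in question are the ones exposed under `core::arch::x86_64`. It uses `_addcarry_u64`, which needs no extra target feature; `_addcarryx_u64` has the same shape but requires the `adx` target feature. The wrapper name `add_with_carry` and the non-x86_64 fallback are illustrative assumptions, not part of any library.

```rust
/// Computes `a + b + carry_in` and returns `(sum, carry_out)`.
/// `add_with_carry` is a hypothetical helper name used only for illustration.
#[cfg(target_arch = "x86_64")]
#[allow(unused_unsafe)] // the intrinsic is `unsafe` on older toolchains, safe on newer ones
pub fn add_with_carry(carry_in: u8, a: u64, b: u64) -> (u64, u8) {
    use core::arch::x86_64::_addcarry_u64;

    let mut sum = 0u64;
    // Any nonzero `carry_in` is treated as "carry set"; the result is written to `sum`.
    let carry_out = unsafe { _addcarry_u64(carry_in, a, b, &mut sum) };
    (sum, carry_out)
}

/// Portable fallback for other architectures (an assumption of this sketch).
#[cfg(not(target_arch = "x86_64"))]
pub fn add_with_carry(carry_in: u8, a: u64, b: u64) -> (u64, u8) {
    let (partial, c1) = a.overflowing_add(b);
    let (sum, c2) = partial.overflowing_add((carry_in != 0) as u64);
    // At most one of the two additions can overflow.
    (sum, (c1 | c2) as u8)
}
```

Calling `add_with_carry(1, u64::MAX, 0)` wraps the sum to `0` and reports a carry-out of `1`.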

trenildo · November 18, 2021, 8:32pm

Is there something equivalent for aarch64? I tried looking into core::arch::aarch64 and found nothing about carry.

cole-miller · November 18, 2021, 8:42pm

u64::overflowing_add does the computation you want and is cross-platform; it seems to be lowered to an LLVM intrinsic (?).

cuviper · November 18, 2021, 8:53pm

I don't know aarch64 that well. In the long run, you'll probably want the portable carrying_add instead, but that hasn't been stabilized yet.

trenildo · November 18, 2021, 8:59pm

It does not let you pass a carry in before the addition happens, though, right?

cole-miller · November 18, 2021, 9:06pm

Indeed not, I didn't realize you needed the carry in as well.
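
That said, a carry-in can be layered on top of `u64::overflowing_add` by chaining two calls. The following is only a sketch: `adc` and `add_u256` are hypothetical names, and the little-endian limb order (least significant limb first) is an assumption made for the example.

```rust
/// Add-with-carry built from two `overflowing_add` calls.
/// Returns `(sum, carry_out)`. `adc` is a hypothetical helper name.
pub fn adc(a: u64, b: u64, carry_in: bool) -> (u64, bool) {
    let (partial, c1) = a.overflowing_add(b);
    let (sum, c2) = partial.overflowing_add(carry_in as u64);
    // The two additions cannot both overflow, so OR-ing the flags is enough.
    (sum, c1 || c2)
}

/// Multi-precision addition of two 256-bit numbers stored as four
/// little-endian `u64` limbs, threading the carry from limb to limb.
pub fn add_u256(a: [u64; 4], b: [u64; 4]) -> ([u64; 4], bool) {
    let mut out = [0u64; 4];
    let mut carry = false;
    for i in 0..4 {
        let (sum, carry_out) = adc(a[i], b[i], carry);
        out[i] = sum;
        carry = carry_out;
    }
    (out, carry)
}
```

For example, adding `[u64::MAX, u64::MAX, 0, 0]` and `[1, 0, 0, 0]` ripples the carry through the first two limbs into the third. Whether the compiler turns this into a hardware add-with-carry chain depends on the target and optimizer, so it is worth checking the generated assembly if performance matters.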

TomP · November 18, 2021, 9:16pm

Some ISAs support this (e.g., x86 and its semi-clones), some do not (e.g., RISC-V). The underlying mathematical operation was trivial to realize in the sequential unpipelined ALUs of the early 1960s, where it was used to implement multi-precision add/subtract on machines with very narrow adders (4-bit, 8-bit, etc.).

However, materializing this operation in modern superscalar ALU implementations has a significant gate/energy/delay/die-size cost.

Addendum: It's not the lowest-bit carry-in circuit that is the problem; it's that the instruction presumes there is a Carry flag that recorded the carry-out result of a prior instruction, which is then used as the carry-in to the following Add-with-carry instruction. The logic, delay, and pipeline interlocks required so that the output of the first instruction is available immediately as input to the following instruction complicate the implementation considerably. That's why RISC-V has no such flags, which keeps the minimum number of gate delays required to realize an Add instruction quite small.

scottmcm · November 18, 2021, 9:44pm

carrying_add will definitely be the right answer eventually. That'll (eventually) give iadd_carry in cranelift, for example, and similarly the appropriate chain of things in LLVM or whatever.

You can copy its implementation for now, or just do the easy thing of doing the work in a wider type.
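
The "wider type" route can be sketched like this, assuming 64-bit operands widened to `u128`; `adc_wide` is a hypothetical name chosen for the example.

```rust
/// Add-with-carry done in a wider type: widen to `u128`, add, then split
/// the result into the low 64 bits (the sum) and bit 64 (the carry-out).
pub fn adc_wide(a: u64, b: u64, carry_in: bool) -> (u64, bool) {
    // The largest possible value is 2 * (2^64 - 1) + 1, which fits in 65 bits,
    // so the `u128` addition can never overflow.
    let wide = a as u128 + b as u128 + carry_in as u128;
    (wide as u64, (wide >> 64) != 0)
}
```

This gives the same results as chaining two `overflowing_add` calls, and it avoids reasoning about two separate overflow flags.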

