Suboptimal code generation for adding arbitrary-precision numbers


#1

I’m on a mission of sorts to scrub as much unsafe code as possible in my projects. Here is some code i wrote to add natural numbers, and here is the safetified version.

The function add_impl is meant to add x and y into z, if x is Some, and add y into z if x is None — essentially, if x is None, it’s as if z were given for both the sum and the first operand.

As you can see in the asm output, the former checks whether x is Some before the loop, but the latter checks in the loop body, which is not wanted! How can i persuade rustc/llvm to hoist the check out of the loop in the safe version? or should i file an issue somewhere? (or will i just need to use unsafe?)

Another note: the compiler is up to some shenanigans with the get_or function; see this vs that. I’m not sure why this would ever affect code generation, but it does…


#2

I’m not sure which check you mean exactly, but I’ve been able to eliminate one by getting rid of the slow 0..len pattern:

    for (i,z) in z.iter_mut().enumerate() {
        let (w, co) = usize::overflowing_add(xm.map_or(*z, |x| x.get_or(0, i)), y.get_or(0, i));
        *z = w + ci as usize;
        ci = co;
    }

You could try splitting it into two loops, first of which does .zip() on the common subset of two slices. If you do let tmp1 = &slice1[0..len]; let tmp2 = &slice2[0..len]; LLVM will believe you that both have required length and won’t check it again.


#3

The latter version does test rdx, rdx each loop iteration (which checks which variant the Option is); the former rather does cmove rcx, rsi; cmove rdx, rsi before the loop, which is like saying let x = xm.unwrap_or(z), but writing it in Rust would fail the borrow checker.

You mean thus? I see more branches; would you please explain what you mean? Anyhow, both such versions have more branches in the loop body than my original unsafe version. The version you posted does seem to have less register pressure tho — r15 is not pushed.


#4

Ah, sorry. I’ve misread the results.

Maybe someone else can have a look…


#5

@kornel ah well, thanks for looking!

I’m mostly wondering whether i ought to file an issue against rustc…


#6

It seems like the compiler does what you want with -C opt-level=3.

That said, it’s kind of odd to have your code rely on the optimizer turning a conditional inside a loop into two loops inside a conditional. I’m sure your real code is more complicated, but still, it might be better to restructure it to actually have two loops…


#7

Two loops? That’s not my desired output at this point — i see how it might potentially be a win but would need to benchmark to make sure. I just want the compiler to hoist the conditional out of the single loop.