Integer square root algorithm

This is sort of an announcement and a request for help

I've implemented the integer square root from wikipedia for rust integer primitives on crates.io. The algorithm could be better (it's just a direct translation for now). Also, there are 2 things I would like to add:

  1. no_std
  2. i128 and u128 support

the first requires enabling #[no_std], and the second requires opting-in to the types. Both require rust nightly. I only want to enable these things when certain features are enabled.

I was wondering if there was a guide on how to do this?

1 Like

So a couple things:

I think you should be able to add no_std by doing something like #[cfg_attr(not(test), no_std] (I'm not sure if the cfg_attr is necessary -- testing might override no_std anyway).

Regarding i128/u128 support, I think adding a feature (128bit perhaps) would work and then place those implementations behind #[cfg(128bit)].

Looking at the implementation, I'm not sure you actually want impls for signed integers. As far as I can tell, the algorithm is only defined for positive integers. It seems reasonable that users who want to use this algorithm need to convert any potentially signed integers into unsigned ones before passing, and it would also avoid the Option that you have now. This seems like a generally better solution than having a potential panic condition.

For small enough numbers the fastest isqrt is just:

f64::from(x).sqrt() as uN

So I think you can test if x < K and use this simple floating point way, and use more complex code for numbers bigger than K (K is to be found), typically many u128 numbers will be larger than K.

I'd like a well implemented isqrt (and ilog) in the std library.

1 Like

#[no_std] shouldn't require nightly, I don't think? This is a library, not a binary, right?

For a project that didn't need perfect precision, I made this, which is surprisingly close:

fn rough_isqrt(x: usize) -> usize {
    if x == 0 { return x }

    let n = std::mem::size_of::<usize>() as u32 * 8;
    let scale = n - x.leading_zeros();
    let half = scale/2;
    let guess = ( (x >> half) + (1 << half) )/2;
    guess
}

(The playground ASM has an unnecessary branch due to an LLVM bug, which could be worked around with ctlz_nonzero in std::intrinsics - Rust)

4 Likes

Is there a way to work out the value of K? Can it be proved that for n < K, the result is equal to the integer algorithm?

This crate is meant to give exact answers, rather than an approximation for use in CG or similar.

Your comment reminds me of inverse square root from quake :stuck_out_tongue:

It's a library.
Is the syntax for this written somewhere? Is it something like

#[not(feature(std))]
#[no_std]

to just apply it when a feature isn't present? Or can you just mark it #[no_std] and it will continue to work with projects that use std?

It will just work for projects using std, as #![no_std] just says I dont need std.

1 Like

Thanks!

Nice bit hack! Can you say a little about the mathematical foundation of this algorithm? So far, my mind translated it as the following:

  • Find a power of two close in magnitude to x (scale such that 2^scale close to x)
  • Compute the approximate square root of that power of two, let's call it r = 2^(scale/2)
  • Compute the average of r and x/r.

But I don't understand why that leads to a good approximation of the square root of x (one which is much better than, say, r alone)

I think if your n is represented in a float without losing least-significant bits, you're fine -- so this would be something like K = 2mantissa-bits.

The third step is basically one iteration of Newton's method, which converges quickly.

1 Like

Given a binary number like 0b000001xxxxxxxxxx, think of it as 1.xxxxxxxxxx << 10. Using a first-order taylor series, √(1+e) ≈ 1 + e/2, or √f ≈ (f + 1)/2. So putting the shifts back to fix the magnitude gives, √1xxxxxxxxxx ≈ (1xxxxxxxxxx >> 5 + 1 << 5)/2.

Of course, the example above conveniently picked an even shift. For 0b0000001xxxxxxxxx, it's the same logic but with 0.1xxxxxxxxx << 10.

@cuviper Now I want to see how many newton steps this is from the correct answer :slight_smile:

Edit: for numbers up to 100,000,000, it's known correct within 4 newton steps (demo).

1 Like

If r is > sqrt(x), x/r will be < sqrt(x) and vice versa. So if r is a rough approximation of sqrt(x), then average(r, 1.0/r) will be a better approximation of sqrt(x).

2 Likes

Thank you all for the extra insight!

Well, the nice thing about 32-bit numbers is that you can just test all of them :slight_smile:

Empirically, the following gives the correct answer for everything:

fn isqrt_32(x: u32) -> u32 {
    let r = (x as f32).sqrt() as u32;
    if r <= 4096 { return r }
    (x/r + r)/2
}
fn isqrt_32(x: u32) -> u32 {
    let r = (x as f32).sqrt() as u32;
    if r <= 4096 { return r }
    (x/r + r)/2
}

Try with:

let r = (x as f64).sqrt() as u32;
1 Like

I concur. On CPUs at least, it's better to just use double precision rather than throwing a conditional into the mix. Double precision costs about twice as much as simple precision, whereas a branch misprediction can get a lot more expensive than any arithmetic operation...

(On GPUs, it depends. Some of them are really shitty at double precision. Especially in the consumer segment, where some evil manufacturers go as far as disabling double-precision ALUs in firmware in order to prevent those nasty broke scientists from getting the compute performance of a $3000 Tesla card out of a $500 GeForce card...)

I'd be less worried about the branch and more about the idiv in there (slow and not pipelined) :slight_smile:.

1 Like

There's some part of my brain that really likes simple problems (fast isqrt) with very complex optimal solutions.

1 Like

Hah, thanks, learned something today. I knew divisions were evil but didn't know that idiv in particular was that bad.

You may want to look at the discussion on the Ruby issues tracker on this issue.

The next version, Ruby 2.5, is adding an integer squareroot method based on this discussion.

Also check out my roots Rubygem, which perform both an integer squareroot and nth root methods.

https://github.com/jzakiya/roots
https://github.com/jzakiya/roots/blob/master/lib/roots.r

2 Likes