Integer square root algorithm


This is sort of an announcement and a request for help

I’ve implemented the integer square root algorithm from Wikipedia for Rust’s integer primitives. The algorithm could be better (it’s just a direct translation for now). Also, there are two things I would like to add:

  1. no_std
  2. i128 and u128 support

The first requires enabling #![no_std], and the second requires opting in to the types. Both require Rust nightly. I only want to enable these things when certain Cargo features are enabled.

I was wondering if there was a guide on how to do this?


So a couple things:

I think you should be able to add no_std by doing something like #![cfg_attr(not(test), no_std)] (I’m not sure if the cfg_attr is necessary – testing might override no_std anyway).

Regarding i128/u128 support, I think adding a Cargo feature (128bit, perhaps) would work; you’d then place those implementations behind #[cfg(feature = "128bit")].
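For reference, a sketch of how that gating might look (the crate contents and the feature name here are placeholders, not the actual crate):

```rust
// lib.rs — assuming Cargo.toml declares the feature:
//
//   [features]
//   128bit = []

// Build without std, except when compiling the test harness.
#![cfg_attr(not(test), no_std)]

pub fn isqrt_u64(x: u64) -> u64 {
    // ... algorithm body elided ...
    unimplemented!()
}

// Only compiled when built with `--features 128bit`.
#[cfg(feature = "128bit")]
pub fn isqrt_u128(x: u128) -> u128 {
    unimplemented!()
}
```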

Looking at the implementation, I’m not sure you actually want impls for signed integers. As far as I can tell, the algorithm is only defined for non-negative integers. It seems reasonable to require users of this algorithm to convert any potentially signed integers into unsigned ones before passing them in, and it would also avoid the Option you have now. That seems like a generally better solution than having a potential panic condition.
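A sketch of what the unsigned-only API looks like from the caller’s side (isqrt_u32 is a hypothetical name, and its f64-based body is a stand-in rather than the crate’s actual algorithm):

```rust
use std::convert::TryFrom;

// Hypothetical unsigned-only entry point; the body is a placeholder.
fn isqrt_u32(x: u32) -> u32 {
    (x as f64).sqrt() as u32
}

fn main() {
    let signed: i32 = 1000;
    // The caller converts explicitly, deciding what a negative input means
    // (here: a hard error), so the library never needs an Option or a panic.
    let unsigned = u32::try_from(signed).expect("negative input");
    assert_eq!(isqrt_u32(unsigned), 31); // 31*31 = 961 <= 1000 < 32*32
    println!("ok");
}
```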


For small enough numbers the fastest isqrt is just:

f64::from(x).sqrt() as uN

So I think you can test whether x < K and use this simple floating-point path, and use more complex code for numbers bigger than K (K is to be determined); typically many u128 values will be larger than K.
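A minimal check of this idea, assuming x is small enough to be exactly representable in an f64 (the sample values are illustrative, not a tuned K):

```rust
fn isqrt_via_f64(x: u64) -> u64 {
    (x as f64).sqrt() as u64
}

fn main() {
    // Spot-check the isqrt invariant r^2 <= x < (r+1)^2 for values
    // well below 2^53, where the f64 conversion loses no bits.
    for x in [0u64, 1, 2, 3, 8, 9, 10, 999_999, 1 << 40] {
        let r = isqrt_via_f64(x);
        assert!(r * r <= x && x < (r + 1) * (r + 1), "wrong for {}", x);
    }
    println!("all exact");
}
```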

I’d like a well implemented isqrt (and ilog) in the std library.


#![no_std] shouldn’t require nightly, I don’t think? This is a library, not a binary, right?


For a project that didn’t need perfect precision, I made this, which is surprisingly close:

fn rough_isqrt(x: usize) -> usize {
    if x == 0 { return x }

    // number of bits in usize
    let n = std::mem::size_of::<usize>() as u32 * 8;
    // position of the highest set bit
    let scale = n - x.leading_zeros();
    let half = scale / 2;
    let guess = ((x >> half) + (1 << half)) / 2;
    guess
}

(The playground ASM has an unnecessary branch due to an LLVM bug, which could be worked around with …)


Is there a way to work out the value of K? Can it be proved that for n < K, the result equals that of the exact integer algorithm?

This crate is meant to give exact answers, rather than an approximation for use in CG or similar.

Your comment reminds me of inverse square root from quake :stuck_out_tongue:

It’s a library.
Is the syntax for this written somewhere? Is it something like


to just apply it when a feature isn’t present? Or can you just mark it #[no_std] and it will continue to work with projects that use std?


It will just work for projects using std, as #![no_std] just says “I don’t need std.”




Nice bit hack! Can you say a little about the mathematical foundation of this algorithm? So far, my mind translated it as the following:

  • Find a power of two close in magnitude to x (a scale such that 2^scale is close to x)
  • Compute the approximate square root of that power of two, let’s call it r = 2^(scale/2)
  • Compute the average of r and x/r.

But I don’t understand why that leads to a good approximation of the square root of x (one which is much better than, say, r alone)


I think if your n is represented in a float without losing least-significant bits, you’re fine – so this would be something like K = 2^(mantissa bits), i.e. 2^53 for f64.

The third step is basically one iteration of Newton’s method, which converges quickly.


Given a binary number like 0b000001xxxxxxxxxx, think of it as 1.xxxxxxxxxx << 10. Using a first-order Taylor series, √(1+e) ≈ 1 + e/2, or equivalently √f ≈ (f + 1)/2. Putting the shifts back to fix the magnitude gives √1xxxxxxxxxx ≈ ((1xxxxxxxxxx >> 5) + (1 << 5))/2.

Of course, the example above conveniently picked an even shift. For 0b0000001xxxxxxxxx, it’s the same logic but with 0.1xxxxxxxxx << 10.
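A concrete check of that formula in Rust, for the even-shift case described above:

```rust
fn main() {
    // Even shift: x = 1.0000000000 << 10
    let x: u32 = 0b100_0000_0000; // 1024
    let approx = ((x >> 5) + (1 << 5)) / 2; // (32 + 32)/2
    assert_eq!(approx, 32); // √1024 = 32 — exact for a power of 4

    // With fraction bits set, the estimate is close but not exact:
    let x: u32 = 0b110_0000_0000; // 1536, √1536 ≈ 39.2
    let approx = ((x >> 5) + (1 << 5)) / 2; // (48 + 32)/2
    assert_eq!(approx, 40);
    println!("ok");
}
```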

@cuviper Now I want to see how many newton steps this is from the correct answer :slight_smile:

Edit: for numbers up to 100,000,000, it’s known correct within 4 newton steps (demo).
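To make the “N Newton steps from correct” claim concrete, here is a self-contained sketch (not the crate’s implementation): the rough guess from above, refined by Newton steps until it stops decreasing. Starting one above the rough guess keeps the iterate at or above ⌊√x⌋, which is what makes the stopping rule safe.

```rust
fn rough_isqrt(x: usize) -> usize {
    if x == 0 { return x }
    let n = std::mem::size_of::<usize>() as u32 * 8;
    let scale = n - x.leading_zeros();
    let half = scale / 2;
    ((x >> half) + (1 << half)) / 2
}

/// Returns (isqrt(x), number of Newton steps taken).
fn isqrt(x: usize) -> (usize, u32) {
    if x == 0 { return (0, 0); }
    // Start one above the rough guess so r >= floor(sqrt(x)) is guaranteed.
    let mut r = rough_isqrt(x) + 1;
    let mut steps = 0;
    loop {
        // One Newton step: the average of r and x/r.
        let next = (r + x / r) / 2;
        // From above, the sequence decreases monotonically to floor(sqrt(x)).
        if next >= r { return (r, steps); }
        r = next;
        steps += 1;
    }
}

fn main() {
    let x = 99_999_999usize;
    let (r, steps) = isqrt(x);
    assert!(r * r <= x && x < (r + 1) * (r + 1));
    println!("isqrt({}) = {} after {} Newton steps", x, r, steps);
}
```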


If r is > sqrt(x), x/r will be < sqrt(x) and vice versa. So if r is a rough approximation of sqrt(x), then average(r, x/r) will be a better approximation of sqrt(x).


Thank you all for the extra insight!


Well, the nice thing about 32-bit numbers is that you can just test all of them :slight_smile:

Empirically, the following gives the correct answer for everything:

fn isqrt_32(x: u32) -> u32 {
    let r = (x as f32).sqrt() as u32;
    if r <= 4096 { return r }
    (x/r + r)/2
}


Try with:

let r = (x as f64).sqrt() as u32;
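The reason f64 is enough here: every u32 fits exactly in f64’s 53-bit significand, so no input bits are lost before the (correctly rounded) square root. A spot check of the cases where f32 starts to go wrong — a sketch, not an exhaustive proof:

```rust
fn isqrt_32(x: u32) -> u32 {
    (x as f64).sqrt() as u32
}

fn main() {
    // Check perfect squares and their neighbors, including near the
    // top of the u32 range where f32's 24-bit mantissa runs out.
    for r in [4096u32, 4097, 65535] {
        let sq = r * r;
        assert_eq!(isqrt_32(sq), r);
        assert_eq!(isqrt_32(sq - 1), r - 1);
        assert_eq!(isqrt_32(sq + 1), r);
    }
    println!("ok");
}
```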


I concur. On CPUs at least, it’s better to just use double precision rather than throwing a conditional into the mix. Double precision costs about twice as much as single precision, whereas a branch misprediction can get a lot more expensive than any arithmetic operation…

(On GPUs, it depends. Some of them are really shitty at double precision. Especially in the consumer segment, where some evil manufacturers go as far as disabling double-precision ALUs in firmware in order to prevent those nasty broke scientists from getting the compute performance of a $3000 Tesla card out of a $500 GeForce card…)


I’d be less worried about the branch and more about the idiv in there (slow and not pipelined) :slight_smile:.


There’s some part of my brain that really likes simple problems (fast isqrt) with very complex optimal solutions.


Hah, thanks, learned something today. I knew divisions were evil but didn’t know that idiv in particular was that bad.


You may want to look at the discussion of this issue on the Ruby issue tracker.

The next version, Ruby 2.5, is adding an integer square-root method based on that discussion.

Also check out my roots Rubygem, which provides both integer square-root and nth-root methods.