Integer square root algorithm


This is sort of an announcement and a request for help

I’ve implemented the integer square root algorithm from Wikipedia for Rust’s integer primitives. The algorithm could be better (it’s just a direct translation for now). Also, there are two things I would like to add:

  1. no_std
  2. i128 and u128 support

The first requires enabling #![no_std], and the second requires opting in to the types. Both require Rust nightly. I only want to enable these things when certain Cargo features are enabled.

I was wondering if there was a guide on how to do this?


So a couple things:

I think you should be able to add no_std by doing something like #![cfg_attr(not(test), no_std)] (I’m not sure if the cfg_attr is necessary – testing might override no_std anyway).

Regarding i128/u128 support, I think adding a Cargo feature (128bit, perhaps) would work; you’d then place those implementations behind #[cfg(feature = "128bit")].
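For reference, a sketch of how that gating might look (the crate contents and the feature name here are placeholders, not the actual crate):

```rust
// lib.rs — assuming Cargo.toml declares the feature:
//
//   [features]
//   128bit = []

// Build without std, except when compiling the test harness.
#![cfg_attr(not(test), no_std)]

pub fn isqrt_u64(x: u64) -> u64 {
    // ... algorithm body elided ...
    unimplemented!()
}

// Only compiled when built with `--features 128bit`.
#[cfg(feature = "128bit")]
pub fn isqrt_u128(x: u128) -> u128 {
    unimplemented!()
}
```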

Looking at the implementation, I’m not sure you actually want impls for signed integers. As far as I can tell, the algorithm is only defined for non-negative integers. It seems reasonable to require users of this algorithm to convert any potentially signed integers into unsigned ones before passing them in, and it would also avoid the Option you have now. That seems like a generally better solution than having a potential panic condition.
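A sketch of what the unsigned-only API looks like from the caller’s side (isqrt_u32 is a hypothetical name, and its f64-based body is a stand-in rather than the crate’s actual algorithm):

```rust
use std::convert::TryFrom;

// Hypothetical unsigned-only entry point; the body is a placeholder.
fn isqrt_u32(x: u32) -> u32 {
    (x as f64).sqrt() as u32
}

fn main() {
    let signed: i32 = 1000;
    // The caller converts explicitly, deciding what a negative input means
    // (here: a hard error), so the library never needs an Option or a panic.
    let unsigned = u32::try_from(signed).expect("negative input");
    assert_eq!(isqrt_u32(unsigned), 31); // 31*31 = 961 <= 1000 < 32*32
    println!("ok");
}
```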


For small enough numbers the fastest isqrt is just:

f64::from(x).sqrt() as uN

So I think you can test whether x < K and use this simple floating-point path, and use more complex code for numbers bigger than K (K is to be determined); typically many u128 values will be larger than K.
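A minimal check of this idea, assuming x is small enough to be exactly representable in an f64 (the sample values are illustrative, not a tuned K):

```rust
fn isqrt_via_f64(x: u64) -> u64 {
    (x as f64).sqrt() as u64
}

fn main() {
    // Spot-check the isqrt invariant r^2 <= x < (r+1)^2 for values
    // well below 2^53, where the f64 conversion loses no bits.
    for x in [0u64, 1, 2, 3, 8, 9, 10, 999_999, 1 << 40] {
        let r = isqrt_via_f64(x);
        assert!(r * r <= x && x < (r + 1) * (r + 1), "wrong for {}", x);
    }
    println!("all exact");
}
```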

I’d like a well implemented isqrt (and ilog) in the std library.


#![no_std] shouldn’t require nightly, I don’t think? This is a library, not a binary, right?


For a project that didn’t need perfect precision, I made this, which is surprisingly close:

fn rough_isqrt(x: usize) -> usize {
    if x == 0 { return x }

    // number of bits in usize
    let n = std::mem::size_of::<usize>() as u32 * 8;
    // position of the highest set bit
    let scale = n - x.leading_zeros();
    let half = scale / 2;
    let guess = ((x >> half) + (1 << half)) / 2;
    guess
}

(The playground ASM has an unnecessary branch due to an LLVM bug, which could be worked around with …)


Is there a way to work out the value of K? Can it be proved that for n < K, the result equals that of the exact integer algorithm?

This crate is meant to give exact answers, rather than an approximation for use in CG or similar.

Your comment reminds me of inverse square root from quake :stuck_out_tongue:

It’s a library.
Is the syntax for this written somewhere? Is it something like


to just apply it when a feature isn’t present? Or can you just mark it #[no_std] and it will continue to work with projects that use std?


It will just work for projects using std, as #![no_std] just says “I don’t need std.”




Nice bit hack! Can you say a little about the mathematical foundation of this algorithm? So far, my mind translated it as the following:

  • Find a power of two close in magnitude to x (a scale such that 2^scale is close to x)
  • Compute the approximate square root of that power of two, let’s call it r = 2^(scale/2)
  • Compute the average of r and x/r.

But I don’t understand why that leads to a good approximation of the square root of x (one which is much better than, say, r alone)


I think if your n is represented in a float without losing least-significant bits, you’re fine – so this would be something like K = 2^(mantissa bits), i.e. 2^53 for f64.

The third step is basically one iteration of Newton’s method, which converges quickly.


Given a binary number like 0b000001xxxxxxxxxx, think of it as 1.xxxxxxxxxx << 10. Using a first-order Taylor series, √(1+e) ≈ 1 + e/2, or equivalently √f ≈ (f + 1)/2. Putting the shifts back to fix the magnitude gives √1xxxxxxxxxx ≈ ((1xxxxxxxxxx >> 5) + (1 << 5))/2.

Of course, the example above conveniently picked an even shift. For 0b0000001xxxxxxxxx, it’s the same logic but with 0.1xxxxxxxxx << 10.
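A concrete check of that formula in Rust, for the even-shift case described above:

```rust
fn main() {
    // Even shift: x = 1.0000000000 << 10
    let x: u32 = 0b100_0000_0000; // 1024
    let approx = ((x >> 5) + (1 << 5)) / 2; // (32 + 32)/2
    assert_eq!(approx, 32); // √1024 = 32 — exact for a power of 4

    // With fraction bits set, the estimate is close but not exact:
    let x: u32 = 0b110_0000_0000; // 1536, √1536 ≈ 39.2
    let approx = ((x >> 5) + (1 << 5)) / 2; // (48 + 32)/2
    assert_eq!(approx, 40);
    println!("ok");
}
```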

@cuviper Now I want to see how many newton steps this is from the correct answer :slight_smile:

Edit: for numbers up to 100,000,000, it’s known correct within 4 newton steps (demo).
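To make the “N Newton steps from correct” claim concrete, here is a self-contained sketch (not the crate’s implementation): the rough guess from above, refined by Newton steps until it stops decreasing. Starting one above the rough guess keeps the iterate at or above ⌊√x⌋, which is what makes the stopping rule safe.

```rust
fn rough_isqrt(x: usize) -> usize {
    if x == 0 { return x }
    let n = std::mem::size_of::<usize>() as u32 * 8;
    let scale = n - x.leading_zeros();
    let half = scale / 2;
    ((x >> half) + (1 << half)) / 2
}

/// Returns (isqrt(x), number of Newton steps taken).
fn isqrt(x: usize) -> (usize, u32) {
    if x == 0 { return (0, 0); }
    // Start one above the rough guess so r >= floor(sqrt(x)) is guaranteed.
    let mut r = rough_isqrt(x) + 1;
    let mut steps = 0;
    loop {
        // One Newton step: the average of r and x/r.
        let next = (r + x / r) / 2;
        // From above, the sequence decreases monotonically to floor(sqrt(x)).
        if next >= r { return (r, steps); }
        r = next;
        steps += 1;
    }
}

fn main() {
    let x = 99_999_999usize;
    let (r, steps) = isqrt(x);
    assert!(r * r <= x && x < (r + 1) * (r + 1));
    println!("isqrt({}) = {} after {} Newton steps", x, r, steps);
}
```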


If r is > sqrt(x), x/r will be < sqrt(x) and vice versa. So if r is a rough approximation of sqrt(x), then average(r, x/r) will be a better approximation of sqrt(x).


Thank you all for the extra insight!


Well, the nice thing about 32-bit numbers is that you can just test all of them :slight_smile:

Empirically, the following gives the correct answer for everything:

fn isqrt_32(x: u32) -> u32 {
    let r = (x as f32).sqrt() as u32;
    if r <= 4096 { return r }
    (x/r + r)/2
}


Try with:

let r = (x as f64).sqrt() as u32;
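The reason f64 is enough here: every u32 fits exactly in f64’s 53-bit significand, so no input bits are lost before the (correctly rounded) square root. A spot check of the cases where f32 starts to go wrong — a sketch, not an exhaustive proof:

```rust
fn isqrt_32(x: u32) -> u32 {
    (x as f64).sqrt() as u32
}

fn main() {
    // Check perfect squares and their neighbors, including near the
    // top of the u32 range where f32's 24-bit mantissa runs out.
    for r in [4096u32, 4097, 65535] {
        let sq = r * r;
        assert_eq!(isqrt_32(sq), r);
        assert_eq!(isqrt_32(sq - 1), r - 1);
        assert_eq!(isqrt_32(sq + 1), r);
    }
    println!("ok");
}
```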


I concur. On CPUs at least, it’s better to just use double precision rather than throwing a conditional into the mix. Double precision costs about twice as much as single precision, whereas a branch misprediction can get a lot more expensive than any arithmetic operation…

(On GPUs, it depends. Some of them are really shitty at double precision. Especially in the consumer segment, where some evil manufacturers go as far as disabling double-precision ALUs in firmware in order to prevent those nasty broke scientists from getting the compute performance of a $3000 Tesla card out of a $500 GeForce card…)


I’d be less worried about the branch and more about the idiv in there (slow and not pipelined) :slight_smile:.


There’s some part of my brain that really likes simple problems (fast isqrt) with very complex optimal solutions.


Hah, thanks, learned something today. I knew divisions were evil but didn’t know that idiv in particular was that bad.


You may want to look at the discussion of this issue on the Ruby issue tracker.

The next version, Ruby 2.5, is adding an integer square-root method based on that discussion.

Also check out my roots Rubygem, which provides both integer square-root and nth-root methods.