Fastest way to do this equation

Hi ! I have to do this equation:

1/ number.powf(1.5)

So, the fastest way I have found to do this is:

1/number.sqrt().powi(3)

But I don't know whether I can improve its performance further, especially with autovectorization in mind.

Thank you !

Any ideas ? Thank you !

What about number.powf(-1.5)?

Thanks for the answer.

No, it is much worse than the sqrt/powi version:

1/number.sqrt().powi(3)

I don't know why, but powf is extremely slow.

That is to be expected according to the documentation:

Modern processors also have instructions for sqrt, which is why that part is very fast.

I have written such libraries. Raising to an integer power just uses multiplications, successive squaring if the exponent is large. Raising to a floating point power is harder, e.g. page 6 of Floating Point Math Functions. Any math library worth its salt will check to see if the floating point exponent is an integer, but your performance comment makes it sound like Rust's doesn't.
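To illustrate the successive-squaring idea, here is a minimal sketch. `pow_by_squaring` is a hypothetical helper written for this post; it is not how Rust's `powi` is actually implemented, which I have not checked.

```rust
/// Raise `base` to an integer power using repeated squaring.
/// Hypothetical helper for illustration, not the standard library's code.
fn pow_by_squaring(base: f64, exp: i32) -> f64 {
    // Work with the magnitude of the exponent; invert at the end if negative.
    let mut n = exp.unsigned_abs();
    let mut square = base; // holds base^(2^k) on the k-th pass
    let mut result = 1.0;
    while n > 0 {
        if n & 1 == 1 {
            // This bit of the exponent is set, so fold the current square in.
            result *= square;
        }
        square *= square;
        n >>= 1;
    }
    if exp < 0 {
        1.0 / result
    } else {
        result
    }
}
```

For an exponent of 3 this comes down to a couple of multiplications, which is consistent with powi(3) being cheap compared to a general floating point power.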

It's not clear that sqrt and divide instructions are done in hardware, or even if that's a good idea. It's impractical to build hardware for a pipelinable divide/sqrt algorithm that always rounds correctly, or at least it was when I was doing such work many years ago. It's likely that the instructions are microcoded so you can take advantage of SIMD instructions.


How accurate do you need the result to be? There are approximations for sqrt that are going to be much faster, if performance matters more than accuracy, at least to a degree.

Might be a use case for Fast inverse square root.

Accurate divide and sqrt are quite fast. Floating point arithmetic is tricky enough without having to worry about round off errors due to inaccurate division.

The preferred approach is to use Newton's method with an 8-bit first guess. Two iterations for divide take 4 multiply/add instructions to get a correctly rounded 32-bit result. One more iteration gets you 64 bits. And the calculation can use SIMD.
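A rough sketch of the Newton iteration for a reciprocal, to make the idea concrete. Assumptions: the input is already scaled into [0.5, 1.0], and a simple linear first guess stands in for the 8-bit table lookup. `newton_recip` is a hypothetical name, and this sketch makes no attempt at correct rounding.

```rust
/// Approximate 1/d with Newton's method, assuming 0.5 <= d <= 1.0.
/// Hypothetical helper for illustration only.
fn newton_recip(d: f64, iterations: u32) -> f64 {
    // Linear first guess (an assumption of this sketch); it stands in
    // for the table lookup and is within about 6% of 1/d on [0.5, 1.0].
    let mut x = 48.0 / 17.0 - (32.0 / 17.0) * d;
    for _ in 0..iterations {
        // x <- x * (2 - d*x): a multiply and subtract, then one multiply.
        // The relative error is squared on every pass.
        x = x * (2.0 - d * x);
    }
    x
}
```

Because the error squares each time, a better first guess directly reduces the number of iterations needed, which is why the starting table matters.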

Intrinsic functions (sin, cos, etc.) are another matter. Low order Chebyshev approximations can give you a big performance boost. I got a large speed-up in my thesis program by using 5-digit approximations to log and exponential of 4-digit data.

The real wins come from understanding your problem. Games often get a lot of speedup from calculating 1/sqrt directly. They can also get away with lower precision because they know the result won't be used in subsequent calculations. (@Finn beat me to the punch on this one.)
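For reference, a sketch of the well-known fast inverse square root trick in Rust, for f32 only. The function names are made up for this post, and the accuracy is only on the order of a few tenths of a percent for 1/sqrt (worse once cubed), so it only makes sense if the original poster can tolerate that.

```rust
/// Approximate 1/sqrt(x) for positive, normal f32 values using the
/// classic bit-level first guess plus one Newton step.
/// Hypothetical helper name; low precision by design.
fn fast_inv_sqrt(x: f32) -> f32 {
    // Reinterpret the float as an integer and apply the magic constant.
    let i = 0x5f37_59df_u32.wrapping_sub(x.to_bits() >> 1);
    let y = f32::from_bits(i);
    // One Newton-Raphson step to refine the guess.
    y * (1.5 - 0.5 * x * y * y)
}

/// Approximate x^(-1.5) by cubing the approximate inverse square root.
fn fast_inv_pow_1_5(x: f32) -> f32 {
    let y = fast_inv_sqrt(x);
    y * y * y
}
```

Whether this beats the hardware sqrt and divide on a given CPU is something to measure, not assume.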


powi can take a negative exponent, so another option is number.sqrt().powi(-3). I'd guess it will perform similarly to 1/... or recip() anyway, but you should profile to see what needs optimizing.
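To make the options discussed so far easy to benchmark side by side, here they are as small functions, with a helper that reports how far they disagree. The function names are invented for this post; only `powf`, `sqrt`, `powi` and `recip` come from the standard library. This says nothing about which one is faster.

```rust
fn via_powf(x: f64) -> f64 {
    x.powf(-1.5)
}

fn via_sqrt_powi(x: f64) -> f64 {
    1.0 / x.sqrt().powi(3)
}

fn via_negative_powi(x: f64) -> f64 {
    x.sqrt().powi(-3)
}

fn via_recip(x: f64) -> f64 {
    x.sqrt().powi(3).recip()
}

/// Largest relative difference between the powf result and the others.
fn max_relative_spread(x: f64) -> f64 {
    let reference = via_powf(x);
    [via_sqrt_powi(x), via_negative_powi(x), via_recip(x)]
        .iter()
        .map(|v| ((v - reference) / reference).abs())
        .fold(0.0, f64::max)
}
```

For ordinary positive inputs the variants should agree to within rounding error, so the choice comes down to what the profiler says.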

Thank you all for the replies. They were very helpful.

Here is the complete Rust code. It's my fastest version so far. I'm using this library:

                let dx = pi.x - pj.x;
                let dy = pi.y - pj.y;
                let dz = pi.z - pj.z;
                let dsquared = (dx * dx) + (dy * dy) + (dz * dz) + 1e-20;
                let d32 = 1. / dsquared.get().sqrt().powi(3); 

So, the last line is the one that takes the longest time. One cause could be that the fast-floats library doesn't have a "fast" sqrt method. Do you have any ideas ? Thanks again !
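One more variant that might be worth profiling: since x^(3/2) = x * sqrt(x), the cube can be replaced by a single multiply. Below is a self-contained sketch using plain f64 in place of the fast-floats wrapper type (I have not checked that library's API), with a hypothetical `Vec3` struct standing in for the particle type.

```rust
/// Hypothetical stand-in for the particle position type.
#[derive(Clone, Copy)]
struct Vec3 {
    x: f64,
    y: f64,
    z: f64,
}

/// Squared distance with the same softening term as the snippet above.
fn softened_dist_squared(pi: Vec3, pj: Vec3) -> f64 {
    let dx = pi.x - pj.x;
    let dy = pi.y - pj.y;
    let dz = pi.z - pj.z;
    (dx * dx) + (dy * dy) + (dz * dz) + 1e-20
}

/// Same expression as in the snippet above: 1 / sqrt(d^2)^3.
fn inv_dist_cubed_powi(pi: Vec3, pj: Vec3) -> f64 {
    1.0 / softened_dist_squared(pi, pj).sqrt().powi(3)
}

/// Alternative: d^2 * sqrt(d^2) equals (d^2)^(3/2).
fn inv_dist_cubed_mul(pi: Vec3, pj: Vec3) -> f64 {
    let dsquared = softened_dist_squared(pi, pj);
    1.0 / (dsquared * dsquared.sqrt())
}
```

Both functions still pay for one sqrt and one divide, so any gain is likely to be small; it is only a candidate to measure.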