Accurate divide and sqrt are quite fast. Floating-point arithmetic is tricky enough without having to worry about round-off errors due to inaccurate division.

The preferred approach is to use Newton's method with an 8-bit first guess. Each iteration roughly doubles the number of correct bits, so two iterations for divide take 4 multiply/add instructions to get a correctly rounded 32-bit result. One more iteration gets you 64 bits. And the calculation can use SIMD.
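To make the iteration concrete, here is a minimal Python sketch of a Newton reciprocal. It is only an illustration under stated assumptions: the function name `reciprocal` is made up for this example, the 8-bit first guess is faked with an ordinary division (real hardware would use a small lookup table), and Python has no fused multiply-add here, so the last bit is not guaranteed to be correctly rounded.

```python
import math

def reciprocal(d, iterations=3):
    """Approximate 1/d with Newton's method (hypothetical helper, not a library API)."""
    if d == 0 or not math.isfinite(d):
        raise ValueError("d must be finite and nonzero")
    # Split |d| into m * 2**e with 0.5 <= m < 1, so only 1/m needs iterating.
    m, e = math.frexp(abs(d))
    # Assumption: stand-in for a hardware table lookup, good to about 8 bits.
    x = round(128 / m) / 128
    for _ in range(iterations):
        err = 1.0 - m * x   # first multiply/add of the iteration
        x = x + x * err     # second multiply/add; the error is roughly squared
    # Undo the scaling and restore the sign.
    return math.copysign(math.ldexp(x, -e), d)
```

Each pass through the loop is two multiply/add operations, which is where the count of 4 for two iterations comes from. Calling `reciprocal(7.0, iterations=2)` should be good to roughly 32 bits, and the default of three iterations runs into the limit of double precision.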

Intrinsic functions (sin, cos, etc.) are another matter. Low-order Chebyshev approximations can give you a big performance boost. I got a large speed-up in my thesis program by using 5-digit approximations to the log and exponential of 4-digit data.
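As a rough sketch of what a low-order Chebyshev approximation looks like, the Python below fits a degree-5 interpolant to exp on [0, 1] and evaluates it with the Clenshaw recurrence. The interval, the degree, and the helper names `chebyshev_coeffs` and `chebyshev_eval` are assumptions chosen for illustration; this is not the thesis code. In a real program the coefficients would be computed once and hard-coded.

```python
import math

def chebyshev_coeffs(f, a, b, n):
    """Coefficients of the degree n-1 Chebyshev interpolant of f on [a, b]."""
    # Sample f at the n Chebyshev nodes, mapped from [-1, 1] to [a, b].
    nodes = [math.cos(math.pi * (k + 0.5) / n) for k in range(n)]
    values = [f(0.5 * (b - a) * t + 0.5 * (b + a)) for t in nodes]
    return [
        (2.0 / n) * sum(values[k] * math.cos(math.pi * j * (k + 0.5) / n)
                        for k in range(n))
        for j in range(n)
    ]

def chebyshev_eval(coeffs, a, b, x):
    """Evaluate the Chebyshev series at x using the Clenshaw recurrence."""
    t = (2.0 * x - a - b) / (b - a)   # map x back to [-1, 1]
    d1 = d2 = 0.0
    for c in reversed(coeffs[1:]):
        d1, d2 = 2.0 * t * d1 - d2 + c, d1
    # The leading coefficient enters with weight 1/2 in this convention.
    return t * d1 - d2 + 0.5 * coeffs[0]

# Illustrative choice: six coefficients (degree 5) for exp on [0, 1].
EXP_COEFFS = chebyshev_coeffs(math.exp, 0.0, 1.0, 6)

def fast_exp_unit(x):
    """Approximate exp(x) for 0 <= x <= 1 to about five digits."""
    return chebyshev_eval(EXP_COEFFS, 0.0, 1.0, x)
```

With only six coefficients the absolute error on [0, 1] stays below 1e-5, which is plenty when the input data carries about four digits to begin with.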

The real wins come from understanding your problem. Games often get a lot of speedup from calculating 1/sqrt directly. They can also get away with lower precision because they know the result won't be used in subsequent calculations. (@Finn beat me to the punch on this one.)
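One well-known way to compute 1/sqrt directly at reduced precision is the bit-level first guess popularized by game engines, followed by a Newton step. The Python sketch below emulates that trick on 32-bit floats via `struct`. It is one possible illustration, not necessarily the method any particular game uses; the name `fast_rsqrt` is hypothetical, and the constant `0x5F3759DF` is the widely circulated one for this trick.

```python
import struct

def fast_rsqrt(x, newton_steps=1):
    """Approximate 1/sqrt(x) for positive, finite x in float32 range."""
    # Reinterpret the float32 bit pattern of x as an unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Halving and negating the exponent field gives a crude first guess.
    bits = 0x5F3759DF - (bits >> 1)
    y = struct.unpack("<f", struct.pack("<I", bits))[0]
    # Each Newton step for f(y) = 1/y**2 - x roughly squares the error.
    for _ in range(newton_steps):
        y = y * (1.5 - 0.5 * x * y * y)
    return y
```

A single Newton step leaves a relative error of a fraction of a percent, which is often acceptable when the result is only used to normalize a vector for display. A second step tightens it considerably if the value feeds into further arithmetic.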