I tried to optimize the expression `a/b`

when `b`

is fixed (but unknown) since most of the cases, `a.max(b)<1<<21`

and `a>=0,b>0`

if we map `b`

to `b_mapped=((4398046511104_u64+b-1)/b`

, then `a/b == a*b_mapped>>42`

for each (a,b) satisfied `a.max(b)<1<<21`

and `a>=0,b>0`

to check whether the conversion is correct, I tried both the result and the computation time,

The results are definitely correct. but as for the computation time, strange things happened:

```
extern crate rayon;
use rayon::prelude::*;
fn main(){
let b=std::time::Instant::now();
let r1=(0u32..65536).into_par_iter().chunks(4096).map(|x|{
x.iter().map(|&i|{
(1u32..65536).into_iter().map(|j|{
i/j
}).sum::<u32>() as usize
}).sum::<usize>()
}).sum::<usize>();
println!("{:?}",b.elapsed());
let b=std::time::Instant::now();
let a:Vec<usize>=(0..1).chain((1..2097152).map(|x|((4398046511104_u64+x-1)/x as u64) as usize)).collect();
let r2=(0usize..65536).into_par_iter().chunks(4096).map(|x|{
x.iter().map(|&i|{
(1usize..65536).into_iter().map(|j|{
i*a[j]>>42
}).sum::<usize>()
}).sum::<usize>()
}).sum::<usize>();
println!("{:?}",b.elapsed());
let b=std::time::Instant::now();
let r3=(0u32..65536).into_par_iter().chunks(4096).map(|x|{
x.iter().map(|&i|{
(1u32..65536).into_iter().map(|j|{
(i as f64/j as f64) as u32
}).sum::<u32>() as usize
}).sum::<usize>()
}).sum::<usize>();
println!("{:?}",b.elapsed());
println!("{}",(r1+r2)>>1-r3);
}
```

that program returns

```
neutron@Neutron:/me/rust/rayonops$ cargo run --release
Compiling rayonops v0.1.0 (/me/rust/rayonops)
Finished release [optimized] target(s) in 0.86s
Running `target/release/rayonops`
1.739315795s
735.013066ms
957.63335ms
0
neutron@Neutron:/me/rust/rayonops$ cargo run --release
Finished release [optimized] target(s) in 0.01s
Running `target/release/rayonops`
1.77189577s
900.427822ms
1.013179198s
0
neutron@Neutron:/me/rust/rayonops$ cargo run --release
Finished release [optimized] target(s) in 0.01s
Running `target/release/rayonops`
1.778011042s
859.130591ms
1.003195896s
0
neutron@Neutron:/me/rust/rayonops$ cargo run --release
Finished release [optimized] target(s) in 0.01s
Running `target/release/rayonops`
1.53730607s
767.199922ms
1.048112309s
0
```

There is no doubt that `a*b_mapped>>42`

is faster than `a/b`

but it is quite strange that `a/b`

is slower than `a as f64/b as f64`

what happened in the latter case?

Does f64/f64 really faster than u32/u32?

if not, why does u32/u32 so slow in Rust?

if so, why Rust does not convert u32/u32 to f64/f64 implicitly?

Are there any cases that u32/nonzero is not equals to f64/f64 as u32?

PS: tests without rayon

```
neutron@Neutron:/me/rust/rayonops$ ./main_speed_test
10.20379502s
3.601069229s
6.565324578s
0
```

```
//without rayon
fn main(){
let b=std::time::Instant::now();
let r1=(0u32..65536).into_iter().map(|i|{
(1u32..65536).into_iter().map(|j|{
i/j
}).sum::<u32>() as usize
}).sum::<usize>();
println!("{:?}",b.elapsed());
let b=std::time::Instant::now();
let a:Vec<usize>=(0..1).chain((1..2097152).map(|x|((4398046511104_u64+x-1)/x as u64) as usize)).collect();
let r2=(0usize..65536).into_iter().map(|i|{
(1usize..65536).into_iter().map(|j|{
i*a[j]>>42
}).sum::<usize>()
}).sum::<usize>();
println!("{:?}",b.elapsed());
let b=std::time::Instant::now();
let r3=(0u32..65536).into_iter().map(|i|{
(1u32..65536).into_iter().map(|j|{
(i as f64/j as f64) as u32
}).sum::<u32>() as usize
}).sum::<usize>();
println!("{:?}",b.elapsed());
println!("{}",(r1+r2)>>1-r3);
}
```