Thanks, everyone!

My main objective is to take advantage of automatic vectorization. I suspect the `sqrt` can't be made much faster, but perhaps the loops can. This is my main function:

```rust
// Imports this function relies on; Galaxy, Force, D, G, SOFT and DT are defined elsewhere.
use rayon::prelude::*;
use std::iter::zip;

#[inline]
fn run(galaxy: &mut Galaxy) {
    let bodies = &mut galaxy.bodies;
    let poses = &mut galaxy.poses;
    let masses = &galaxy.masses;
    // D time steps.
    (0..D).for_each(|_| {
        // For each body j (in parallel), sum the pairwise forces from every body i.
        let forces = poses.par_iter().zip(&masses[..]).map(|(pj, massj)| {
            let force =
                zip(&poses[..], &masses[..]).fold(Force::new(), |force_acc, (pi, massi)| {
                    let dx = pi.x - pj.x;
                    let dy = pi.y - pj.y;
                    let dz = pi.z - pj.z;
                    // Softened squared distance; SOFT also keeps the i == j term finite.
                    let dsquared = (dx * dx) + (dy * dy) + (dz * dz) + SOFT;
                    // 1 / d^3
                    let d32 = 1. / dsquared.get().sqrt().powi(3);
                    let f = G * *massj * *massi;
                    Force::new_with(
                        force_acc.x + f * dx * d32,
                        force_acc.y + f * dy * d32,
                        force_acc.z + f * dz * d32,
                    )
                });
            // Convert to a half-step velocity change: (F / m) * DT / 2.
            Force::new_with(
                (force.x / *massj) * 0.5 * DT,
                (force.y / *massj) * 0.5 * DT,
                (force.z / *massj) * 0.5 * DT,
            )
        });
        // Update velocities and position deltas.
        bodies.par_iter_mut().zip(forces).for_each(|(body, force)| {
            let (dvx, dvy, dvz) = body.add_force(&force);
            body.xvi = dvx;
            body.yvi = dvy;
            body.zvi = dvz;
            body.dpx = dvx * DT;
            body.dpy = dvy * DT;
            body.dpz = dvz * DT;
        });
        // Apply the position deltas.
        poses
            .par_iter_mut()
            .zip(&bodies[..])
            .for_each(|(pos, body)| pos.add_body(body));
    });
}
```
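For comparison, here is a minimal sketch of the same inner loop over a structure-of-arrays layout. `Soa` and `accel_on` are hypothetical names, not part of the code above, and the sketch assumes positions and masses can be stored as plain `f64` columns instead of `Fast<f64>` fields inside per-body structs. Contiguous columns walked with zipped slice iterators give the compiler unit-stride loads without bounds checks, which auto-vectorization generally wants. The `+=` accumulation is still a strict left-to-right float sum, though, so LLVM will typically not vectorize it unless reassociation is allowed. The function returns the acceleration (the `force / massj` above, before the `0.5 * DT` factor).

```rust
/// Hypothetical structure-of-arrays layout (not the original `Galaxy`):
/// each coordinate and the masses live in their own contiguous Vec<f64>.
struct Soa {
    x: Vec<f64>,
    y: Vec<f64>,
    z: Vec<f64>,
    m: Vec<f64>,
}

/// Acceleration on body `j` from every body. The i == j term contributes zero
/// because dx = dy = dz = 0, provided `soft` > 0 keeps d2 non-zero.
fn accel_on(j: usize, s: &Soa, g: f64, soft: f64) -> [f64; 3] {
    let (xj, yj, zj) = (s.x[j], s.y[j], s.z[j]);
    let mut a = [0.0f64; 3];
    // Zipping the slices avoids per-element bounds checks in the hot loop.
    for (((&xi, &yi), &zi), &mi) in s.x.iter().zip(&s.y).zip(&s.z).zip(&s.m) {
        let (dx, dy, dz) = (xi - xj, yi - yj, zi - zj);
        let d2 = dx * dx + dy * dy + dz * dz + soft;
        // Same 1 / d^3 expression as the original code.
        let inv_d3 = 1.0 / d2.sqrt().powi(3);
        // G * m_i / d^3; m_j cancels when dividing force by mass.
        let f = g * mi * inv_d3;
        a[0] += f * dx;
        a[1] += f * dy;
        a[2] += f * dz;
    }
    a
}
```

The outer loop over `j` is what rayon would parallelize (e.g. one `accel_on` call per body), while the zipped inner loop is the part that benefits from the flat layout.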

I'm using `rayon` for the parallel loops and `fast-floats` for fast-math (approximate) floating-point arithmetic.
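As far as I understand, the reason `fast-floats` matters for a `fold` like the one above is that its fast-math operations let the compiler reorder the additions. With plain `f64` the same freedom can be granted by writing the reordering out by hand. A minimal sketch, assuming a plain `f64` slice (`sum_lanes` and `LANES` are hypothetical names): the sum is split into four independent partial sums, which changes rounding slightly but can let the compiler keep them in SIMD lanes without fast-math flags.

```rust
/// Number of independent partial sums (4 f64 values fit one 256-bit register).
const LANES: usize = 4;

/// Sums `v` using LANES separate accumulators. The reassociation is explicit,
/// so the compiler may vectorize the chunk loop without fast-math flags.
fn sum_lanes(v: &[f64]) -> f64 {
    let mut acc = [0.0f64; LANES];
    let chunks = v.chunks_exact(LANES);
    // Leftover elements when v.len() is not a multiple of LANES.
    let rest = chunks.remainder();
    for c in chunks {
        for k in 0..LANES {
            acc[k] += c[k];
        }
    }
    // Combine the partial sums, then add the leftovers.
    let mut total: f64 = acc.iter().sum();
    for &r in rest {
        total += r;
    }
    total
}
```

The same pattern applies to the force loop: keep `LANES` partial accumulators per component and combine them at the end.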

With `f64` and N = 65536, it takes about 16 seconds. If I comment out the `.sqrt()` and compute only

`let d32 = 1. / dsquared.get().powi(3);`

it takes about 8 seconds. So the `sqrt` seems to account for roughly half of the runtime, but would restructuring the loops also help? I'm not sure.
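Note that dropping the `sqrt` computes 1/d⁶ instead of 1/d³, so that variant is only a timing probe. One change that keeps the physics intact is computing d³ as `d2 * d2.sqrt()` instead of `d2.sqrt().powi(3)`. A minimal sketch with hypothetical helper names; any speedup has to be measured, since LLVM often expands `powi(3)` into multiplications anyway and the `sqrt` itself remains.

```rust
/// 1 / d^3 from d2 = d^2, written the way the original code does it.
fn inv_d3_powi(d2: f64) -> f64 {
    1.0 / d2.sqrt().powi(3)
}

/// Same quantity using d^3 = d2 * sqrt(d2): one sqrt, one multiply, one divide.
fn inv_d3_mul(d2: f64) -> f64 {
    1.0 / (d2 * d2.sqrt())
}
```

Both return the same value up to a few ulps of rounding.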

If you need the full code, I can send it.

Again, thank you so much. I'm learning more about Rust every day, and I'm unsure whether what I'm doing is the best approach.