Nbabel nbody integrator

Hi folks, I'm trying to make an "idiomatic, fast and lightweight" implementation of the Nbabel nbody integrator based on this python version. I would appreciate very much any feedback on it:

/// The Computer Language Benchmarks Game
/// https://salsa.debian.org/benchmarksgame-team/benchmarksgame/
/// 
/// Contributed by Ilia Schelokov and modified by Darley Barreto
use std::fs;
use std::path::Path;
use std::ops::{Add, Sub, Mul, AddAssign, SubAssign};
use std::default::Default;

#[derive(Clone, Copy)]
struct Vec3D(f64, f64, f64);

impl Vec3D {
    fn sum_squares(&self) -> f64 {
        self.0 * self.0
            + self.1 * self.1
            + self.2 * self.2
    }
}

impl Default for Vec3D {
    fn default() -> Vec3D {
        Vec3D(0.0, 0.0, 0.0)
    }
}

impl Add for Vec3D {
    type Output = Vec3D;
    fn add(self, rhs: Self) -> Self::Output {
        Vec3D(
            self.0 + rhs.0,
            self.1 + rhs.1,
            self.2 + rhs.2
        )
    }
}

impl Sub for &Vec3D {
    type Output = Vec3D;
    fn sub(self, rhs: Self) -> Self::Output {
        Vec3D(
            self.0 - rhs.0,
            self.1 - rhs.1,
            self.2 - rhs.2
        )
    }
}

impl Mul<&Vec3D> for f64 {
    type Output = Vec3D;
    fn mul(self, rhs: &Vec3D) -> Self::Output {
        Vec3D(
            self * rhs.0,
            self * rhs.1,
            self * rhs.2
        )
    }
}

impl Mul<Vec3D> for f64 {
    type Output = Vec3D;
    fn mul(self, rhs: Vec3D) -> Self::Output {
        Vec3D(
            self * rhs.0,
            self * rhs.1,
            self * rhs.2
        )
    }
}

impl AddAssign for Vec3D {
    fn add_assign(&mut self, rhs: Self) {
        self.0 += rhs.0;
        self.1 += rhs.1;
        self.2 += rhs.2;
    }
}

impl SubAssign for Vec3D {
    fn sub_assign(&mut self, rhs: Self) {
        self.0 -= rhs.0;
        self.1 -= rhs.1;
        self.2 -= rhs.2;
    }
}

#[derive(Clone)]
struct Particle {
    position: Vec3D,
    velocity: Vec3D,
    acceleration: [Vec3D;2],
    mass: f64,
}

fn compute_energy(particles: &mut Vec<Particle>, pe: f64) -> f64{
    pe + particles.iter().map(|p| 0.5 * p.mass * p.velocity.sum_squares() ).sum::<f64>()
}

fn advance_velocities(particles: &mut Vec<Particle>, dt: f64) {
    for p in particles {
        p.velocity += 0.5 * dt * (p.acceleration[0] + 
                                  p.acceleration[1]);
    }
}

fn advance_positions(particles: &mut Vec<Particle>, dt: f64) {
    for p in particles {
        p.position += dt * p.velocity + 0.5 *
                      dt.powi(2) * p.acceleration[0];
    }
}

fn accelerate(particles: &mut Vec<Particle>) -> f64 {
    for p in particles.iter_mut() {
        p.acceleration[1] = p.acceleration[0];
        p.acceleration[0] = Default::default();
    }
    
    let mut pe = 0.0;

    for (i, p1) in unsafe { &mut *(particles as *mut Vec<Particle>) }.iter_mut().enumerate() {
        for p2 in &mut particles[i+1 ..] {
            let vector = &p1.position - &p2.position;
            let distance = vector.sum_squares().sqrt();
            let distance_cube = distance.powi(3);
            
            p1.acceleration[0] -= (p2.mass / distance_cube) * &vector;
            p2.acceleration[0] += (p1.mass / distance_cube) * &vector;

            pe -= (p1.mass * p2.mass) / distance;
        }
    }

    pe
}

fn parse_row(line: &str) -> Particle {
    let row_vec: Vec<f64> = line
                            .split(" ")
                            .filter(|s| s.len() > 2)
                            .map(|s| s.parse().unwrap())
                            .collect();

    Particle {
        position: Vec3D(row_vec[1], row_vec[2], row_vec[3]),
        velocity: Vec3D(row_vec[4], row_vec[5], row_vec[6]),
        acceleration: Default::default(),
        mass: row_vec[0],
    }
            
}

fn main() {
    let file_path= Path::new("data/input1k");
    let mut particles: Vec<Particle> = fs::read_to_string(file_path)
                                 .expect("File not found!")
                                 .lines()
                                 .filter(|x| x.len() > 0)
                                 .map(|x| parse_row(x))
                                 .collect();

    let mut pe;
    let mut energy = 0.;
    let (tend, dt) = (10.0, 0.001);  // end time, timestep
    let (mut old_energy, energy0) = (-0.25, -0.25);

    accelerate(&mut particles);

    for step in 1 .. (tend / dt + 1.0) as i64 {
        advance_positions(&mut particles, dt);
        pe = accelerate(&mut particles);
        advance_velocities(&mut particles,dt);

        if step % 100 == 0 {
            energy = compute_energy(&mut particles, pe);
            println!("t = {}, E = {},  dE/E = {}", dt * step as f64, energy, (energy - old_energy) / old_energy);
            old_energy = energy;
        }
    }

    println!("Final dE/E = {}", (energy - energy0) / energy0);
}

PS: If you want to run, I am testing on the file data/input1k inside of the same repo, I can't put the link here because of the limitation for new users.
PSS: This code is a modified version of the nbody Rust #8 of the benchmarks game.

  • The impl Default for Vec3D could be replaced with a #[derive(Default)] (or you could keep the impl for explicitness)
  • compute_energy should take &[Particle] instead of &mut Vec<Particle> since it doesn't modify that argument (in general you never want to pass a Vec<T> by reference unless your function needs to change the length of the vector; usually &[T] or &mut [T] are preferable)
  • Likewise advance_velocities, advance_positions, and accelerate should take &mut [Particle]
  • It seems like you should be able to avoid the use of unsafe in accelerate but I have to think about it a bit more

Here's a playground with those changes, main removed (since running it in the playground environment is pointless), and rustfmt applied.

1 Like

Avoiding the unsafe:

    // Before
    for (i, p1) in unsafe { &mut *(particles as *mut Vec<Particle>) }.iter_mut().enumerate() {
        for p2 in &mut particles[i+1 ..] {

    // After
    for i in 1..particles.len() {
        let (head, tail) = particles.split_at_mut(i);
        let p1 = &mut head[i - 1];
        for p2 in tail {
3 Likes

As for the "idiomatic" part, I'd suggest creating a struct Bodies that wraps Vec<Particle> and turning the free functions here into methods on that type. That's the typical way of grouping related data and functionality in Rust programs, especially when mutation is involved.

1 Like

Thank you @quinedot and @cole-miller so much for the tips!!

1 Like

Oh, one more thing: the doc comment construct /// ... attaches to the next item in the file, so the way you're using it here makes it apply to import std::fs which is probably unintended. I'd suggest either using either /* ... */ (normal block comment, doesn't appear in documentation) or //! ... (doc comment, but applies to the surrounding item, in this case the module containing your code) instead.

1 Like

The benchmark decreased a bit (from 500ms to 700ms) due to replacing the unsafe, right?

I don't know very much about optimization but I think you'd have to look at the generated assembly (e.g. using rust.godbolt.org) to determine for sure what causes the slowdown.

1 Like

If it is, bounds checking may be the reason. You could try some variations in the approach (like split_first_mut) and see if an assert! helps, etc.

1 Like

I gave a look into the asm, it might be these

call    qword ptr [rip + <alloc::vec::Vec<T> as core::ops::deref::DerefMut>::deref_mut@GOTPCREL] 
call    qword ptr [rip + core::slice::<impl [T]>::split_at_mut@GOTPCREL]

Which aren't called in the unsafe version. Also the unsafe call

call    qword ptr [rip + <alloc::vec::Vec<T> as core::ops::index::IndexMut<I>>::index_mut@GOTPCREL]

Just to make sure, you're looking at the generated code for the --release profile, right? (This is also what you should use for benchmarks.)

Warning: the unsafe code showed before triggers undefined behavior -- it is detected by miri using it from the playground.

2 Likes

I am using criterion.rs, so it's on release. That asm idk, I was using godbolt.

Oh wow, I didn't know that was possible to see! Thanks.

It's a pleasure to be helpful! :blush:

In general you want to use something like -C opt-level=3 -C target-cpu=haswell to get a good idea of optimization on a pretty decent cpu. Which generally equals to a mess of hardly understandable auto-vectorized code :sweat_smile:

2 Likes

I've just seen it, it's just a bunch of weird stuff, lol.

You can also do something like this to avoid unsafe:

let mut particles = particles;
while let Some((p1, rest)) = particles.split_first_mut() {
    particles = rest;
    
    for p2 in particles.iter_mut() {
        let vector = &p1.position - &p2.position;
        let distance = vector.sum_squares().sqrt();
        let distance_cube = distance.powi(3);

        p1.acceleration[0] -= (p2.mass / distance_cube) * &vector;
        p2.acceleration[0] += (p1.mass / distance_cube) * &vector;

        pe -= (p1.mass * p2.mass) / distance;
    }
}

Note: slice::IterMut uses split_first_mut as it's implementation of Iterator::next!

edit: I just ran my edit through godbolt, and accelerate is completely vectorized and there are no function calls inside it, so this should be as good as it gets.

edit2: here's the minimal godbolt link with my edit

4 Likes

@RustyYato thank you so much for putting time into this. It didn't seem to improve much on the final time :confused: .