Parallelise n-body code

I want to parallelise the iterators in the n-body code. I understand that in the n-body problem, the next state depends on the previous state so it is not possible to parallelise the entire code. But I think the different calculations in each state can be parallelised. Can somebody please help?
n-body code

I had the same problem for an N-body simulation. I only parallelized the O(n²) force, not the O(n) ones.

In a loop I divided the data into groups of chunks, then for each group sequentially, each chunk can be treated by a different thread and these threads can run concurrently without collision. It was quite tricky and I needed a bit of unsafe for the iterators. (it's worse if you need SIMD but the principle remains the same)

schema of the chunks here

see my code: defining and implementing the parallel iterators -- creating them -- using them

You could switch it to a Barnes-Hut simulation, but that might be overkill and/or not applicable to your case.

As for making standard n-body code run in parallel, it sounds like you're conflating two separate steps:

  • Computing the sum of forces/acceleration for all bodies (which doesn't need mutable access to body position/mass data, i.e. fixed state produces fixed forces) vs.
  • Actually stepping each body forward, which only depends on each body's current position/velocity and total incoming force/acceleration (but does require ownership/mutability)

Because each body interacts with every other body you need to calculate a pairwise force for every combination of two bodies. So for N bodies the total number of force pair combinations [nCr] can be calculated as C(N,2) = N!/[2!*(N-2)!] = n*(n-1)/2. This works out to be the same number of inner iterations as in the double iteration: for i in 0..N { for j in i+1..N { } }.

Thus, you could create a single Vec that holds pre-selected indices for all pairwise combinations:

let mut pairwise_body_idxs: Vec<(usize, usize)> = Vec::with_capacity(N*(N-1)/2);
for i in 0..N {
    for j in i+1..N {
        pairwise_body_idxs.push((i, j));
    }
}
let pairwise_body_idxs = pairwise_body_idxs;

That can be used to par_iter() through all of the force calculations. As long as you are indexing into immutable state it doesn't matter how many threads are reading from your 'universe'. So you could pass in a single &ref of your set of body positions/masses by partial application to produce the function you want to .map() over each index pair.

At that point you'll effectively have a parallel iterator whose items are (usize, usize, Force), which you'll need to reduce/fold down to just a Vec<Force>. An interesting constraint of .reduce() is that all elements need to be Self::Item, whereas .fold() has a separate type T for the identity/result and Self::Item to add. So, you could implement

impl Add<Rhs = (usize, usize, Force)> for Vec<Force> {
    type Output = Self;
    fn add(self, rhs: Rhs) -> Self::Output {
        self[rhs.0] += rhs.2;
        self[rhs.1] -= rhs.2;
    }
}

Then you should be able to .fold() down to a single result Vec<Force>. And then you're back to the O(n) timestep portion of things.

1 Like

Alternatively, the combinatorial number system is a formal method for enumerating combinations. So, if you don't have a fixed number of bodies, you can use some math to work out body index pairs from an appropriate-length indexed parallel iterator [playground].

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.