Trapezoidal integration in parallel

I need to perform a large number of trapezoidal integration on large data.
The following code is working for sequential runs. How can parallelize the integration?

fn trapInt(Xdata: Vec<f64>, Ydata: Vec<f64>) -> f64 {   
    (1..(Xdata.len() - 1))
        .map(|i| (Xdata[i + 1] - Xdata[i]) * (Ydata[i + 1] + Ydata[i]) / 2.0)
        .sum()
}

First, I think you probably want the range 0..Xdata.len() - 1, not 1..Xdata.len() - 1, since otherwise the values Xdata[0], Ydata[0] are never used. Also, there's no need for your function to take ownership of the vectors, so you should take Xdata, Ydata: &[f64] instead.

With those changes made, a simple translation using rayon's parallel iterators would be

use rayon::prelude::*;

fn par_trap_v0(x: &[f64], y: &[f64]) -> f64 {
    // assuming `x.len() == y.len()`
    (0..x.len() - 1)
        .into_par_iter()
        .map(|i| (x[i + 1] - x[i]) * (y[i + 1] - y[i]) / 2.0)
        .sum()
}

A slightly fancier translation, which might be able to avoid some unnecessary bounds checks, would be

use rayon::prelude::*;

fn par_trap_v1(x: &[f64], y: &[f64]) -> f64 {
    x.par_windows(2)
        .zip(y.par_windows(2))
        .map(|pair| {
            match pair {
                (&[x0, x1], &[y0, y1]) => (x1 - x0) * (y1 - y0) / 2.0,
                _ => unreachable!(),
            }
        })
        .sum()
}

Disclaimer, I have no idea whether you can get any actual speedup from either of these :‌)

1 Like

One more thought, I'd expect that SIMD techniques could be fruitfully applied here. There are a bunch of portable SIMD crates and I'm not sure how they compare to each other, but that could be something to look into if you need more speedup than rayon's task-based parallelism gives you.

1 Like

Actually, you might not even need a dedicated crate. LLVM is smart enough to emit SSE instructions for the following code, when rustc is invoked with -C opt-level=3 -C target-cpu=native:

fn windows<T, const N: usize>(slice: &[T]) -> impl Iterator<Item = &[T; N]> {
    slice.windows(N).map(|x| x.try_into().unwrap())
}

pub fn par_trap_v2(x: &[f64], y: &[f64]) -> f64 {
    assert_eq!(x.len(), y.len());
    windows(x).zip(windows(y))
        .map(|([x0, x1], [y0, y1])| {
            (x1 - x0) * (y1 - y0) / 2.0
        })
        .sum()
}
3 Likes