I need to perform a large number of trapezoidal integrations on large data sets.
The following code works for sequential runs. How can I parallelize the integration?
First, I think you probably want the range 0..Xdata.len() - 1, not 1..Xdata.len() - 1, since otherwise the values Xdata[0], Ydata[0] are never used. Also, there's no need for your function to take ownership of the vectors, so you should take Xdata, Ydata: &[f64] instead.
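To make those two changes concrete, here is a minimal sequential sketch. The function name `trapz` and the exact trapezoid formula are assumptions on my part, since the original function isn't reproduced here; the lowercase `xdata`/`ydata` just follow Rust naming conventions.

```rust
/// Trapezoidal rule over the sample points (xdata[i], ydata[i]).
/// `trapz` is a hypothetical name used only for illustration.
fn trapz(xdata: &[f64], ydata: &[f64]) -> f64 {
    assert_eq!(xdata.len(), ydata.len(), "xdata and ydata must have equal length");
    let n = xdata.len();
    let mut integral = 0.0;
    // Start at 0 so the first interval [xdata[0], xdata[1]] is included.
    // saturating_sub keeps the range empty (instead of underflowing) for empty input.
    for i in 0..n.saturating_sub(1) {
        integral += (xdata[i + 1] - xdata[i]) * (ydata[i + 1] + ydata[i]) * 0.5;
    }
    integral
}
```

Because the parameters are slices, a caller that owns `Vec<f64>`s can pass `&xdata, &ydata` and keep using the vectors afterwards.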
With those changes made, a simple translation using rayon's parallel iterators would be
One more thought: I'd expect that SIMD techniques could be fruitfully applied here. There are a number of portable SIMD crates, and I'm not sure how they compare to each other, but that could be something to look into if you need more speedup than rayon's task-based parallelism gives you.
Actually, you might not even need a dedicated crate. LLVM is smart enough to emit SSE instructions for the following code, when rustc is invoked with -C opt-level=3 -C target-cpu=native:
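As an illustration of the kind of code meant here, this is a sketch written with iterators instead of indexing, which tends to be easier for the optimizer because there are no explicit bounds checks in the loop body. The name `trapz_iter` is hypothetical, and whether (and how well) LLVM vectorizes it depends on the compiler version and target, so it's worth checking the generated assembly rather than taking it on faith.

```rust
/// Iterator-based trapezoidal rule; `trapz_iter` is a hypothetical name.
fn trapz_iter(xdata: &[f64], ydata: &[f64]) -> f64 {
    assert_eq!(xdata.len(), ydata.len(), "xdata and ydata must have equal length");
    // windows(2) yields each pair of neighbouring samples: [x0, x1], [x1, x2], ...
    // It yields nothing for slices shorter than 2, so short inputs give 0.0.
    xdata
        .windows(2)
        .zip(ydata.windows(2))
        .map(|(x, y)| (x[1] - x[0]) * (y[1] + y[0]) * 0.5)
        .sum()
}
```

One way to inspect the result is `rustc -C opt-level=3 -C target-cpu=native --emit asm` on a file containing the function.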