Thermite SIMD: Melt your CPU – Early Announcement and Feedback

novacrazy · December 9, 2020, 12:44am

Thermite is a new SIMD library focused on providing portable SIMD acceleration of SoA (Structure of Arrays) algorithms, using consistent-length SIMD vectors for lockstep iteration and computation. Extensive research and work have gone into minimizing wasted CPU cycles and making the most of what your CPU can do.
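To make the SoA term concrete, here is a minimal sketch in plain Rust. It does not use Thermite at all; the `ParticleAoS`, `ParticlesSoA` and `to_soa` names are hypothetical and only illustrate the difference between the two layouts.

```rust
/// Array of Structures: each particle's fields sit next to each other,
/// so consecutive `x` values are interleaved with `y` values in memory.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct ParticleAoS {
    pub x: f32,
    pub y: f32,
}

/// Structure of Arrays: every field gets its own contiguous array,
/// so one wide load can pick up several consecutive `x` values at once.
#[derive(Debug, Default, PartialEq)]
pub struct ParticlesSoA {
    pub x: Vec<f32>,
    pub y: Vec<f32>,
}

/// Hypothetical helper: split an AoS slice into per-field arrays.
pub fn to_soa(particles: &[ParticleAoS]) -> ParticlesSoA {
    let mut soa = ParticlesSoA::default();
    for p in particles {
        soa.x.push(p.x);
        soa.y.push(p.y);
    }
    soa
}
```

With the SoA layout, the same operation can be applied to element `i` of every field array in lockstep, which is the access pattern described above.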

I've been working on Thermite for a little over a month now, and with the AVX2 backend and vectorized math library almost fully implemented, I think now is a good time to announce the crate and ask for feedback. Pre-AVX2/WASM/ARM backends are a work in progress.

The latest documentation is at https://raygon-renderer.github.io/thermite/

What would you like to see in an ideal SIMD framework? What can be done better in Thermite? What would be required to use Thermite in your number-crunching applications?

alice · December 10, 2020, 8:57am

Do you have some examples of the kind of things I could compute using this?

novacrazy · December 10, 2020, 10:32am

It's designed for SoA algorithms, where you aren't meant to care how many values are operated on at once, be it 1 or 4 or 16. It's kind of similar to an ECS (Entity-Component-System) in that you can zip along your data and apply changes, 1-to-1.

For example:

use thermite::*;

pub struct Position2D<S: Simd> {
    pub x: VectorBuffer<S, Vf32<S>>,
    pub y: VectorBuffer<S, Vf32<S>>,
}

pub struct Velocity2D<S: Simd> {
    pub x: VectorBuffer<S, Vf32<S>>,
    pub y: VectorBuffer<S, Vf32<S>>,
}

pub struct System<S: Simd> {
    pub pos: Position2D<S>,
    pub vel: Velocity2D<S>,
}

impl<S: Simd> System<S> {
    pub fn update(&mut self, dt: f32) {
        let dt = Vf32::<S>::splat(dt);

        debug_assert_eq!(self.pos.x.len(), self.pos.y.len());
        debug_assert_eq!(self.vel.x.len(), self.vel.y.len());
        debug_assert_eq!(self.pos.x.len(), self.vel.y.len());

        // this is verbose, but I'm working on better iterator APIs
        let px = self.pos.x.as_mut_vector_slice();
        let py = self.pos.y.as_mut_vector_slice();
        let vx = self.vel.x.as_vector_slice();
        let vy = self.vel.y.as_vector_slice();

        for (((px, py), vx), vy) in px.iter_mut().zip(py).zip(vx).zip(vy) {
            *px = dt.mul_add(*vx, *px);
            *py = dt.mul_add(*vy, *py);
        }
    }
}

That applies the velocities to the positions, processing as many values per step as the instruction set in use allows. AVX2, for example, will compute 8 at once.
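For anyone who wants to see the "N values per step" idea without the library, the following is a rough sketch in plain Rust using only the standard library. It is not Thermite's implementation: `LANES` and `integrate` are hypothetical names, and the lane count of 8 is an assumption that matches 8 `f32` values in a 256-bit AVX2 register.

```rust
/// Assumed lane count: 8 `f32` values fit in one 256-bit register.
pub const LANES: usize = 8;

/// Advance positions by `vel * dt`, `LANES` elements per step.
/// Plain Rust, no Thermite: a fixed-size inner loop like this is something
/// a compiler can often auto-vectorize, but that is not guaranteed.
pub fn integrate(pos: &mut [f32], vel: &[f32], dt: f32) {
    assert_eq!(pos.len(), vel.len());

    let mut pos_chunks = pos.chunks_exact_mut(LANES);
    let mut vel_chunks = vel.chunks_exact(LANES);

    // Full chunks: every lane does the same work in lockstep.
    for (p, v) in (&mut pos_chunks).zip(&mut vel_chunks) {
        for i in 0..LANES {
            // dt * v + p, the same fused multiply-add shape as the post above.
            p[i] = dt.mul_add(v[i], p[i]);
        }
    }

    // Leftover elements that do not fill a whole chunk are handled one by one.
    let p_rem = pos_chunks.into_remainder();
    let v_rem = vel_chunks.remainder();
    for (p, v) in p_rem.iter_mut().zip(v_rem) {
        *p = dt.mul_add(*v, *p);
    }
}
```

Calling `integrate(&mut pos, &vel, dt)` once per field (x, then y) mirrors the `update` method above. How Thermite itself deals with a tail that does not fill a full vector is not covered in this thread, so the scalar remainder loop here is only one possible choice.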

system · March 10, 2021, 10:32am

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.
