SDL2 performance texture.update()

I'm new to Rust and used the sdl2 crate to visualize 2d spatial simulations.
I ran into a strange performance issue that seems to be related to texture.update(). Running the toy code below, it takes about 16 seconds to visualize 5000 iterations in a 1600x800 window. Comparable code in C, Go, or Julia takes only about 6 seconds. I have no idea where this slowdown comes from. Am I doing something wrong, or is there a reason Rust is about two times slower here than the other languages?

Secondly, is this the idiomatic way to use SDL in Rust for fast 2d visualization of simulations? The idea is to use a function like get_pixels (commented out in the toy code) to save the simulation results of each step in a Vec and then call texture.update() to update the texture with the new data.

use sdl2::{pixels::PixelFormatEnum, rect::Rect};

const NX:u32 = 1600;
const NY:u32 = 800;

fn get_pixels(pixels:&mut Vec<u8>) {    
    for y in 0 .. NY {
        for x in 0 .. NX {
            let i = (y*3*NX + x*3) as usize;
            // put-in some interesting patterns
            pixels[i] = 0x00_u8;   // here: make the whole window green
            pixels[i+1] = 0xff_u8;
            pixels[i+2] = 0x00_u8;
        }
    }
}

fn main() {
    let mut pixels: Vec<u8> = vec![0; (NX*NY*3) as usize];
    let sdl_context = sdl2::init().unwrap();
    let video_subsystem = sdl_context.video().unwrap();
    let window = video_subsystem
        .window("Rust SDL Bench", NX, NY).build().unwrap();
    let mut canvas = window.into_canvas().build().unwrap();
    let texture_creator = canvas.texture_creator();
    let mut texture = texture_creator
        .create_texture_streaming(PixelFormatEnum::RGB24, NX, NY).unwrap();
    let mut event_pump = sdl_context.event_pump().unwrap();
    let rect = Rect::new(0,0, NX, NY);
    let mut iter = 0;
    'running: loop {
        iter += 1;
        //get_pixels(&mut pixels);
        texture.update(rect, &pixels, (3*NX) as usize).unwrap();
        canvas.copy(&texture, None, Some(rect)).unwrap();
        for _ in event_pump.poll_iter() {} 
        if iter == 5000 {
            break 'running;
        }
    }
}

For questions about performance, we'll have a lot more to go on if you post the exact commands that you're using to benchmark the Rust code (and ideally the implementations in other languages, too).

Did you build your executable with cargo build --release? The release flag can oftentimes speed up Rust programs by one or two orders of magnitude.
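To separate the cost of texture.update() from the rest of the loop, it may also help to time the phases individually inside the program. Below is a minimal sketch using only std::time::Instant. PhaseTimer and the phase names are made up for illustration, and the closures are placeholders for the actual SDL calls.

```rust
use std::time::{Duration, Instant};

/// Accumulates wall-clock time spent in one named phase of the render loop.
struct PhaseTimer {
    name: &'static str,
    total: Duration,
    calls: u32,
}

impl PhaseTimer {
    fn new(name: &'static str) -> Self {
        PhaseTimer { name, total: Duration::ZERO, calls: 0 }
    }

    /// Runs `f`, adds its duration to the running total, and returns its result.
    fn measure<T>(&mut self, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = f();
        self.total += start.elapsed();
        self.calls += 1;
        out
    }

    fn report(&self) {
        println!("{}: {:?} total over {} calls", self.name, self.total, self.calls);
    }
}

fn main() {
    let mut update = PhaseTimer::new("texture.update");
    let mut copy = PhaseTimer::new("canvas.copy");
    let pixels = vec![0u8; 1600 * 800 * 3];

    for _ in 0..10 {
        // Placeholder work; in the real loop these closures would wrap the SDL calls.
        let sum: u64 = update.measure(|| pixels.iter().map(|&b| u64::from(b)).sum());
        let _ = copy.measure(|| sum + 1);
    }
    update.report();
    copy.report();
}
```

Reporting the per-phase totals alongside the overall runtime would show whether the time really goes into the texture upload or somewhere else in the loop.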

You can try using Texture::with_lock instead of update. (I haven’t done any perf testing, so I don’t know if it’s actually faster)
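A rough sketch of what the with_lock variant could look like. As far as I understand the API, with_lock passes the closure a mutable byte buffer and the pitch (bytes per row). The helper fill_rows is a hypothetical name, and the sdl2 call itself is only shown in a comment so that the snippet compiles without the crate.

```rust
/// Writes one RGB24 colour into every pixel of a locked texture buffer.
/// `pitch` is the number of bytes per row and may be larger than `width * 3`.
fn fill_rows(buffer: &mut [u8], pitch: usize, width: usize, height: usize, rgb: [u8; 3]) {
    for row in buffer.chunks_mut(pitch).take(height) {
        // Only touch the visible part of the row, not the padding.
        for px in row[..width * 3].chunks_exact_mut(3) {
            px.copy_from_slice(&rgb);
        }
    }
}

fn main() {
    // Stand-in for the buffer SDL would hand out, with 2 bytes of row padding.
    let (width, height) = (4usize, 2usize);
    let pitch = width * 3 + 2;
    let mut buffer = vec![0u8; pitch * height];
    fill_rows(&mut buffer, pitch, width, height, [0x00, 0xff, 0x00]);
    println!("{:?}", &buffer[..pitch]);

    // With the real crate this would be called roughly as (assumed usage):
    // texture.with_lock(None, |buf: &mut [u8], pitch: usize| {
    //     fill_rows(buf, pitch, NX as usize, NY as usize, [0x00, 0xff, 0x00]);
    // }).unwrap();
}
```

The point of this variant is that the simulation writes directly into the texture's buffer, so the separate pixels Vec and the copy done by update would no longer be needed.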

Yes, I used cargo build --release.

I didn't use a benchmarking tool, just the shell command time. I got comparable timings over several runs (on macOS). Given a performance difference of about 2x, this seemed accurate enough.

Here, for example, is the comparable code in Julia, which takes about 6-7 seconds. I can also post the comparable C code if that helps.

using SimpleDirectMediaLayer
const SDL2 = SimpleDirectMediaLayer

const white = 0xff000000
const black = 0xffffffff

function getPixels!(pixels,nx,ny,cells)
  @inbounds for y=1:ny, x=1:nx
          if cell_state(cells,x,y)
             pixels[x,y] = white
          else
             pixels[x,y] = black
          end
  end
end

function main()
    nx = 1600
    ny = 800

    win = SDL2.CreateWindow(
        "GOL", Int32(0), Int32(0), Int32(nx), Int32(ny),

    renderer = SDL2.CreateRenderer(
        win, Int32(-1),

    texture = SDL2.CreateTexture(
        renderer, SDL2.PIXELFORMAT_ARGB8888,
        Int32(SDL2.TEXTUREACCESS_STATIC), Int32(nx), Int32(ny)
    )

    pixels = zeros(UInt32, nx, ny)
    SDL2.SetRenderDrawColor(renderer, 255, 255, 255, 255) #white
    rect =  SDL2.Rect(0, 0, Int32(nx), Int32(ny))

    step = 0
    for i=1:5000
        step += 1

       ev = SDL2.event()
       if typeof(ev) == SDL2.WindowEvent && ev.event == SDL2.WINDOWEVENT_CLOSE
               @show "CLOSE!", ev
       end

       SDL2.UpdateTexture(texture, Ref(rect), pointer(pixels), Int32(nx * sizeof(UInt32)));
       SDL2.RenderCopy(renderer, texture, Ref(rect), Ref(rect))
    end
end

@time main()

If your simulation state is also in some grid-layout vector, something like the following can be nice:

for (px, state) in pixels.chunks_exact_mut(3).zip(simulation.iter()) {
    // write rgb values derived from state to px[0],[1],[2]
}
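Spelled out a bit more, the zip pattern above might look like this. It is only a sketch: the Vec<bool> simulation state and the name render_cells are assumptions for illustration, not something from the original code.

```rust
/// Maps each simulation cell to one RGB24 pixel: alive is green, dead is black.
fn render_cells(pixels: &mut [u8], cells: &[bool]) {
    for (px, &alive) in pixels.chunks_exact_mut(3).zip(cells.iter()) {
        let rgb: [u8; 3] = if alive { [0x00, 0xff, 0x00] } else { [0x00, 0x00, 0x00] };
        px.copy_from_slice(&rgb);
    }
}

fn main() {
    const NX: usize = 4;
    const NY: usize = 2;
    // Hypothetical simulation state: one bool per cell, row-major like the pixel buffer.
    let cells: Vec<bool> = (0..NX * NY).map(|i| i % 2 == 0).collect();
    let mut pixels = vec![0u8; NX * NY * 3];
    render_cells(&mut pixels, &cells);
    println!("{:?}", &pixels[..6]);
}
```

Compared with indexing via y*3*NX + x*3, this avoids the manual index arithmetic and lets the iterator handle the bounds.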

I have also tried this and didn't find a big difference.

You’re using different pixel formats in the two versions: The Rust code is using 3 bytes per pixel (RGB24) and Julia is using 4 bytes per pixel (ARGB8888). Adding the extra byte may allow the optimizer to replace the byte operations with aligned u32 operations.

In fact, the Julia code is doing explicit u32 stores to write the pixel values, instead of writing individual bytes.
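For comparison, a Rust version that writes whole 32-bit pixels could look roughly like this. It is a sketch that assumes an ARGB8888 texture with 4 bytes per pixel. The names argb and fill_argb are made up, and I have not checked whether the optimizer really turns the copy into aligned u32 stores.

```rust
/// Packs 8-bit channels into one ARGB8888 value (0xAARRGGBB).
fn argb(a: u8, r: u8, g: u8, b: u8) -> u32 {
    (u32::from(a) << 24) | (u32::from(r) << 16) | (u32::from(g) << 8) | u32::from(b)
}

/// Writes the same packed colour to every 4-byte pixel, in native byte order.
fn fill_argb(pixels: &mut [u8], color: u32) {
    let bytes = color.to_ne_bytes();
    for px in pixels.chunks_exact_mut(4) {
        px.copy_from_slice(&bytes);
    }
}

fn main() {
    let (nx, ny) = (1600usize, 800usize);
    let mut pixels = vec![0u8; nx * ny * 4];
    // Opaque green, matching the colour used in the toy code.
    fill_argb(&mut pixels, argb(0xff, 0x00, 0xff, 0x00));
    println!("{:?}", &pixels[..4]);
}
```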

Edit: Since your simulation size is constant, I’d be tempted to write get_pixels like this and then use something like the bytemuck crate to get an &[u8] to hand to SDL:

#[repr(C, align(4))]
struct Argb8888 {
    pub a: u8,
    pub r: u8,
    pub g: u8,
    pub b: u8,
}

fn get_pixels<const W: usize, const H: usize>(pixels: &mut [[Argb8888; W]; H]) {
    for y in 0 .. H {
        for x in 0 .. W {
            pixels[y][x] = Argb8888 { r: 0x00, g: 0xff, b: 0x00, a: 0x00 };
        }
    }
}

(Note: I’ve never done this myself, so there may be a reason not to do this)
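One possible reason for caution: SDL's ARGB8888 is a packed 32-bit format (0xAARRGGBB) in native byte order, so on a little-endian machine the bytes sit in memory as B, G, R, A, and a struct with fields in the order a, r, g, b would not match that. Here is a sketch assuming a little-endian target. Bgra and as_bytes are hypothetical names, and the unsafe cast is only a stand-in for what a crate like bytemuck would provide safely.

```rust
/// One ARGB8888 pixel, field by field as it appears in memory on a
/// little-endian machine (assumption: little-endian target).
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Bgra {
    b: u8,
    g: u8,
    r: u8,
    a: u8,
}

/// Views a pixel slice as raw bytes, e.g. to pass it on as a &[u8].
fn as_bytes(pixels: &[Bgra]) -> &[u8] {
    // SAFETY: Bgra is repr(C) with four u8 fields, so it has size 4,
    // alignment 1 and no padding; every byte is initialised.
    unsafe {
        std::slice::from_raw_parts(pixels.as_ptr() as *const u8, std::mem::size_of_val(pixels))
    }
}

fn main() {
    let green = Bgra { b: 0x00, g: 0xff, r: 0x00, a: 0xff };
    let pixels = vec![green; 4];
    let bytes = as_bytes(&pixels);
    // The packed ARGB8888 value for opaque green, stored little-endian.
    assert_eq!(&bytes[..4], &0xff00ff00u32.to_le_bytes()[..]);
    println!("{} bytes, first pixel {:?}", bytes.len(), &bytes[..4]);
}
```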


Thanks. I used RGB24 for the Rust version because I followed the example in the GitHub repository rust-sdl2/examples/. But note that the choice of pixel format does not solve the performance issue. In both the Rust and the Julia example, the function get_pixels is not called (exactly for this reason: to eliminate its influence when measuring the render performance).

I changed the Rust code to ARGB8888 with a longer pixel vector (4 bytes per pixel),
i.e. I changed the definitions of pixels and texture to

let mut pixels: Vec<u8> = vec![0; (NX*NY*4) as usize];
let mut texture = texture_creator
    .create_texture_streaming(PixelFormatEnum::ARGB8888, NX, NY).unwrap();

and in the loop

texture.update(rect, &pixels, (4*NX) as usize).unwrap();

But this still yields similarly bad performance (the runtime dropped slightly to 15 seconds, which is still about double that of the implementations in other languages).

Thanks, indeed this will help me to more elegantly glue my simulation-code to the rendering part.