Bad performance with Glium


#1

I’m just getting started with Rust and Glium.

I tried to render some triangles to the screen. I’m getting 8 FPS, and I’m only rendering 8192 triangles. What am I doing wrong?

This is my source code:

#[macro_use]
extern crate glium;
extern crate rand;
extern crate time;

use glium::{glutin, Surface};

#[derive(Copy, Clone)]
struct Vertex {
    position: [f32; 2],
    color: [f32; 3],
}

implement_vertex!(Vertex, position, color);

fn main() {
    let mut events_loop = glutin::EventsLoop::new();
    // Full screen on the primary monitor.
    let window = glutin::WindowBuilder::new()
        .with_fullscreen(glium::glutin::get_primary_monitor());
    let context = glutin::ContextBuilder::new();
    let display = glium::Display::new(window, context, &events_loop).unwrap();

    // 24,576 vertices = 8,192 triangles with random corners in [0, 1),
    // so they heavily overlap in the upper-right quadrant of clip space.
    let mut shape = Vec::new();
    for _ in 0..24576 {
        shape.push(Vertex {
            position: [rand::random::<f32>(), rand::random::<f32>()],
            color: [rand::random::<f32>(), rand::random::<f32>(), rand::random::<f32>()],
        });
    }

    let vertex_shader_src = r#"
        #version 140

        in vec2 position;
        in vec3 color;

        out vec3 v_color;

        void main() {
            v_color = color;
            gl_Position = vec4(position, 0.0, 1.0);
        }
    "#;

    let fragment_shader_src = r#"
        #version 140

        in vec3 v_color;
        out vec4 color;

        void main() {
            color = vec4(v_color, 1.0);
        }
    "#;


    let vertex_buffer = glium::VertexBuffer::new(&display, &shape).unwrap();
    let indices = glium::index::NoIndices(glium::index::PrimitiveType::TrianglesList);

    let program = glium::Program::from_source(&display, vertex_shader_src, fragment_shader_src, None).unwrap();

    // Wall-clock start time, for computing the average frame time at exit.
    let begin_stamp = time::precise_time_ns();

    let mut running = true;
    let mut count: u64 = 0;
    while running {
        count += 1;
        let mut target = display.draw();
        target.clear_color(0.0, 0.0, 0.5, 1.0);
        target.draw(&vertex_buffer, &indices, &program, &glium::uniforms::EmptyUniforms,
                    &Default::default()).unwrap();
        target.finish().unwrap();

        events_loop.poll_events(|event| {
            match event {
                glutin::Event::WindowEvent { event, .. } => match event {
                    glutin::WindowEvent::Closed => running = false,
                    _ => ()
                },
                _ => (),
            }
        });

//        // Per-frame timing, printed once a second (needs `last_print`
//        // and `last_loop` initialized before the loop):
//        if (time::precise_time_ns() - last_print) > 1_000_000_000 {
//            last_print = time::precise_time_ns();
//            println!("Loop time is {} ns", time::precise_time_ns() - last_loop);
//        }
//        last_loop = time::precise_time_ns();
    }
    println!("Loop ran {} times",count);
    println!("Total elapsed ns: {}",time::precise_time_ns()-begin_stamp);

}

Any help is very much appreciated.

I’m also new to the forum, so any advice concerning forum etiquette would also be great.

Thanks!


#2

First, the easiest question:

Have you compiled it in release mode? That is, with cargo run --release?


#3

Release mode didn’t help, unfortunately; I still only get 8 FPS.


#4

You are rendering 8,192 large triangles at full resolution. If you use a smaller window (instead of full screen) or render to a smaller frame buffer and scale that to full screen, you will see it draws much faster.

The problem is that this example code is fill-rate limited. Games don’t generally draw 8,000 triangles on top of one another (because that is wasted GPU effort). They rely on a depth buffer to trim triangles that would otherwise be drawn over later anyway.

If you enable the depth buffer as described here, your example will instantly run about 3x faster. If you also give each triangle a random Z coordinate, you’ll get another 2x speedup, for roughly 6x better performance overall. That isn’t a huge help by itself, since a 6x improvement over 8 frames per second is still only about 48 fps, but it is an important lesson in 3D rendering optimizations.
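
Here is a minimal sketch of the changes involved, assuming the glium/glutin versions from the original post (only the affected lines are shown):

// Request a depth buffer when creating the OpenGL context.
let context = glutin::ContextBuilder::new().with_depth_buffer(24);

// Clear depth along with color every frame (1.0 = farthest).
target.clear_color_and_depth((0.0, 0.0, 0.5, 1.0), 1.0);

// Enable depth testing in the draw parameters.
let params = glium::DrawParameters {
    depth: glium::Depth {
        test: glium::draw_parameters::DepthTest::IfLess,
        write: true,
        ..Default::default()
    },
    ..Default::default()
};
target.draw(&vertex_buffer, &indices, &program,
            &glium::uniforms::EmptyUniforms, &params).unwrap();

For the random-Z variant, position would become [f32; 3] (with one random z shared by all three vertices of a triangle) and the vertex shader would use that z instead of the hard-coded 0.0.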

Keep in mind also that these triangles are not representative of useful content in practice. A model with 8,000 triangles is not going to have thousands of them largely overlapping. Instead, they will all be much smaller, with shared edges. Smaller triangles draw faster, and additional features like the depth buffer and back-face culling prevent the GPU from drawing triangles that the camera will never see (see the sketch below). For much larger scenes with a lot of geometry outside of the viewport, almost all of that invisible geometry will be clipped long before it reaches the rasterization stage of the rendering pipeline.
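
Back-face culling is another one-line draw-parameter change in glium. A sketch, assuming counter-clockwise winding for front faces (the OpenGL default):

// Skip triangles whose vertices appear in clockwise order on screen,
// i.e. triangles facing away from the camera.
let params = glium::DrawParameters {
    backface_culling: glium::draw_parameters::BackfaceCullingMode::CullClockwise,
    ..Default::default()
};

Note that it would not help the random-triangle test above: with random winding, roughly half the triangles would simply disappear. It pays off on real closed meshes, where every back face is guaranteed to be hidden by a front face.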

I hope this gives a little insight into why a seemingly small number of triangles (8,192) performs so poorly in this example.


#5

I tried running the code, and it performed well enough for me: the average time per frame was about 16.6 milliseconds, i.e. 60 fps, and I assume it would have been higher if not for V-sync. This was on an old Linux machine with a Radeon HD 6870 (open-source drivers), and by old I mean the GPU is from around 2010.

Maybe hardware-accelerated OpenGL is for some reason not available on your system, or at least not for this program?
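
One quick way to check is to print what glium actually got. A sketch, assuming glium’s Display exposes its context’s query methods (as it does via Deref in the versions I’ve used):

// If the renderer string mentions "llvmpipe" or "software", the program
// is falling back to software rendering, which would explain 8 FPS.
println!("Version:  {}", display.get_opengl_version_string());
println!("Renderer: {}", display.get_opengl_renderer_string());
println!("Vendor:   {}", display.get_opengl_vendor_string());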


#6

I tried adding the depth buffer, and the frame rate instantly jumped to 55 FPS, which is much better.

Thanks for your explanation of the optimizations - I’m relatively new to graphics programming, so this is extremely helpful :smile: