Glium window renders faster when resized to be smaller

I wrote a YUV video renderer in Glium, just as I did in C++ with OpenGL + GTK, using the exact same technique: upload data to 3 pixel buffers, which end up in 3 textures (Y, U, V) that are combined in a shader to produce an RGB image.
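For reference, the YUV-to-RGB combine step in the fragment shader looks roughly like this. This is a hypothetical sketch, not the shader from the linked repository: the uniform names match the ones used in the draw code below, but the coefficients are generic BT.601 values, and the tex_format/alpha uniforms are ignored for brevity.

// Hypothetical fragment shader source (GLSL), embedded as a Rust string constant.
const YUV_TO_RGB_FRAGMENT_SRC: &str = r#"
    #version 140
    uniform sampler2D tex_y;
    uniform sampler2D tex_u;
    uniform sampler2D tex_v;
    in vec2 v_tex_coords;
    out vec4 color;
    void main() {
        float y = texture(tex_y, v_tex_coords).r;
        float u = texture(tex_u, v_tex_coords).r - 0.5;
        float v = texture(tex_v, v_tex_coords).r - 0.5;
        // Generic BT.601 coefficients; the real shader may use a different matrix/range.
        color = vec4(y + 1.402 * v,
                     y - 0.344 * u - 0.714 * v,
                     y + 1.772 * u,
                     1.0);
    }
"#;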

In C++ I get excellent performance, but with Glium I'm getting very slow performance, which gets better when I resize the window to be smaller. That makes no sense to me.

Here's the full example, in just one file, depending only on glium: https://github.com/lucaszanella/glium_yuv_render_benchmark/blob/2d2ea2065807a5f7a3473d4997a5b3ba15693a23/src/main.rs (you can clone the project and just cargo run without any difficulties).

In this example I made a simple simulator of an ffmpeg buffer that is empty; it gets rendered green on screen, which is exactly how an empty YUV buffer should look after conversion to RGB (with Y, U and V all zero, the U/V offsets push red and blue negative and leave only a green component).

Here's a pseudocode summary of what I did:

event loop wake thread:

//wake up event loop
event_loop_proxy.send_event(());
//should wake up the event loop 200 times per second, which is enough for rendering at a high frame rate
sleep_for_5ms();
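Fleshed out a bit, the wake thread looks roughly like this. It's a sketch assuming a winit-style glutin EventLoop created with with_user_event(); the names event_loop and proxy are mine, not from the repo:

// Create an event loop that accepts user events, and a proxy to wake it from another thread.
let event_loop = glutin::event_loop::EventLoop::<()>::with_user_event();
let proxy = event_loop.create_proxy();

std::thread::spawn(move || loop {
    // Stop once the event loop has shut down and the proxy returns an error.
    if proxy.send_event(()).is_err() {
        break;
    }
    // Sleep ~5 ms, i.e. wake the event loop about 200 times per second.
    std::thread::sleep(std::time::Duration::from_millis(5));
});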

event loop thread:

//gets a new ffmpeg simulated frame (with blank data) of size 1920x1080
let frame = consume_frame();
//(Re)constructs the pixel buffers and textures if they don't exist or the resolution changed
parse_frame(&frame);
//uploads the Y, U, V planes to the pixel buffers, then from the pixel buffers to the textures
draw(&frame);
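And the event-loop side might look like this. Again a sketch: renderer is a placeholder for whatever struct owns the display, buffers and textures, and consume_frame/parse_frame/draw stand for the functions described above:

use glutin::event::{Event, WindowEvent};
use glutin::event_loop::ControlFlow;

event_loop.run(move |event, _target, control_flow| {
    // The loop is driven by the user events sent from the wake thread.
    *control_flow = ControlFlow::Wait;
    match event {
        Event::UserEvent(()) => {
            let frame = consume_frame();   // blank simulated 1920x1080 ffmpeg frame
            renderer.parse_frame(&frame);  // (re)create pixel buffers/textures if needed
            renderer.draw(&frame);         // upload the planes and draw
        }
        Event::WindowEvent { event: WindowEvent::CloseRequested, .. } => {
            *control_flow = ControlFlow::Exit;
        }
        _ => {}
    }
});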

draw function:

fn draw(&self, frame: &Frame) {
    //upload Y plane of frame to Y pixel buffer
    //upload Y pixel buffer to texture
    //upload U plane of frame to U pixel buffer
    //upload U pixel buffer to texture
    //upload V plane of frame to V pixel buffer
    //upload V pixel buffer to texture
    
    //This part of the code is what seems to take longer:
    //--------------------------------------------
    let uniforms = uniform! {
        tex_y: self.y_texture.as_ref().unwrap(),
        tex_u: self.u_texture.as_ref().unwrap(),
        tex_v: self.v_texture.as_ref().unwrap(),
        tex_format: 0i32,
        alpha: 1.0f32
    };

    let mut target = self.display.as_ref().unwrap().draw();
    target.clear_color(0.0, 0.0, 0.0, 0.0);
    target
        .draw(
            self.vertex_buffer.as_ref().unwrap(),
            self.index_buffer.as_ref().unwrap(),
            self.planar_program.as_ref().unwrap(),
            &uniforms,
            &Default::default(),
        )
        .unwrap();
    target.finish().unwrap();
    //--------------------------------------------

}
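For completeness, here is a simplified sketch of one of the plane uploads that the comments describe. To keep it short it writes the plane straight into the texture with Texture2d::write instead of staging it in a PixelBuffer first, and the function name and parameters are mine, not from the repo:

use std::borrow::Cow;

// Upload one tightly packed, single-channel plane (e.g. Y) into its texture.
fn upload_plane(texture: &glium::texture::Texture2d, plane: &[u8], width: u32, height: u32) {
    let image = glium::texture::RawImage2d {
        data: Cow::Borrowed(plane),
        width,
        height,
        format: glium::texture::ClientFormat::U8,
    };
    texture.write(glium::Rect { left: 0, bottom: 0, width, height }, image);
}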

On my computer I get an average of 60 fps when the window starts, and if I make it smaller I can get 200 fps. I'm on a VM that has no GPU.

When I try to decode real video and render it, I get 8 fps on average, but VLC can render 60 fps 1080p without any problems on my computer. Since I'm doing the color conversion in a shader, I don't see why I couldn't achieve the same in my example.

Did you compile in release mode?

Yes, and I get the exact same problem and the same behavior when resized.

Which of these do you run in VM with no GPU - C++ code? glium example? VLC?

All of them run in the same VM with no GPU.

VLC can render 60 fps 1080p. I'm using ffmpeg to decode and doing the YUV-to-RGB conversion in a shader. I think VLC does something similar, or maybe it does the color conversion in software.

Anyway, ffmpeg alone can decode 1080p 60 fps video at 120 fps on average on this same VM, so decoding is not the problem. In any case, the example does not use ffmpeg at all; it's just the renderer alone with a simulated blank ffmpeg frame.

I've tried reproducing this without success.

Configuration: QEMU VM with AltLinux 8.2 onboard, 4G RAM, 4 virtual cores; the host CPU is a Skylake i7 running at 3.1 GHz according to i7z.

Process: install Rustup and latest stable Rust; add this file to a new project with glium as a dependency; cargo run --release.

Result: 57 to 65 FPS when maximized (1920x1080 minus the panel and title bar), more when resized smaller (up to about 600 FPS). CPU consumption when maximized is about 330%, so it uses all 4 virtual CPUs.

It looks like you reproduced it correctly.

I think I'm understanding what's happening.

I opened a 60 fps video in VLC and assumed it was okay, but now I opened the media information and discovered it was dropping frames (2000 rendered, 1000 dropped). So it was actually rendering at less than 60 fps, but playback was smooth, so I assumed it was 60 fps. Then I resized VLC and it stopped dropping frames, which confirms VLC does the same thing as glium: it renders at the window size, not at a fixed resolution with scaling.

It looks like the texture size doesn't matter. I changed the texture sizes in my example from 1920x1080 to 1280x720 and it didn't affect anything. It looks like OpenGL simply runs the fragment shader fewer times when rendering at a lower resolution, so it goes faster. The texture itself has little influence; it's just a matrix of bytes, and OpenGL samples more or less from it depending on the window resolution. It always does the same amount of sampling for the same window resolution, so the texture size doesn't matter. What would matter is the time taken to upload the texture to the GPU, which is minimal in this case.

I think I also thought my renderer was much slower because, instead of dropping frames, I'd render them all, so a 60 fps video rendered at 10 fps would take 6 seconds to play back 1 second of video. If I dropped frames instead, I'd see the video at normal speed but at only 10 fps.
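For example, a frame-dropping presenter could look something like this sketch (TimedFrame, its due timestamp and the queue are hypothetical, not types from my code):

use std::collections::VecDeque;
use std::time::Instant;

struct TimedFrame { due: Instant /*, plane data ... */ }

// Render the newest frame whose presentation time has passed; drop any older ones
// so playback stays at normal speed even when rendering can't keep up.
fn present_latest(queue: &mut VecDeque<TimedFrame>, render: impl Fn(&TimedFrame)) {
    let now = Instant::now();
    while queue.len() > 1 && queue[1].due <= now {
        queue.pop_front(); // this frame is already stale, skip it
    }
    if queue.front().map_or(false, |f| f.due <= now) {
        let frame = queue.pop_front().unwrap();
        render(&frame);
    }
}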

However, your render looks way faster than mine when maximized. My setup is Debian 10 on a Xen VM (under Qubes OS) with 3 virtual cores, on an i7 with only 2 cores/4 threads. Maybe the problem is my low host core count, which matters in a virtualized environment since OpenGL is emulated in software.

Does anyone know if it's possible to render at a fixed resolution in Glium/Glutin and then scale to the window resolution?
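Something like this sketch is what I have in mind, assuming glium's Texture2d, SimpleFrameBuffer and Surface::fill can be used this way; draw_video_into and the fixed 1920x1080 size are placeholders:

use glium::Surface;
use glium::uniforms::MagnifySamplerFilter;

// Sketch: run the YUV->RGB pass at a fixed 1920x1080 into an offscreen texture,
// then scale the result onto the window. In real code the texture and framebuffer
// would be created once, not every frame.
fn draw_scaled(display: &glium::Display, draw_video_into: impl Fn(&mut glium::framebuffer::SimpleFrameBuffer)) {
    let offscreen = glium::texture::Texture2d::empty(display, 1920, 1080).unwrap();
    let mut fbo = glium::framebuffer::SimpleFrameBuffer::new(display, &offscreen).unwrap();

    fbo.clear_color(0.0, 0.0, 0.0, 0.0);
    draw_video_into(&mut fbo); // the existing YUV draw call, targeting the FBO instead of the window

    // Stretch the offscreen result over the whole window.
    let target = display.draw();
    fbo.fill(&target, MagnifySamplerFilter::Linear);
    target.finish().unwrap();
}

Though I'm not sure how much this would help on a pure software renderer, since the final blit still has to touch every window pixel.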

Does anyone have any tips? I think I'm rendering video in the fastest way possible: I'm using pixel buffers, and I'm feeding YUV data to the shaders for GPU color conversion. I can't think of a faster way than this.

I'm pretty sure that if you don't pass a GPU to your VM (and don't provide a passthrough 3D engine), the shader execution, color conversion and texture sampling happen on the CPU too, which is likely to be pretty slow.

Yes! But when the resolution is 1920x1080, it'll sample roughly 1920*1080 pixels, and so it'll convert roughly 1920*1080 colors. Not exactly 1920*1080, but you get the idea.

When the resolution is 640x480, it'll do the same, but for only 640*480 pixels.
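In rough numbers (ignoring overdraw and the clear):

fn main() {
    let full = 1920u64 * 1080;  // 2_073_600 per-pixel YUV->RGB conversions per frame
    let small = 640u64 * 480;   //   307_200 per frame
    println!("{} vs {} -> ~{:.2}x less work", full, small, full as f64 / small as f64); // ~6.75x
}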