Guidance For Profiling and Benchmarking for Bevy Game Engine

Greater context is in the above discussion, but essentially, when working on bevy I found that the ecs_bench micro-benchmarks would return drastically different results after extremely minor changes to the benchmarks, such as commenting out unrelated pieces of code. To test the performance of bevy in more real-life situations, I wanted to create some headless "games" that work essentially like a real game, but run themselves and don't render anything.

For example, I just finished an asteroids game that has the semblance of a real game from the system logic perspective, but doesn't require user input or rendering (rendering is optional):


I'm thinking of making at least one more game like this and probably using the bevy breakout example as well, but now I want to know what the best way to profile and analyze the performance of Bevy using this game is.

I've set up Linux perf and valgrind and their GUIs, Hotspot and KCachegrind, but I don't really know how to use them. Also, on Linux (using perf), the asteroids example can print out the number of CPU cycles and CPU instructions executed over the run of the game, which seems like a useful metric:

cycles / instructions: 4.05076 M / 2.42554 M (1.67 cpi)

We might be able to do other similar things.

Is the best way to test this just to run the game for a certain number of frames and time how long it takes? What kind of strategies can I use to approach this?

The goal is to be able to more effectively determine the effect that changes to the bevy engine have on engine performance. We need to be able to compare one version of bevy to other versions of bevy and measure the difference in performance.

I'm imagining a nice workflow would be to script a benchmarking suite that runs through a few of these benchmark games, collects a certain set of stats on them, and displays the difference in the stats between two runs of the suite. I know that criterion does this, and maybe that would be useful for the timing portion, but we might also want to do the same with CPU instruction counts.
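As a rough illustration of the "difference between two runs" step, here is a minimal sketch. The function names (`percent_change`, `diff_report`) and the idea of storing each run as a list of `(name, value)` pairs are assumptions made for this example; they are not taken from criterion or from any existing harness.

```rust
/// Percent change from `baseline` to `current`; negative means the value went down.
/// Returns `None` when the baseline is zero, since the ratio is undefined.
fn percent_change(baseline: f64, current: f64) -> Option<f64> {
    if baseline == 0.0 {
        None
    } else {
        Some((current - baseline) * 100.0 / baseline)
    }
}

/// Pairs metrics by name and formats one line per metric found in both runs.
/// The `(name, value)` layout is a hypothetical shape chosen for this sketch.
fn diff_report(baseline: &[(&str, f64)], current: &[(&str, f64)]) -> Vec<String> {
    let mut lines = Vec::new();
    for (name, base) in baseline {
        // Look up the metric with the same name in the newer run.
        let found = current.iter().find(|(n, _)| n == name);
        if let Some((_, cur)) = found {
            if let Some(pct) = percent_change(*base, *cur) {
                lines.push(format!("{name}: {base} -> {cur} ({pct:+.2}%)"));
            }
        }
    }
    lines
}
```

The same comparison works for any scalar metric, so frame time averages, cycle counts, and instruction counts could all go through one code path.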


Anyway, any guidance or tips would be appreciated, thanks!


I’d try to capture the distribution of frame times somehow: a change that makes 99% of frames a little faster and the rest take twice as long will look reasonable in an average but could be a worse experience for the player.

One way to do this is to track something like the 99th percentile frame time, to see what impact you’re having on the worst-case frames. Another would be to pick a target simulation update rate and track how strictly the engine can maintain the expected schedule, with penalties for both early and late frames.
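To make the percentile idea concrete, here is a minimal sketch that computes a mean and a nearest-rank percentile over recorded frame times. The function names are made up for this example, and nearest-rank is just one of several common percentile definitions.

```rust
use std::time::Duration;

/// Returns the frame time at percentile `p` (0.0..=100.0) using the
/// nearest-rank method. Returns `None` for an empty slice.
fn percentile(frame_times: &[Duration], p: f64) -> Option<Duration> {
    if frame_times.is_empty() {
        return None;
    }
    let mut sorted = frame_times.to_vec();
    sorted.sort();
    // Nearest rank: ceil(p / 100 * n), clamped to 1..=n, then made 0-based.
    let n = sorted.len();
    let rank = ((p / 100.0) * n as f64).ceil() as usize;
    let idx = rank.clamp(1, n) - 1;
    Some(sorted[idx])
}

/// Returns the arithmetic mean of the frame times, or `None` if there are none.
fn mean(frame_times: &[Duration]) -> Option<Duration> {
    if frame_times.is_empty() {
        return None;
    }
    let total: Duration = frame_times.iter().sum();
    Some(total / frame_times.len() as u32)
}
```

For instance, with 98 frames at 9 ms and 2 frames at 20 ms, the mean is 9.22 ms while the 99th percentile is 20 ms, so the slow frames that barely move the average show up clearly in the tail.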


@2e71828 That seems like a good idea, but I'm not sure whether it will be too slow to measure and record individual frame times when running the game without graphics: the iteration times are ridiculously fast, and the overhead of measuring might make the measurements themselves less useful.

We'll probably have to do something like what criterion does: run the game for a certain number of iterations, take a sample of how long that took, then repeat to collect a certain number of samples. Actually, we can probably just use criterion to do that part of the benchmarks.
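The batching approach could look roughly like the sketch below, which times a whole batch of frames at once so the clock is read only once per sample instead of once per frame. `collect_samples` and the `step` closure are hypothetical stand-ins for whatever advances the game by one frame; this is not how criterion is implemented internally.

```rust
use std::time::{Duration, Instant};

/// Runs `step` `frames_per_sample` times per sample and records the
/// average time per frame for each of `samples` samples.
fn collect_samples<F: FnMut()>(
    mut step: F,
    frames_per_sample: u32,
    samples: usize,
) -> Vec<Duration> {
    assert!(frames_per_sample > 0, "need at least one frame per sample");
    let mut out = Vec::with_capacity(samples);
    for _ in 0..samples {
        // One clock read before and one after the whole batch keeps
        // measurement overhead small relative to the work being timed.
        let start = Instant::now();
        for _ in 0..frames_per_sample {
            step();
        }
        out.push(start.elapsed() / frames_per_sample);
    }
    out
}
```

The trade-off is that each sample is an average over the batch, so individual slow frames inside a batch are smoothed out.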

I don't want to use the typical criterion workflow for everything, though, because I want to customize the workflow a bit. I'm working on my own benchmarking harness so that you can just clone the repo and run cargo run. It will build each example as a separate executable, run each one, and collect its metrics, including the frame timing, the CPU instructions/cycles, and any other metrics I can manage to collect that might be useful. Then we can render a web page with the metrics or just print them to the terminal.

I'll probably only worry about making it run on Linux for now, so that we can use perf to get the metrics.

Well, now I've got some cool benchmark graphs 🙂:

I essentially just run a bevy game for 300 frames, get the frame time average, the CPU cycles, and the CPU instructions, then I do that over and over again to collect 200 samples, which I then plot on these graphs and compare to the previous benchmark.

Now I've just got to add more benchmarks and then start testing different changes to Bevy.
