I’m a newcomer here – I’ve had a few pokes and prods at Rust in the past but only in the last few weeks started a really serious dive. As a result I’ve made some unexpected discoveries about performance in Rust, which I’m hoping someone can help me understand better.
(TL;DR this is not a “Help my program doesn’t work!” request; it’s a “Hmm, my program does something better than I expected and I’d like to understand why…” inquiry.)
One of the things that I tend to do with a new language is to see if I can port some existing small project of mine. In this case I decided to port a little simulation back from my days in academia, the results of which are here:
The design is pretty simple – there’s a bunch of vectors that get instantiated up front and then various functions operate on these as mutable buffers in an inner loop (for anyone interested, the simulation is of co-evolving reputation measures for objects and users based on ratings given; the original was written to calculate results for this paper: https://arxiv.org/abs/1001.3745). In terms of functionality and performance all is good.
However, while some of those vectors are genuinely output of the calculation, others are really just buffers used to store the results of under-the-hood calculations. So for fun, I decided to see what would happen if I just encapsulated those vector instances inside the scope in which they were really needed, and generated them afresh inside that scope for each use. The results are in this branch:
What I was expecting was that this would result in the vectors being reallocated multiple times inside the inner loop, and therefore kill performance. What I found instead was that performance was not affected at all; the program was just as fast.
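To make the comparison concrete, here is a minimal sketch of the two variants. The function names and the squared-ratings calculation are hypothetical stand-ins, not the actual simulation code; only the ownership pattern matters.

```rust
// Hypothetical stand-ins for one inner-loop step; not the simulation's real code.

/// Variant A: the caller owns the scratch buffer and lends it out on every call.
fn step_reused(ratings: &[f64], scratch: &mut Vec<f64>) -> f64 {
    // `clear` drops the contents but keeps the capacity, so once the buffer
    // is large enough no further allocation is needed.
    scratch.clear();
    scratch.extend(ratings.iter().map(|r| r * r));
    scratch.iter().sum()
}

/// Variant B: the scratch buffer is created and dropped inside the function,
/// which is what the encapsulated branch does on every call.
fn step_fresh(ratings: &[f64]) -> f64 {
    let scratch: Vec<f64> = ratings.iter().map(|r| r * r).collect();
    scratch.iter().sum()
}
```

Both variants return the same value for the same input; the only difference is whether the buffer outlives the call. My expectation was that calling something like `step_fresh` in the inner loop would be noticeably slower than `step_reused`.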
My first thought was that since the lengths of the various vectors are constants known at compile time, maybe the compiler is smart enough to recognize that a single fixed-size buffer can be instantiated under the hood and reused rather than reallocated. So I added an extra patch to make the vector sizes runtime parameters passed via command-line arguments:
… but again there was no performance impact. So at this point I’m stumped; the compiler is obviously doing something smart that avoids unneeded reallocation of vector instances used inside the inner loops, but I don’t understand why or how it is able to do this. (One thought is that since the vectors are relatively small in size – 1000 elements – maybe the compiler is able to recognize that they can be allocated on the stack, but I’m not sure how to confirm that.)
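One way I could imagine checking this is to count heap allocations directly by wrapping the system allocator. The sketch below is only an illustration: `CountingAlloc`, `step_fresh` and `count_allocations` are names I made up, and the loop body is a placeholder rather than the real simulation step. It relies on `std::alloc::GlobalAlloc` and `std::hint::black_box` from the standard library.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and counts every call to `alloc`.
/// (The default `alloc_zeroed` and `realloc` go through `alloc`, so they are counted too.)
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Hypothetical inner-loop body: builds a fresh buffer of `len` elements each call.
fn step_fresh(len: usize) -> f64 {
    // `black_box` discourages the optimizer from removing the buffer entirely.
    let scratch = black_box(vec![1.0_f64; len]);
    scratch.iter().sum()
}

/// Runs `iterations` steps and returns how many heap allocations were observed.
fn count_allocations(len: usize, iterations: usize) -> usize {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    for _ in 0..iterations {
        black_box(step_fresh(black_box(len)));
    }
    ALLOCATIONS.load(Ordering::Relaxed) - before
}
```

Calling `count_allocations(1000, 100)` from `main` and printing the result should show whether the count grows with the number of iterations. If it does, the buffers really are being allocated on the heap each time, and the unchanged runtime would then suggest that the allocations are simply cheap compared with the rest of the work, rather than being optimized away. The counter is global, so allocations made by other threads are included as well.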
So, if anyone has any thoughts or ideas on why Rust is able to achieve these performance outcomes, I’d be happy to hear them. (It’s nice to be able to submit a help post asking why things are unexpectedly good, rather than something not working…)
Thanks & best wishes,