A curmudgeon talks about performance

I am ready to scream. I just read the umpteenth post about the performance of a particular construct without any evidence that there actually is a performance problem. People are asking if a loop is faster than using functional style; they are asking if using an array is faster than using Vec; they are playing all sorts of games trying to avoid allocations. Is that really the best use of their time? Reading a post to find out that’s what they are talking about is not the best use of mine.

I learned long ago not to worry about performance until I have a performance problem. Of course, I choose an O(1) algorithm over an O(N) one, but I don’t worry about saving a few machine cycles in a function that’s only called a couple of times. Writing maintainable code is far more important than shaving a few percent off your execution time (unless you’re preparing for a benchmark competition, of course). Besides, you (by which I mean I) don’t know what the slow parts are until you run a hot spot analyzer.

Sorry for the rant, but I’m frustrated by a bug I can’t figure out, and I just needed to blow off some steam.

7 Likes

I think it’s a lot more mundane than you make it out to be. Some of it is likely due to just inexperience, and I think that’s fine, and we should try to be patient with those folks. We already have our own experiences that have taught us things that seem obvious now, but every day, tons of folks are re-living those experiences learning what we learned all over again. What you’re talking about is judgment and it’s super hard to teach good judgment. In most cases, it just has to be earned, the hard way. Best thing to do is probably to nudge them in the right direction and let them learn for themselves.

Other than that though, I think it’s just a matter of context. At $work, I rarely care about subtle performance details and basically never have to look at codegen. If I need to allocate a bunch of memory to build a hashmap because it makes the problem I’m trying to solve easier, then I don’t really think twice about it, because it’s never going to come close to a bottleneck in our particular system. But in my free time, where I work on my open source Rust projects, I’m almost always fiddling with low level performance details and deeply worrying about allocation, to the point where entire APIs are designed around it. It’s because the work I do requires it, and to put it in your terms, folks are going to be using that code in ad hoc benchmarks all the time. Sometimes they’ll even publish it. But separately from that, a lot of the stuff I write is used because it’s fast, and you don’t get there by not caring about things like allocation and codegen. I suspect I’m not alone. Rust attracts people and circumstances like that, by its nature.

TL;DR - Some of it is inexperience. Some of it is just because people work on different stuff than you do.

27 Likes

+100 (The “whatever just make a hashmap” is so familiar…)

But part of the fun I have in Rust is making changes that make tiny differences, like

7 Likes

You were probably where they are now at some point in the past. There will always be a constant stream of developers who have yet to figure out that performance is usually part of the spec, not the goal. Let them learn and give advice.

I of course agree that you often don’t need to worry about performance. But the cases where you do need to make things faster can be quite frustrating.

Hotspot analysis isn’t always the answer. I often find myself in the situation where my code is too slow, but profiling doesn’t show any hotspots. Or once you optimize away the obvious mistakes that show up in a profiler, you’re left with a flat profile, and it’s hard to tell what to do next. Sometimes doing something slightly faster multiplied across your codebase adds up.

On the other side, sometimes code is slow for no discernible reason due to the vagaries of the optimizer. Performance optimization is hard, and I often feel like people who say “just use a profiler” haven’t actually tried to optimize anything.

2 Likes

I spent 10 years of my career tuning codes for benchmarks, so I know that you are right. It isn’t “just use a profiler” as much as it is “have you at least tried a profiler.”

My most interesting case was when I added a 16-element vector between the declarations of two matrices and got a 50% speedup. The hint was two sets of nested loops that took the same amount of time, i.e., a flat profile. Looking at the loops, it was obvious that one did much more computation than the other, so I knew it was a memory access problem. In that case, a flat profile was informative.

2 Likes

Once you hit the wall that any developer who uses a profiler eventually hits, of course you have to become more creative. But for the people who are overanalyzing their code over hypothetical performance implications, “just use a profiler” is sound advice. If you try to go deeper with a person at this stage, you will confuse them more than anything.

1 Like

I think there is merit in both the “optimize now” and “optimize later” approaches, and I use a mixture of both in my own programming practice.

The “optimize later” camp brings some very good points indeed, when they say that reasoning about performance is hard, and that it’s easy to sink time into insignificant details way beyond the point of diminishing returns. As others pointed out before, that’s fine as a hobby / learning exercise, but should probably not be done at work.

On the other hand, the “optimize now” camp is also right that fixing the performance of code after the fact can sometimes be way more expensive than fixing it at the time it’s written. Classic examples include suboptimal data layout and APIs, which are either subject to a compatibility promise or affect the whole program, and therefore are very hard to change after the fact. Worse yet, such “ambient” / “omnipresent” bottlenecks can be very hard to detect using hot spot analysis tools, because they are not hot spots, they are everywhere (and usually inlined).

Thus, when I want to make efficient use of my programming time, I take an approach where I optimize the code a bit at the time it’s written, bearing in mind the point of diminishing returns and perpetually checking that any emerging time sink is really worth the trouble. And then, once I have the full application running and can do more realistic benchmarks, I check the performance again to see if there’s something that I have forgotten, or a part that was really worth spending more time on, and go back to that.

7 Likes

Yes, there is such a thing as premature pessimization! Particularly in the very design of APIs.

When dealing with larger systems, I tend to experiment on small programs to get a feeling for the costs of data layout, etc. Easier to think when the iterations take about a second rather than minutes 🙂

2 Likes

This guy HPCs 🤣. Using the word "codes" for "software" is a sure tell that @alanhkarp comes from a high performance computing background.

On the topic, @BurntSushi's comment is completely spot on. But Rust fits into the wider computing ecosystem as a high performance language. If performance isn't required, there's little point in using it over, e.g., Python, where you can just crank out features. When performance is found wanting, we turn to Rust to reimplement the slow bits and increase transactions-per-second, or reduce latency, or shrink memory footprints to run on smaller hardware or smaller cloud footprints and reduce OPEX. So I expect a lot of people decide that if they're using Rust, they had better squeeze all the performance out of their code that they possibly can.

I think some of the culture also comes from the people involved in some of the higher profile projects. e.g. the people working on futures are rightly proud that they don't require any [heap] allocations. I think even C++ Futures allocate (when coming from std::promise)! But the idée fixe on allocation reduction from the Rust community feeds back to the neophytes who want to join in and also write face-meltingly fast code.

But as @BurntSushi suggests, as a community we need to guide people and let them know it's OK to allocate, to use Box, that they usually don't need to replace Vec with an inline-storage type like ArrayVec, etc.

4 Likes

For people coming from Python, Ruby, and JS, Rust's strong typing is also often mentioned as a strong point of differentiation (whether it's positive or negative depends on the person speaking). However, the Java family and C# are also strongly typed contenders.

1 Like

Don't underestimate the value of doing useless things.

Tuning the heck out of any random piece of code is a good way to learn the techniques needed for tuning performance. It's also a good way to learn the kind of judgement you're advocating for. "I spent eight hours removing every last allocation and my code is still slow" is a powerful motivator.

For that matter, learning to profile is just as important, but that's a learned skill, too.

6 Likes

Excellent detective work, @ehiggs. I was indeed tuning HPC software. It was great fun being paid to squeeze the last 2% of performance out of a benchmark. That was also when I saw that the required changes usually made the code far less maintainable.

I disagree that "they better squeeze all the performance out of their code that they possibly can." Is a week of an engineer's time worth a few percent performance improvement? Sometimes, but not always. Almost certainly it is when it's a library that will be widely used. Almost certainly not if the run to run variance is larger than the improvement, as often happens with distributed applications.

@cliff is also right. There's a time to experiment in order to build your toolkit for when you're doing something where performance matters. There's also a time to make your code as clear as possible and only worry about performance when you've identified a performance problem. One of the things I like about Iterators is that they are both clear and performant.

4 Likes