I'm having some trouble getting Rust to optimize a dereference that SHOULD be hoistable out of a loop, involving dereferencing a double reference (e.g. &&u32). In my actual code, using a double reference is unavoidable since the inner reference is part of a larger struct, but this example suffices to show the problem:
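The original snippet isn't shown here, so the following is a hypothetical reconstruction based on the function names discussed below (`double_deref_loop_manually`, `double_deref_loop_manually2`, `deref_loop`); the exact bodies are assumptions:

```rust
// `x` is a double reference; the opaque call to `foo` is what prevents
// the optimizer from assuming `**x` stays the same across iterations.
pub fn double_deref_loop_manually(x: &&u32, foo: fn()) -> [u32; 4] {
    let inner: &u32 = *x; // a local reference: no argument-level validity guarantee
    let mut out = [0u32; 4];
    for slot in &mut out {
        foo();
        *slot = *inner; // the load is re-done every iteration
    }
    out
}

// A helper whose argument IS the inner reference; here the load
// can be hoisted out of the loop.
fn deref_loop(x: &u32, foo: fn()) -> [u32; 4] {
    let mut out = [0u32; 4];
    for slot in &mut out {
        foo();
        *slot = *x;
    }
    out
}

// Same computation, but the inner reference gains "function argument"
// status by being passed to the helper.
pub fn double_deref_loop_manually2(x: &&u32, foo: fn()) -> [u32; 4] {
    deref_loop(*x, foo)
}
```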
I would hazard a guess that this is due to better handling of alias analysis on function arguments than local variables (which would explain why double_deref_loop_manually2 is optimized but double_deref_loop_manually isn't), but I'm wondering if this is expected behavior from the optimizer and in general if there's a good approach to work around this? Just try to write a helper function that doesn't have any double refs (like how double_deref_loop_manually2 calls deref_loop)?
This gets into “not yet formalized” territory, but my understanding is that there's a specific rule in the compiler that says when a reference is passed to a function,
fn foo(x: &Bar) {...
and the type Bar contains no UnsafeCell, it must be the case that x points to a valid value of Bar whose bytes won't be mutated as long as foo() is running; but there is not (yet) the same rule for references that are manipulated within the function, nor is it transitive through multiple levels of dereferencing (yet).
So, what you are seeing is not technically a missed optimization — it's not that the analysis isn't thorough enough — it's an optimization that there isn’t yet any rule permitting.[1]
Personally, whenever I meet a &u32, much less an &&u32, I try to write code that dereferences it as soon as possible, perhaps even inside of the pattern that binds it, so that I don't have to use the * operator later, and so I have the least possible chance of operating on a wasteful pointer-to-u32 longer than necessary.
pub fn double_deref_asap(&&x: &&u32, foo: fn()) -> [u32; 4] {
    // x is of type u32 now, copied out by the pattern
    foo();
    [x; 4] // (body filled in for illustration)
}
(Though this comes up more in matches and closures than named functions.)
Ah, I understand; that's a bit disappointing. In fact it seems the Rust reference does state that this rule is "a subject of some debate". Essentially, the consequence of what you're saying is that some optimizations like alias analysis are only ever applied to function arguments, and are otherwise completely disabled?
This case seems especially unfortunate. To some extent I can imagine arguments for why the "references point to valid data" requirement could be relaxed, but here we're talking about the much narrower case of "successive safe dereferences of the same reference will return the same data", as in double_deref_loop_manually. Furthermore, in this case, this is a reference derived from data that WAS passed via a function argument; so alternatively, it would even be sufficient to say that the "function arguments must be valid" rule propagates to interior references (i.e. is "transitive", as you mentioned). I imagine this pattern is very common: e.g. methods taking &self on structs containing any interior references. What you're saying is that you'd need to write a helper method that takes the interior references directly, since &self doesn't guarantee that those references are valid?
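To make the &self workaround concrete, here is a minimal sketch of what I understand that pattern to look like (the struct and function names are hypothetical):

```rust
// A struct holding an interior reference, as described above.
struct Holder<'a> {
    inner: &'a u32,
}

impl<'a> Holder<'a> {
    // Taking `&self` alone does not (today) let the optimizer assume
    // `*self.inner` is unchanged across the opaque call to `foo`...
    fn sum_loop(&self, foo: fn()) -> u32 {
        // ...so we hand the interior reference to a helper, where the
        // function-argument rules do apply and the load can be hoisted.
        sum_helper(self.inner, foo)
    }
}

fn sum_helper(x: &u32, foo: fn()) -> u32 {
    let mut total = 0;
    for _ in 0..4 {
        foo();
        total += *x;
    }
    total
}
```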
For the tiny fraction of all (unsafe) Rust code that needs this guarantee, surely this is something that could be made opt-in rather than opt-out? Especially since patterns like &self are so ubiquitous, and this blocks significant optimizations like loop hoisting. Plus, at least in my understanding, hasn't the rule always been "creating a reference to invalid data is insta-UB", so there shouldn't (in theory...) be any code out there that breaks significantly from this?
At a minimum, what is the argument against making the "function arguments are valid" rule apply transitively?
Sorry, but I don’t have the knowledge to comment on these matters. Perhaps someone else will reply, or you might find more information or the right people on IRLO or Zulip.
This may have to do with the lambda capturing by reference (apologies for my C++ speak), so you effectively have a triple reference, which, as you pointed out, may impact the alias analysis. You can help it a bit with a by-value capture:
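The capture code isn't shown above, so this is an illustrative sketch of the by-value idea, with assumed names: copy the (Copy) value out of the double reference before the closure, so the closure captures a plain u32 and its body needs no dereference at all.

```rust
// Copy the value out up front; `move` then captures `x` by value,
// so the closure holds a u32 rather than a chain of references.
fn make_adder(x: &&u32) -> impl Fn(u32) -> u32 {
    let x: u32 = **x;
    move |y| x + y
}
```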
Yes, of course. I'm just generally trying to figure out and recognize which patterns confuse the optimizer. Certainly, with performance in mind, it's always an option to hand-optimize the code as much as possible, but that's often at the cost of readability/safety. So recognizing these patterns is valuable, especially when troubleshooting "why is this function so slow?".
Understanding things like this will definitely influence how I write performance-sensitive code.
(I understand, of course, that the optimizer can be brittle and is subject to frequent change.)
Ah, that didn't even occur to me! Great point. I suppose closure captures work similarly to function arguments in a way, so this makes a lot of sense.