Soundness of transmuting immutable references to mutable when the underlying data is truly immutable

I am working on embedded, and for semantic reasons the author decided to use &mut self on a function, but the function never actually mutates self. The &mut self reference is converted to &self inside the implementation anyway. Since self is never actually mutated, any optimizations that the compiler makes on the assumption that self is immutable should still hold, correct?

It's definitely unsound. You can trivially end up with an aliased &mut. Note that &mut guarantees exclusivity, and violating that exclusivity is UB.

Moreover, it's instant UB to perform that conversion:

  • Transmuting an & to &mut is Undefined Behavior. While certain usages may appear safe, note that the Rust optimizer is free to assume that a shared reference won't change through its lifetime and thus such transmutation will run afoul of those assumptions. So:
    • Transmuting an & to &mut is always Undefined Behavior.
    • No you can't do it.
    • No you're not special.

(And not just due to the mutation implied in the above quote.)

I was going to go to further lengths to check that this is still considered UB in Miri in the case where you convert to &mut and immediately cast back to &, but it turns out I don't have to: rustc itself now lints the transmute[1] and lets you know that it is UB, even if the &mut is entirely unused.[2]


Wording nit: The "underlying data" has no inherent immutability.


  1. or cast ↩︎

  2. And Miri does still agree, if you allow the lints. ↩︎


Gankra out there keeping it real :smile:


If this is a trait function that you're trying to implement, you may be able to implement the trait for shared refs:

impl Trait for &A {
    fn f(&mut self) { ... } // this takes `&mut &A`
}
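
To make that concrete, here is a minimal, self-contained sketch. The trait `Trait`, the struct `A`, its `value` field, and the helper `call_f` are all hypothetical stand-ins, so adapt them to whatever trait you are actually implementing.

```rust
// Hypothetical trait whose method demands `&mut self`
// even though implementors may not need to mutate anything.
trait Trait {
    fn f(&mut self) -> u32;
}

// Hypothetical type that we only ever hold by shared reference.
struct A {
    value: u32,
}

// Implementing the trait for `&A` makes `self` a `&mut &A`:
// the reference itself is borrowed exclusively, while the `A`
// behind it stays shared and is only read.
impl Trait for &A {
    fn f(&mut self) -> u32 {
        self.value
    }
}

// All we have is `&A`, yet we can call the `&mut self` method
// by putting the shared reference in a mutable binding.
fn call_f(a: &A) -> u32 {
    let mut shared = a;
    shared.f()
}
```

No `unsafe` and no transmute is involved; the exclusive borrow is of the local `shared` binding, not of the `A` it points to.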

If it's something else and you need to generate &mut when all you have is &, then you need interior mutability.
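
As a rough illustration of the interior mutability route, the sketch below uses `std::sync::Mutex`. That choice is an assumption: on a `no_std` embedded target you would likely reach for a different lock or cell type. `Device`, `counter`, and `bump` are made-up names.

```rust
use std::sync::Mutex;

// Hypothetical state that callers only ever see through `&Device`.
struct Device {
    counter: Mutex<u32>,
}

impl Device {
    // Takes `&self`, yet obtains a genuine `&mut u32` through the lock.
    // Exclusivity is enforced at runtime by the mutex instead of being
    // asserted with a transmute.
    fn bump(&self) -> u32 {
        let mut guard = self.counter.lock().unwrap();
        let n: &mut u32 = &mut *guard;
        *n += 1;
        *n
    }
}
```

The `&mut` only lives as long as the guard, so the compiler's exclusivity assumptions hold for exactly the region where they are true.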


Thanks for the demo, that lint is very cut-and-dry.

Just out of curiosity, why is it UB if the reference is never used? (Rust Playground)
rustc and Miri complain, but it works. (Don't worry, I'm not going to use this, I'm just curious.)

Don't think of &mut as a mutable reference that happens to be unique. It's a unique reference that happens to be mutable. The whole point of it is to ensure that there are no other accessors that may mutate something behind your back; mutability is just an added bonus.

There was even an attempt to change the syntax to &uniq, but that never happened.

Why is it UB to drive through a red light? Why is it UB to work on high-voltage equipment without ensuring that no one can power it on?

The whole point of an exclusive reference, its purpose, its raison d'être, is to tell the compiler that this particular object, at this particular time, may only be accessed through that one particular reference and cannot be modified in any other way.

That means that it's not true that the “reference is never used”: both unique and shared references are always used to pass information about the absence of other references to the compiler. Creating a shared reference promises the compiler that there are no live unique references; creating a unique reference promises that all other ways to access the object are quiescent at that point in the program.

In very general terms, because UB is defined at the language level, and Stone Cold the language says so. I recognize that's a completely unsatisfying answer.

On a more practical level, the compiler might inline, rearrange, and optimize code based on the memory in question being non-aliased in some way that misbehaves when it is, in fact, aliased. I'm afraid I don't have a specific example for you, but I'm thinking of things along the lines of this example about provenance. (And here's some more words about aliasing &mut being UB.)

Does the compiler -- including LLVM and any other tools it uses -- actually exploit the strong properties of &mut to perform such optimizations today if the reference is never used? Maybe! I don't know! But it has reserved the right to do so.[1] Which brings us to...

UB is not defined in terms of "the compiler outputted something I like (today, with this exact code input and flags and so on)". Reasoning about UB in terms of the generated machine code -- that is, the machine code you expected to be generated -- is the wrong approach, because UB is defined at the language level. Once you hit it, all bets are off; you can't count on something predictable happening. You can't count on "whew, got away with it" being reproducible.

(Reasoning in terms of the output is a common mindset in some C/C++ camps, which is understandable given those languages, their compilers, and the history of both. Usually implicit in the mindset is that the compiler has some straight-forward, predictable translation from input code to machine code -- a translation that must basically hold across changes to the input code and across compiler versions, etc. Some type of "platform-specific UB" -- and if you know the translation, it's "not really UB". But UB has been defined at the language level for those languages as well, for some years now; moreover, their compilers do exploit "non-platform-specific UB". Here's another blog post related to that.)

So even if nothing in the compiler actually exploits the UB of this specific example today, that's no guarantee that nothing will do so tomorrow.

More generally, since the compiler can produce anything in the face of UB, producing something that does what you wanted is a possibility. But that doesn't mean you'll get the same result on any other invocation of the compiler (different input code, different flags, different version, whatever).


It's also possible the particular case of "made a &mut but never used it" will be relaxed some day, for example maybe &muts won't "activate" until the first time they are read or such. Unless and until the teams define something more lenient, though, it's UB and the compiler reserves the right to output garbage artifacts, etc.

I will note that one thing the current "any such transmute is instant UB" rule has going for it is simplicity. If the rules were more complicated, you might not have gotten such a straightforward answer, and might have ended up doing something you thought was sound but wasn't.


Sorry that was so many words, I failed to come up with a concise way to convey my thoughts. I'll just close with a note that there's a cultural component in addition to the technical considerations: In Rust culture, UB and unsoundness are generally just not tolerated. This is true even if it happens to compile the way you hoped today, and even if you can't give a technical justification for why something is defined as UB.


  1. It's even possible for Rust compiler devs not to know; e.g. if an LLVM noalias attribute is emitted, LLVM also reserves the right to exploit the properties of that attribute. ↩︎


I'm using mutexes, and due to the way the code is set up I know for a fact that reference is unique.

I appreciate your response, I realized that I was defining "soundness" in terms of whether the generated code was correct and not whether it touched on UB (the C/C++ way). "That's how the spec is" is a satisfactory answer as for why it's UB.

Regarding correctness, it appears that the generated code is correct. As long as you make the same guarantees as to uniqueness and immutability, as I see it the only way future versions will break is if those guarantees change in the language spec (unlikely). I guess that was what I was trying to get at, moreso poking at the compiler internals than anything else.

The reason that this is an attractive alternative to UnsafeCell is because UnsafeCell tells the compiler to skip out on those optimizations, meanwhile those optimizations are still valid.

I think the most likely way to get some kind of runtime fault is if the compiler decides that since this &mut exists, and its origin cannot be identified, the & you were just using must not exist, so other functions that try to use it could be eliminated or modified. Also, if you have UnsafeCell behind the &mut, the compiler could optimize the UnsafeCell as if it's just a normal value, since this must be the only thing accessing it.

The best solution would be to copy this function into your own code so you don't have to deal with this at all.

Note that in my opinion/experience, you'll rarely get a satisfying answer to "can I get away with it" type questions on a Rust forum due to the cultural atmosphere:

people are a lot less interested in "getting away with" unsoundness compared to C, and thus don't know or necessarily even care when you can do it, because "you shouldn't".

(I don't personally have a problem with such exploration for private use. Anything exposed to the public is another matter.)

((Also, UB is defined on the language level, not on the results. "Compiled to something I'm ok with" is not a lack of UB. It's a subset of the possible outcomes of UB. This is as true in C as it is in Rust, though a lot of people are in denial. Threads like this would go easier if we had a curt and distinct word for "compiled to something I'm ok with", but I don't have a good suggestion.[1]))

To me, this sounds like you're still thinking along the lines of the compiler having some predictable rules of transformation you can reason about. In this case, you're assuming it will only make certain changes based on specific guarantees about uniqueness and immutability. But nothing says it has to do that!

Let me try to sketch a story to illustrate: From a compiler perspective, the presence of unavoidable UB is a sign that code execution can't reach this point -- the programmer promised they'd never reach UB, in a sense. The compiler can take the presence of UB as a communication from the programmer that a given situation is unreachable, and the compiler is happy to optimize based on that information.

Now say you have some function that unconditionally does the transmutation. That's UB, so the compiler is within its rights to conclude that the function is never called. Every call site must also logically be UB, and further semantic and probably unpredictable changes are made to your code, propagating up the control flow.[2]

Note that no guarantees about uniqueness and immutability changed in this story. The compiler just found a way to optimize based on things which are (already, currently) UB.

(Even if the compiler did have to base its output on some rules about uniqueness and immutability, you're assuming you can predict how everything changes in the future; not just the rules, but how compilers utilize them. Compilers don't have to do that, so it's moot, but I don't believe *anybody* has that ability.)

  1. I dislike "miscompilation" because the compiler has done nothing wrong. ↩︎

  2. Or it could just delete the body of the function, or heck, effectively alias it with some random other function, since you communicated it could never be called anyway. ↩︎


I confess I am puzzled by the current definition, and don't understand the purpose of the current rules. I do expect the current rules ARE justified, but still, I don't understand the why. It seems more intuitive to define UB in terms of reads and writes to memory. In other words, the compiler can cache a value in a register read from a non-mutable reference, on the assumption that the memory cannot be changed by a mutable reference. It can assume the referenced memory does not change. But it doesn't HAVE to do this caching, so the result of doing a "sneaky modification" of memory that is meant to be immutable is undefined. Certainly this kind of UB is easy to demonstrate with a simple program.

But, as above, I expect I just don't understand the reasoning, which intrigues my curious personality. [ It is also possible I did understand at some point, but have forgotten! ]

Consider hoisting access above a loop: it's perfectly legal for the compiler to decide it's cheaper to write to a &mut once before the loop rather than many times inside. Does that mean that it's now UB before the loop if that was an illegal reference? Or can the compiler only hoist if it can globally prove that ref is valid, breaking many optimizations? (Also something that is rather nonsensical at the moment, as by definition all programs are assumed to be valid in the sense of either well formed or not well formed, never getting to UB - so you'd also need to introduce some other program validity state into the definition)

It's also fairly pointless: if you have a shared reference and want to make it a mutable reference then either:

  • That's how the library was written, and it may even depend on you not doing what you're trying to do for correctness: you have to fork/PR it and correctness is on your head, or
  • You're just trying to avoid paying for RefCell, and you're ok with using unsafe for performance: so use UnsafeCell, which is explicitly for the purpose of getting an exclusive reference from a shared reference (and flags this to the optimizer so it doesn't bork your program)
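
For the second bullet, a minimal sketch of the `UnsafeCell` route might look like the following. `Slot`, `with_mut`, and `get` are hypothetical names, and the safety contract shown is one I made up for illustration; your real invariants will differ.

```rust
use std::cell::UnsafeCell;

// Hypothetical container that hands out `&mut` from `&self`.
struct Slot {
    value: UnsafeCell<u32>,
}

impl Slot {
    fn new(v: u32) -> Self {
        Slot { value: UnsafeCell::new(v) }
    }

    /// # Safety
    /// The caller must guarantee that nothing else accesses the contents
    /// of this slot for the duration of the call, including from inside `f`.
    unsafe fn with_mut<R>(&self, f: impl FnOnce(&mut u32) -> R) -> R {
        // SAFETY: exclusivity is promised by the caller (see above).
        // `UnsafeCell::get` is the sanctioned way to go from `&self`
        // to a `*mut` pointer to the contents.
        let r: &mut u32 = unsafe { &mut *self.value.get() };
        f(r)
    }

    fn get(&self) -> u32 {
        // SAFETY: callers of `with_mut` promise no overlapping access,
        // so no `&mut` to the contents is live here.
        unsafe { *self.value.get() }
    }
}
```

The difference from a transmute is that the `UnsafeCell` wrapper tells the compiler up front that the contents may change behind a shared reference, so the proof obligation that remains is only that the `&mut` you create is really exclusive while it lives.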

Linguistically, perhaps malcompilation? (You can shorten it to Malcom!). But if the only criterion is that I don't like what the compiler's output does, most of my compiles would be malcompilations😅

Edit: sorry, you said "I am ok with", which would be a eucompilation, but that's less fun.


That is the sort of caching assumption that the compiler is clearly entitled to make.
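
A small sketch of the kind of code where that caching assumption applies (the function name `read_twice` and the callback `g` are made up for illustration):

```rust
// `x` is a shared reference to a type with no interior mutability,
// so the compiler is allowed to assume `*x` does not change for the
// duration of this call, even across the opaque call to `g`.
fn read_twice(x: &u32, g: impl Fn()) -> u32 {
    let first = *x;
    g(); // whatever this does, it cannot soundly modify `*x`
    let second = *x; // the optimizer may reuse `first` here
    first + second
}
```

Whether a given compiler version actually skips the second load is not guaranteed either way; the point is only that it is permitted to.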

The best "thought-experiment" I can come up with is a hypothetical machine that has "unsafe-aliasing" somehow built into the hardware, and I guess that is what Miri is emulating ( maybe... ! ). I don't feel entirely satisfied by this explanation though. Forking a library because it returns an immutable reference which could have been mutable, well, it can be quite inconvenient.

Note that this is how std::hint::unreachable_unchecked() works— It’s not particularly special except that calling it is always UB, which lets the compiler remove any checks that lead to it being called.
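
A minimal sketch of how that looks in practice. The function `div_nonzero` and its contract are hypothetical; only `std::hint::unreachable_unchecked` is the real standard library API.

```rust
use std::hint::unreachable_unchecked;

/// # Safety
/// `divisor` must be non-zero.
unsafe fn div_nonzero(value: u32, divisor: u32) -> u32 {
    if divisor == 0 {
        // SAFETY: the caller promised `divisor != 0`.
        // Reaching this call would be UB, so the optimizer is allowed
        // to assume this branch is never taken and may drop the check.
        unsafe { unreachable_unchecked() }
    }
    value / divisor
}
```

Calling `div_nonzero(1, 0)` is UB just like the transmute discussed above: nothing promises a panic, a crash, or any other particular outcome.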


I don't think you understand what UB even is. UB is just a contract between the language designer and the programmer. It's not justified by a law of nature; it's just a set of rules designed to balance the needs of language developers and language users.

Asking why creating a reference is UB makes as much sense as asking why turning on red is forbidden in Europe but generally permitted in the US. Some countries decided one way, some decided the other way; it's as simple as that.

Take a look at the situation with zero-sized objects: Rust allows them but C++ disallows them, and that's why memcpy(&a, &b, 0); is UB, but transmute<[i32; 0], ()> is not UB.

Your expectations are wrong. And unreasonable.

We do need some rules. It would be extremely bad if one group of developers thought that zero-sized objects don't exist (and thus assumed they would never hit the corner case where an index grows beyond usize when you repeatedly double the capacity of an array, because you run out of address space first) while another group assumed that they do exist (and thus developed APIs that return a bunch of empty objects on the assumption that they wouldn't be allocated in memory).

And because we do need rules we invent them and codify them. And that's it.

You may argue that it's “wrong” to allow turning on red or “right” to allow it, but when you are in the US you have to follow one set of rules, and in Europe you follow another.

My go-to example when I try to explain that you shouldn't rely on UB generating something predictable is the be || !be example. If the be variable is an integer, some would expect the expression to always be true; it's simple logic, after all.

But… what if the variable is not yet initialized? Seven years ago both Rust and C++ agreed that it still gives us true.

But one year later C++ decided that be || !be is now false. While Rust still thinks it's true.

One more year and Rust also thinks it's now false.

But then, one year later that is reversed and it's now true, again (if you use new, idiomatic, approach with MaybeUninit).

And that result still holds for a few more years: C++ thinks be || !be is false, Rust thinks it's true or crash (depending on type of uninit you are using).

Today? Today it's crash or true in Rust (depending on the type of uninit you are using). Maybe you should track the version of Rust and use a different type of uninit, heh.

But in C++ it may be false or crash, too (can you spot the difference?).
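
For contrast, here is a sketch of the well-defined way to use `MaybeUninit`, where the value is written before it is ever read. The function name `tautology` is made up; the `MaybeUninit` methods are the standard library ones.

```rust
use std::mem::MaybeUninit;

fn tautology() -> bool {
    // Reserve storage for a bool without initializing it.
    let mut slot = MaybeUninit::<bool>::uninit();
    // Initialize it before any read happens.
    slot.write(true);
    // SAFETY: `slot` was fully initialized by the `write` above.
    let be = unsafe { slot.assume_init() };
    be || !be
}
```

Calling `assume_init` without the preceding `write` is exactly the UB the timeline above is about, and nothing can be said about what `be || !be` would evaluate to in that case.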

Trying to work out what a program with UB will do is about as fascinating as trying to predict whether you'll be hit from behind or from the side when you apply the “turn on red” rule from the wrong country's traffic code.

An interesting trick, but please don't bring it into my code; I wouldn't appreciate the hassle of trying to find out which of the landmines you brought in is exploding this time.


Sure, but rules are not invented in a vacuum. They exist for a reason: to allow optimisations to be made, which allows our programs to run faster, and also to allow our programs to run on different machines with reproducible, well-defined results. I expect the rules are well chosen for good reasons, but that doesn't mean I know what those reasons were. It is even possible some of the reasons were not good, but I am unable to judge that, especially as I don't even know the reasoning, and it isn't explained!

Why? In my experience very few rules are “chosen for good reasons” if such reasons are not immediately obvious.

On the contrary: if some rule is “strange and unusual” then, most likely, the reason for its existence is simply “it seemed like a good idea at the time”.

As in: someone made a “random choice” when some choice was needed, and then it stuck and became hard to change.

Here, most likely, the logic was the following:

  1. C99 has a restrict qualifier
  2. LLVM has noalias markup to handle it
  3. And the &mut reference naturally aligns with that markup

Making the creation of &mut UB whenever any other non-quiescent way to access the same object exists is a simple and obvious way to make &mut compatible with LLVM's noalias, so this is the rule that was picked when one was needed.

And now that the rule exists, one would need to justify changing it, because the simple fact that it has existed for 10 years makes it hard to change.

Well, if you do that then you don't get optimizations unless the code actually is known to read or write to memory.

You'll see this in C++ a bunch, where the compiler typically can't change

if (b) ++*p;

into

*p += (int)b;

so that it's branchless, because that introduces new reads and writes, and for all the compiler knows, maybe p is sometimes nullptr when b is false, or there's some synchronization keeping b only true in one thread at a time, or something.

Whereas in Rust, because references have rules that have to hold even if you don't dereference them, with r: &mut i32 it'd be legal to change

if b { *r += 1 }

into

*r += b as i32;

because the &mut is a proof that doing that can't introduce a data race or read out of bounds or ...
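
Spelled out as two complete functions (the names `bump_branchy` and `bump_branchless` are mine), the pair below behaves identically for every input, which is what would make the rewrite legal for the compiler:

```rust
// Conditional form: only touches `*r` when `b` is true.
fn bump_branchy(r: &mut i32, b: bool) {
    if b {
        *r += 1;
    }
}

// Branchless form: always reads and writes `*r`.
// `b as i32` is 1 for true and 0 for false.
fn bump_branchless(r: &mut i32, b: bool) {
    *r += b as i32;
}
```

The second form accesses `*r` unconditionally, and that is only acceptable because a `&mut i32` must be valid and exclusive for as long as it exists, whether or not the first form would have dereferenced it.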

Similarly, people often expect that

fn foo(a: &mut i32, b: &mut i32) -> bool {
    ptr::addr_eq(a, b)
}

should be optimized to always return false, but if you change the rules to only hold if something is read or written, then that optimization is illegal, because foo doesn't.
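
For what it's worth, safe code can never produce two live `&mut i32` with the same address, which is why folding the comparison to false would be legitimate. A small sketch, where `same_address` and `demo` are made-up names and a plain pointer comparison stands in for `ptr::addr_eq` purely so it also builds on older toolchains:

```rust
// Compares the addresses behind two exclusive references.
fn same_address(a: &mut i32, b: &mut i32) -> bool {
    let pa: *const i32 = a;
    let pb: *const i32 = b;
    pa == pb
}

// Obtains two simultaneous `&mut i32` the safe way:
// `split_at_mut` yields two non-overlapping halves of the array.
fn demo() -> bool {
    let mut pair = [1, 2];
    let (left, right) = pair.split_at_mut(1);
    same_address(&mut left[0], &mut right[0])
}
```

The borrow checker rejects any safe attempt to pass the same place twice, so the only way to make `same_address` return true is to have already committed UB somewhere else.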


Those examples do still depend on something being done with the result of an unsafe conversion. I think there could be a distinction between something that MAY lead to UB compared to something that IS UB. I suppose specifying what is and what is not UB could get complicated though. Still, what I am saying is not that the rules are wrong, but rather that I don't know or understand the logic behind the rules. Of course one kind of logic could be "this rule is simple, even if it may be overly restrictive."