Escape analysis

I have a question inspired by Ofek Shilon's CppCon talk. Can rustc assume that the pointer x passed to

fn f(x: &Cell<i32>)

does not escape, since the function is quantified over all lifetimes?

I think GhostCell and LocalKey take advantage of something like this. Does the compiler actually use this to elide loads (if it's sound, which it seems like it should be)?

Yes, Rust can assume the reference itself doesn't escape, i.e. doesn't last longer than the call.

Other data can escape the call though, including the address of the Cell during the call, so I'm not sure what you can and can't rely on (and it's probably not formally decided). For example, you could store the address globally, and then have another (unsafe) function that accesses the Cell with these safety preconditions:

  • You're on the same thread
  • No intervening calls
  • The &Cell you last stored is still valid

If you call this unsafe function having satisfied the preconditions, is it valid for it to reconstruct the &Cell<i32> and rely on the value of the i32? If so, escaping still occurs.

Example I threw together out of curiosity. (The fact that it seems to work doesn't mean it's guaranteed to though.)
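
A minimal sketch of that kind of experiment, not necessarily the code linked above: the names STASH, f and poke are made up for illustration, and the safety comment restates the preconditions listed earlier. Whether the write in poke is well-defined is exactly the open question, so treat this as something to feed to Miri rather than as a guarantee.

```rust
use std::cell::Cell;
use std::ptr;

thread_local! {
    // Hypothetical per-thread stash for the address of the last Cell seen by `f`.
    static STASH: Cell<*const Cell<i32>> = Cell::new(ptr::null());
}

// Accepts a reference of any lifetime, yet lets the *address* outlive the call.
fn f(x: &Cell<i32>) {
    STASH.with(|s| s.set(x as *const Cell<i32>));
}

/// # Safety
/// Must run on the same thread as the last `f` call, with no intervening
/// calls, and the `&Cell` passed to that call must still be valid.
unsafe fn poke(value: i32) {
    let p = STASH.with(|s| s.get());
    assert!(!p.is_null(), "f was never called on this thread");
    // Go back through the stashed address and write via the Cell API.
    unsafe { (*p).set(value) };
}

fn main() {
    let c = Cell::new(1);
    f(&c);
    unsafe { poke(42) };
    // The write made after `f` returned is visible through `c`.
    assert_eq!(c.get(), 42);
}
```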

I can't imagine a way for the escape to happen that doesn't involve unsafe, but I'm also not sure if the compiler would be able to exploit that here or not.


Whether such considerations ever result in eliding loads, I don't know. That might even be more of an LLVM question. Perhaps the better question is "do the LLVM equivalents of pure, const, noescape get emitted?"

There are few guaranteed optimizations, and that question (or questions about what escaping considerations would be considered valid) might be better suited for compiler devs on Zulip or perhaps on IRLO.

As far as I'm aware, lifetime annotations in types should be irrelevant for the actual program behavior, and they just exist for the borrow checking and type checking to be feasible and sound. (Don't quote me on that though, I'm not sure where exactly I got that from. In my opinion, it's sensible though, considering that lifetimes are "erased" during compilation.)

Regarding the code testing this in Miri: you might as well simply transmute the &Cell into &'static Cell, no need to use a pointer; in my mind, using references directly in the Miri test would be an even stronger indicator that this kind of code is not UB.
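
A rough sketch of the reference-based variant, assuming a hypothetical thread-local STASH and helper names f, poke and clear that are not from the thread. It clears the stash before the Cell goes away so that no stale &'static is left lying around; whether holding the extended reference at all is fine is one of the unsettled points here.

```rust
use std::cell::Cell;
use std::mem;

thread_local! {
    // Hypothetical stash holding a lifetime-extended reference.
    static STASH: Cell<Option<&'static Cell<i32>>> = Cell::new(None);
}

fn f(x: &Cell<i32>) {
    // Lifetimes are erased before codegen, so this transmute only changes
    // what the borrow checker sees. Assumption (not a guarantee): the caller
    // clears the stash before the Cell is dropped.
    let extended: &'static Cell<i32> =
        unsafe { mem::transmute::<&Cell<i32>, &'static Cell<i32>>(x) };
    STASH.with(|s| s.set(Some(extended)));
}

/// # Safety
/// The Cell last passed to `f` on this thread must still be live.
unsafe fn poke(value: i32) -> bool {
    match STASH.with(|s| s.get()) {
        Some(c) => {
            c.set(value);
            true
        }
        None => false,
    }
}

// Forget the stashed reference again.
fn clear() {
    STASH.with(|s| s.set(None));
}

fn main() {
    let c = Cell::new(1);
    f(&c);
    assert!(unsafe { poke(7) });
    assert_eq!(c.get(), 7);
    clear();
    // Nothing is stashed any more, so `c` is left alone.
    assert!(!unsafe { poke(9) });
    assert_eq!(c.get(), 7);
}
```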

Maybe that blog post:

The compiler erases lifetime information prior to monomorphization and code generation, meaning that the generated code simply has no way to depend on lifetimes. That could be changed, but we’d have to work hard to avoid code blowup by generating separate copies of code for each lifetime it was used within, assuming that the behavior didn’t change.

IOW: it's not something that is part of the Rust language specification, but it's true for the existing compiler, and while the ability to use lifetimes for code generation has pluses and minuses, the current decision is not to use lifetimes for that.

I spent a while on dead ends exploring how maybe a fn foo(&Cell<i32>) -> i32 could imply noescape; I'm convinced now that's impossible from the API alone (though it could be an optimization based on the body). The exploration is a dead end because there's no firm connection between the validity of a raw pointer and the lifetime of the reference you created it from.

However, "lifetimes don't affect program behavior" is implicitly a statement about well-defined programs. The question then becomes, are the preconditions I listed adequate to avoid UB? If not, the behavior cannot be guaranteed, naturally.

If we take this documentation to be normative, my program is UB -- references (not raw pointers) have been used to access the underlying value, before I ever use the stored raw pointer. [1] The fact that a Cell is involved is also irrelevant in my reading (the memory of the Cell is not itself within an UnsafeCell, notionally, even though it's the same memory span). [2] Also, nothing on that page mentions UnsafeCell or interior mutability at all.

That said, I think it could be rewritten to conform to that documentation, i.e., perhaps be well-defined.


I think transmuting to &'static is different from *const in that having a & to an invalid value is UB even in unsafe code; if you create a &'static T, unsafe code could never soundly put some invalid bit pattern in that memory, even temporarily.

(I spent almost no time thinking this one through more thoroughly though.)


  1. Ironically, in no small part due to steps I took to be explicit about intention, making the reference lifetime valid for every call to whateva (even though the call to some_func was a reborrow anyway). ↩︎

  2. I haven't thought about or looked for citations to see if it could be relevant for memory that is notionally within an UnsafeCell, like the i32 itself. ↩︎

I assume you mean

  • The result of casting a reference to a pointer is valid for as long as the underlying object is live and no reference (just raw pointers) is used to access the same memory.

Yeah, that sounds off. It sounds correct to me when talking about mutable references, but for shared references and read-only pointer access (and in this context, I’d also count turning a *const Cell<…> into &Cell<…> and then writing to it through the Cell API as “read-only”), this statement is probably[1] wrong / badly formulated.

AFAIR, depending on what you would or wouldn’t count as “having” (and what you consider “invalid”), that’s not set in stone yet, but I also agree that an approach using *const is more likely not to be UB. I mentioned the &'static-transmuting approach only, because in my experience Miri more reliably reports UB for code involving references than raw pointers.


  1. not to say obviously ↩︎

Ah, I see.

I agree the ptr documentation is lamentably poor, especially considering how it's innately unsafe to actually make use of them. I also found this for example:

This does not take ownership of the original allocation and requires no resource management later, but you must not use the pointer after its lifetime.

But pointers don't have lifetimes, and the references in the example have lifetimes that end immediately. Probably they meant "value liveness scope" of some sort.


This conversation only solidifies my stance that I can't guarantee it works by the way. It is all just so horribly underspecified.

I think it might have been this thread I had in mind for this statement, btw.

Thanks.

I guess that would be sufficient for &'static Cell, as it's !Sync, so you can't observe it from thread B while it's invalid inside an unsafe block on thread A, or so. (Didn't think deeply on it.)

This makes sense but I am not happy about it. It really seems that calling a function f quantified over all lifetimes with a reference x shouldn't enable modifications to the memory pointed to by x after f returns, but that is evidently not the case.

For a type like &i32 there is a strong immutability guarantee, so the compiler could be smart about reading it.

However, Cell contains UnsafeCell inside, which in Rust is a type for "anything could happen", so in this case I would not expect the compiler to elide any reads.
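
A small sketch of the difference, using hypothetical helpers read_twice_plain and read_twice_cell. It only shows what is observable in safe code; it makes no claim about which loads rustc or LLVM actually elide.

```rust
use std::cell::Cell;

// With `&i32` the referent cannot change while the reference is live, so the
// compiler may (but need not) reuse the first load for the second read.
fn read_twice_plain(x: &i32, opaque: impl FnOnce()) -> (i32, i32) {
    let first = *x;
    opaque();
    let second = *x;
    (first, second)
}

// With `&Cell<i32>` the opaque call can legally change the value in between,
// so the second read has to observe whatever `opaque` did.
fn read_twice_cell(x: &Cell<i32>, opaque: impl FnOnce()) -> (i32, i32) {
    let first = x.get();
    opaque();
    let second = x.get();
    (first, second)
}

fn main() {
    let n = 1;
    assert_eq!(read_twice_plain(&n, || {}), (1, 1));

    let c = Cell::new(1);
    // The closure shares `c` and mutates it between the two reads.
    assert_eq!(read_twice_cell(&c, || c.set(5)), (1, 5));
}
```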

I think there's a misunderstanding about how lifetimes work in here somewhere.

Functions are generic over lifetimes, that's why they use <> (when not elided). fn foo<'a>(&'a Cell<i32>) doesn't mean each reference passed to foo has to be valid for any lifetime. Instead, foo can accept a reference of any lifetime, including the lifetime that ends as soon as foo returns.

In other words, it's not "foo takes a reference that is valid for any lifetime 'a". It's "foo<'a> takes a reference that is valid for the lifetime 'a".
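
To make that concrete, a minimal sketch in which the same foo is called once with a short-lived borrow and once with a 'static one. The body of foo and the use of Box::leak are illustrative assumptions, not something from the discussion above.

```rust
use std::cell::Cell;

// Generic over 'a: each call site picks its own lifetime.
fn foo<'a>(x: &'a Cell<i32>) -> i32 {
    x.set(x.get() + 1);
    x.get()
}

fn main() {
    // Here 'a can be as short as the call itself.
    let short = Cell::new(0);
    assert_eq!(foo(&short), 1);

    // Here the caller passes a reference that happens to be 'static.
    let long: &'static Cell<i32> = Box::leak(Box::new(Cell::new(10)));
    assert_eq!(foo(long), 11);
}
```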

But I don't think it really matters in the way you're thinking. Instead of doing escape analysis to determine what optimizations apply, Rust just won't let you write code that would be optimized unless it passes the borrow checker. The escape analysis happens in your head, while you're having a conversation with the compiler in the form of error messages. By the time you get to codegen, there's nothing more to learn from doing further escape analysis because it already knows what optimizations it can do.

Note that borrowing in Rust never affects when things are dropped; drop order is purely lexical.

The fact that foo accepts a reference of any lifetime, including one that ends as soon as foo returns, should constrain how it behaves with longer-lived references. At least, I thought it should, but apparently it doesn't.

This mechanism is how I thought LocalKey::with worked. You can't leak the reference &'a T because the closure you pass has to be generic over the lifetime 'a. But even though the borrow checker won't let you leak the reference, the compiler must act as if you could have because you can convert the reference to a raw pointer.
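
A sketch of that situation, with a hypothetical thread-local COUNTER and helper leak_address: the borrow checker rejects returning the reference from the closure, but the address gets out as a raw pointer without any unsafe. Only the later dereference needs unsafe, and it relies on the assumption that the thread-local is still alive on the current thread.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical thread-local used only for this illustration.
    static COUNTER: Cell<i32> = Cell::new(0);
}

fn leak_address() -> *const Cell<i32> {
    // Does not compile: the reference cannot leave the closure.
    // let r: &Cell<i32> = COUNTER.with(|c| c);

    // Compiles: a raw pointer carries no lifetime.
    COUNTER.with(|c| c as *const Cell<i32>)
}

fn main() {
    let p = leak_address();
    // Assumption: the thread-local has not been destroyed yet, which holds
    // while this thread is still running ordinary code.
    unsafe { (*p).set(3) };
    assert_eq!(COUNTER.with(|c| c.get()), 3);
}
```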

The situation isn’t quite as bad as that:

  • In the most common case, the closure/function is passed as a compile-time generic (non-dyn), which means that the outer function will be monomorphized for the actual closure type. This lets the compiler inspect the contents of the closure to determine whether or not any escape happens via raw pointers.
  • Shared references have a strong immutability guarantee, and dereferencing a raw pointer requires an unsafe promise that the programmer will uphold this guarantee. So the compiler is allowed to make codegen decisions under the assumption that any raw-pointer access won’t alter the referent. Any program that violates the guarantee is exhibiting UB as the result of an unsound unsafe block somewhere.

Indeed, Miri considers it UB, even if a *mut is used.

I assumed the use of Cell was intentional to be closer to C++ semantics.

Yes, it's not as interesting with a &i32 or &mut i32.