Hey, I started learning Rust recently. I'm trying to understand how lifetimes work and how they interact with the ownership system. From my understanding:
Each variable has its own lifetime.
The lifetime runs from the definition of the variable until its last (possible) use or the end of the block (is there a difference?).
Rust performs an implicit conversion when possible and needed; this conversion only "shrinks" the lifetime.
My questions are:
What happens in the borrow checker when we call a function that takes two variables (by reference) and returns one of them? From the borrow checker's perspective, are both variables borrowed, with permissions regained only after the last use of the return value?
In the elision rules, why can't we define a better version that always assigns one generic lifetime to all variables? Then we could always return a value without needing to explicitly write that there is only one lifetime for all inputs/outputs. In cases of an unconditional return value, we could still specify a different lifetime for the returned variable and get the benefit of a larger lifetime. Currently, we need to explicitly specify the lifetime in both cases.
If you wrap something in an Arc or some such "smart pointer", it lives as long as there are clones of that Arc around that reference it. The thing will be dropped (its lifetime ends) when the last Arc clone containing it goes out of scope and is dropped.
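For example (a minimal sketch; the vector and the thread are just for illustration):

use std::sync::Arc;
use std::thread;

fn main() {
    let data = Arc::new(vec![1, 2, 3]);
    let clone = Arc::clone(&data); // a second owner of the same Vec

    let handle = thread::spawn(move || {
        println!("{:?}", clone); // `clone` keeps the Vec alive in this thread
    });

    drop(data); // one owner gone; the Vec is not freed yet
    handle.join().unwrap(); // the Vec is dropped once the last Arc is dropped
}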
Yes.
Then:
Not sure what you mean.
If there were only one lifetime for all references passed to a function, one of which is returned, then the caller could use that returned reference in places where it is no longer valid.
The current rules are designed under the assumption that fn(&self, …) -> BorrowsFromSelf<'_> is the most common pattern, where the arguments … are simply used to direct the method to the appropriate subpart of self and therefore don't need to be valid beyond the end of the method call.
If they instead specified that all arguments shared a common lifetime, then you'd end up needlessly keeping the arguments alive until you're done with the return value, unless you specially annotate them.
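A rough sketch of that pattern, with a made-up Config type; written with elision the signature would be fn field(&self, key: &str) -> &str:

struct Config {
    name: String,
}

impl Config {
    // The output lifetime is tied to `self`, not to `key`, so the caller
    // may drop `key` while still using the returned reference.
    fn field<'s, 'k>(&'s self, key: &'k str) -> &'s str {
        if key == "name" { &self.name } else { "" }
    }
}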
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}
Initially I wondered myself why we have to write the 'a annotation manually in this classic example: why doesn't the compiler just apply the same annotation on its own for the case of two input references and a returned reference?
You're probably learning from The Book. Unfortunately it does a poor job of conveying how Rust lifetimes actually work, IMNSHO.
Rust lifetimes -- those '_ things -- are not directly about the liveness scope of values/variables. They are not about when a value gets destructed. Instead, they are generally about the duration of borrows. The connection between Rust lifetimes and the liveness scope of values is that it is invalid for a value to be borrowed when it is destructed. It is also invalid for a value to be borrowed when it is moved, or when a &mut _ is created to that value.
I recommend making an effort not to conflate Rust lifetimes ('_ things) and value liveness.
Every variable is moved or goes out of scope by the end of the block it is declared in. This scope does not correspond to a Rust lifetime (those '_ things). But it's a conflict for a variable to be borrowed when it is moved or goes out of scope.
Not every variable is associated with a Rust lifetime (a '_ thing). Rust lifetimes parameterize types, not specific values/variables per se.
Borrows are non-lexical, which is to say, a borrow doesn't have to last until the end of some lexical scope.[1] But variables do not go out of scope until the end of their block.[2] If they have a non-trivial destructor, and they are not moved by the end of the block, the destructor will run at the end of the block, and not immediately after the last possible use.[3] Whether a value still needs to be destructed at the end of scope is not always known at compile time, so sometimes there are runtime checks to determine whether the destructor runs or not. In contrast, borrow checking runs at compile time, and Rust lifetimes ('_ things) are discarded after borrow checking completes -- they are not present during runtime.
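A sketch of both points, using a made-up Loud type with a noisy destructor:

struct Loud(&'static str);

impl Drop for Loud {
    fn drop(&mut self) {
        println!("dropping {}", self.0);
    }
}

fn main() {
    let a = Loud("a");
    println!("last use of a: {}", a.0);
    if std::env::args().count() > 1 {
        drop(a); // whether `a` was moved here is only known at runtime,
                 // so a drop flag decides if the destructor runs below
    }
    println!("not dropped after its last use");
} // if `a` was not moved above, it is dropped here, at the end of the block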
References (&'_ T, &'_ mut T) don't have destructors. When they go out of scope, they don't keep the borrow of their referent active. That's why borrows are non-lexical. A reference going out of scope generally won't cause a borrow check error on its own, unless the reference itself is borrowed -- say, when you have a nested & &T or the like.
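For example, this compiles precisely because the first borrow ends at its last use:

fn main() {
    let mut x = 5;
    let r = &x;      // shared borrow of `x` begins
    println!("{r}"); // last use of `r`: the borrow ends here, non-lexically
    let m = &mut x;  // OK, `x` is no longer borrowed
    *m += 1;         // `r` is still in scope, but that doesn't matter
}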
Much learning material tries to convey the gist of borrowing by comparing Rust lifetimes ('_) to lexical scopes or blocks, but that is not how they actually work.
The borrow duration of references -- the outer '_ thing on a &'_ T or &'_ mut T -- can coerce to a shorter duration. The way this is implemented in Rust is via a supertype coercion. If the lifetime is on something else -- like let's say the inner lifetime of a &mut Vec<&'_ str> -- it is not always possible to shrink the lifetime. Whether a lifetime can shrink, expand, or not change at all is determined by variance. Variance is a general subtyping mechanism (but the only subtyping Rust has is lifetime related).[4]
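A sketch of the difference (lifetime names made up for illustration):

// Covariance: the outer lifetime of a &str can shrink freely.
fn shrink<'long: 'short, 'short>(r: &'long str) -> &'short str {
    r // fine: &'long str coerces to &'short str
}

// Invariance: the lifetime behind a &mut cannot change at all.
// fn shrink_inner<'long: 'short, 'short>(
//     v: &mut Vec<&'long str>,
// ) -> &mut Vec<&'short str> {
//     v // error: Vec<&'long str> is not a subtype of Vec<&'short str>
// }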
Borrow checking cannot change the semantics of a program; it is a pass-or-fail check. In particular, borrow checking cannot change when destructors run. So when we talk about borrow lifetimes coercing to shorter borrows, we are not talking about changing the liveness scopes of values -- changing when destructors run. Rust lifetimes are a type level property, not a value level property.
One of the reasons that drop scopes -- where destructors of values run -- are lexical, and not "immediately after the last use", is because borrow checking cannot change program semantics, but the "last use" of values can depend on the borrow analysis. (Another is that where destructors run can be logically important, like for a Mutex lock say, so it is better to be predictable and not subject to change as the borrow checker improves.)
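The Mutex case makes the predictability argument concrete; a sketch:

use std::sync::Mutex;

fn main() {
    let m = Mutex::new(0);
    {
        let mut guard = m.lock().unwrap();
        *guard += 1; // last use of `guard`...
        // ...but the lock is held until the end of this block, predictably.
    } // `guard` dropped here; lock released
    let again = m.lock().unwrap(); // would deadlock or panic if the drop
    println!("{again}");           // point silently moved around
}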
It depends on the annotations on the function signature.
// So long as the return value is still alive, both `*a` and `*b` will
// remain borrowed.
fn ex1<'lt>(a: &'lt str, b: &'lt str) -> &'lt str { ... }
// So long as the return value is still alive, `*a` will remain borrowed,
// but `*b` need not remain borrowed after the function returns.
fn ex2<'a, 'b>(a: &'a str, b: &'b str) -> &'a str { ... }
The Book describes calling ex1 as something like "the lifetime of the return value is decided at the call site based on the inputs", but the way it actually works is the other way around: uses of the returned value keep the borrows associated with the inputs alive.
That is how I recommend thinking about lifetime annotations on the return type -- uses of the return value keep the corresponding input borrow(s) active.
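Seen from the call site (a sketch; assume ex1 and ex2 above have been given real bodies):

fn main() {
    let a = String::from("long-lived");
    let result;
    {
        let b = String::from("short-lived");
        result = ex2(&a, &b); // fine: only `*a` stays borrowed
        // result = ex1(&a, &b); // error: `b` does not live long enough,
        // because using `result` below keeps the borrow of `*b` alive too
    } // `b` dropped here
    println!("{result}"); // a use of the return value: `*a` is still borrowed
}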
Because elision should correspond to the most ideal signature, and assigning the same lifetime to all inputs when multiple lifetimes are present is not the ideal signature a significant portion of the time.
It's not a bad thing to have to think about and document the flow of borrows, but it does take some getting used to.
The elision rules we do have are not perfect 100% of the time, but they're pretty good -- more successful than most other elision features Rust has, IMO.
There is actually something like what you suggest in stable Rust that we can compare against. -> impl Trait captures all input generics (including all input lifetimes) by default, and it is pretty common for this to cause confusion when it forces some input borrow to stay alive. Implicitly making all input lifetimes the same, or making an elided output lifetime keep all input borrows alive, would have a similar impact. I have helped people overcome the resulting borrow checker errors often enough to be confident that it is a less ideal situation than being explicit about the lifetimes the return type captures. Even though it's more typing, I believe we'd be better off if -> impl use<..> + ... was required instead.
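A sketch of that, under the Rust 2024 capture rules (the function itself is made up):

// By default the returned iterator would capture *both* input lifetimes
// and keep `prefix` borrowed as well; `+ use<'d>` opts out and captures
// only the lifetime the return value actually borrows from.
fn matching<'d, 'p>(
    data: &'d [String],
    prefix: &'p str,
) -> impl Iterator<Item = &'d String> + use<'d> {
    let needle = prefix.to_owned(); // stop borrowing from `prefix`
    data.iter().filter(move |s| s.starts_with(needle.as_str()))
}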
I can understand wishing that just ignoring lifetime annotations (being able to elide them everywhere) and having everything work out was possible, but ultimately it is not; you have to think about your borrow relationships at some point.
For the method case specifically: you generally do not want to force all the lifetimes outside of &[mut] self to be the same either, so independent lifetimes makes more sense. This is especially true when invariance is at play (similar to the last example).
In general, consumers of an interface don't want the borrows they pass in to be any longer than necessary. Forcing multiple lifetimes to be the same would force some borrows to be longer than necessary.
At the end of main, c then b then a go out of scope. But as i32 and &mut i32 do not have destructors, going out of scope only matters if you are borrowed.
Because borrows are non-lexical / end after the last use, nothing is borrowed at the end of main, and that's not a concern. So the lexical block of main isn't really relevant to the example.
What is relevant is whether the exclusive borrow of a which is assigned to b is active when you try to exclusively borrow a a second time. If there are no uses of b after the declaration, the exclusive borrow isn't active at that point. If you uncomment the use of b at the end of main, then the borrow is active at that point.
If b was something with a destructor instead of a &mut i32, then the place where it goes out of scope and calls the destructor would be a use of b which keeps the exclusive borrow alive, and that would result in a similar borrow check error[5] even though b isn't explicitly mentioned. This demonstrates that destructors do not run "after the last use", where borrows end; they run based on the drop scope, and can themselves be a use.
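A sketch of that, with a made-up Guard type:

struct Guard<'a>(&'a mut i32);

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        *self.0 += 1; // the destructor observes the borrow
    }
}

fn main() {
    let mut a = 0;
    let b = Guard(&mut a);
    // let c = &mut a; // error: `a` is still borrowed, because `b`'s
    //                 // destructor at the end of `main` is a use of `b`
} // `b` dropped here; only now does the exclusive borrow of `a` end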
This was not always the case, but we've had non-lexical borrows since 2018. ↩︎
There are also drop scopes that don't correspond to a block, like a temporary in a statement for example. ↩︎
Destructors also run when a value is overwritten. ↩︎
The analogous situation in some OOP language could be: you can generally coerce a Cat to an Animal, but not in a method that takes a reference to a Vec<Cat> and pushes into it (or else you could push a Dog into the coerced &mut Vec<Animal>). (Ignore this footnote if it made no sense to you.) ↩︎
There are unsafe, unstable ways to tell the compiler you don't observe borrows in your destructor, so std collections like Vec<_> can be more "magical" than your custom Drop-implementing types in this respect. ↩︎
I suspect it's only popular in forum threads. Try to find an example like that "in the wild". They do exist, I'm sure, but most of the time, when you accept two references, you return one of them (or something related to one of them) unconditionally.
That's obvious, I would think: having such a rule is only beneficial in a world where "selector functions" are used often. If they are used rarely, then an omitted lifetime on a function with two references is, most likely, an error to fix, not something to embrace.
Thanks for your detailed write-up -- I think I understand parts of it, and I agree that we should highlight some of these points more in our books -- as long as it doesn't make the introduction to borrowing and lifetimes more difficult.
I always wondered about
Why was this implemented in this way? For memory consumption, wouldn't it make sense to free large data immediately after its last use?
And a related question: does Rust use only a single stack-frame allocation for each function? E.g. in the example below, we have three stack-allocated variables a, b, and c. Is the stack storage allocated for all three variables at the beginning of the function's code, or might the stack pointer change during execution of the function? And is this behaviour an implementation detail only, or are there serious reasons for it?
fn test() {
    let a = 0;
    {
        let b = 0;
    }
    if a == 0 {
        let c = 1;
        println!("{} {}", a, c);
    }
}

fn main() {
    test();
}
This point is actually related to questions some users have about where variables should be defined in functions. Some people favor defining all used local variables at the beginning of the function, while others define variables just when they are needed.
From what I understand, it's mostly about making the control flow more obvious for readers. Drop implementations aren't necessarily just about freeing allocations; they also get used as guards for critical sections (e.g. Mutex) and various other creative uses. You might also have raw pointers that can't be tracked by the compiler but rely on a destructor not running early.
Think of a haystack/needle search function, where having the lifetime of the returned string also restricted to that of the needle (rather than just to the haystack) would be bad.
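Something like this hypothetical search function:

// The returned slice borrows only from `haystack`; tying it to `needle`'s
// lifetime as well would force the caller to keep `needle` alive for as
// long as the result is used.
fn find_word<'h>(haystack: &'h str, needle: &str) -> Option<&'h str> {
    haystack.split_whitespace().find(|word| word.contains(needle))
}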
There's always a trade-off between just making it compile but possibly wrong, and asking you to write explicitly which one you meant. The nice thing about not compiling is that it's an opportunity to give an error message saying "hey, which of these two do you want?".
Especially since it's common that the longest case doesn't actually need to be written with lifetimes at all, in the same way that std::cmp::min doesn't have any lifetimes.
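That is, a generic version works for two &strs with any common lifetime, no annotations needed (a sketch):

// Like std::cmp::min, this is generic over T; when called with two &strs,
// T unifies to a single reference type behind the scenes.
fn longest<T: AsRef<str>>(x: T, y: T) -> T {
    if x.as_ref().len() > y.as_ref().len() { x } else { y }
}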
You need some form of borrow checking to know where the last non-destructor use is in the general case, and now your borrow checker changes the semantics of your programs. Improve the borrow checker or restrict it more to fix a bug, and drop locations change. The fallout would likely include deadlocks and UB from code bases that rely on knowing where the destructor would run.[1] Even ignoring borrowing,[2] drop locations could unexpectedly change due to code changes that impact control flow, like making a match arm unreachable!() instead of empty or such.
This part is sort of orthogonal, but deallocation takes gas too, so sometimes you don't want it to happen immediately. Though I guess we can add "performance surprisingly changed" to the list of things unstable destructor semantics would cause.
I've always considered it an implementation detail. I think allocas happen sometimes but that's just an impression from skimming PRs. Optimizations can definitely put variables in registers or elide them completely, etc. I don't know the state of reusing stack slots for dead locals, or where that optimization would take place in the compilation process.[4]
As @2e71828 points out, it's not just about deallocation; this is another area where the Book's heavy "stack vs heap" model falls short. ↩︎
e.g. if the liveness analysis for drop location ignored lifetime relationships ↩︎
n.b. drop flags are now out-of-band and only present in functions that need them; zeroing on drop is no longer done either ↩︎
e.g. if there is a Rust-specific MIR optimization to do that, or if it's just left to LLVM, which surely does something of the sort ↩︎
(..) the entire frame worth of local variables are allocated, on frame-entry, in an uninitialized state.
So the variables local to the function (a, b, c) are allocated at function start but not necessarily initialised, as shown in the example in the link provided.
Even if you modify your example with let c; at the beginning of test, c is still allocated at first, and it can only be used once it's initialised through all reachable control-flow paths.
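A sketch of that delayed initialisation:

fn test() {
    let c; // the stack slot is part of the frame, but holds no value yet
    if std::env::args().count() > 1 {
        c = 1; // initialised on this path only
        println!("{c}"); // fine: `c` is definitely initialised here
    }
    // println!("{c}"); // error: `c` is possibly-uninitialised on this path
}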
Variables don't have lifetimes. Loans (temporary references) do.
You can borrow the same variable multiple times for different lifetimes (even at the same time if they're shared loans).
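For example:

fn main() {
    let s = String::from("hi");
    let a = &s; // one shared loan of `s`
    let b = &s; // another loan of the same variable, alive at the same time
    println!("{a} {b}"); // shared loans can overlap freely
}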
Rust makes it confusing because a Rust lifetime ('a) of a reference is a very specific thing, and it's not the same as the lifetime/lifecycle of a value as used in computer science in general.
Values can live (in the general sense) and never have any lifetime at all (in the Rust sense).