I wrote a post explaining lifetime parameters to help me understand the idea. In particular it wasn’t clear to me from reading “the book” what problem they were solving, so this helped me work through that. I also added information on lifetimes as bounds, which appears to be missing from the book. Maybe this will be helpful to others out there with similar confusions:
As I learn more about lifetime annotations (e.g. 'a), I realize the programmer could make a mistake in assigning these and there is no way for Rust to verify them. Thus lifetime annotations are inherently unsafe (uncheckable) code. The program could crash accessing memory that has already been freed, which can’t happen with GC. Am I missing something?
Indeed. Memory leaks are a different class of error from use-after-free errors, and typically much less dangerous. A use-after-free error can both violate the functional correctness of a program and expose it to attacks like remote code execution. Memory leaks will result only in performance issues.
A language that could also prevent memory leaks while still being practical would be great. It’s a harder problem, though, partially because identifying whether something is a “leak” is more subjective than identifying use-after-free.
If the checker could verify I wasn’t mismatching my annotations, then it wouldn’t need me to make them at all.
The annotations are used not just for checking the function body, but for checking other code that calls that function. The fact that they are not inferred is a design choice, for the same reasons other type information is not inferred across function boundaries. (This helps enable separate compilation, and prevents unintentional changes to function signatures.)
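To make the caller-side point concrete, here is a minimal sketch. The names trim_prefix and demo are made up for illustration. The caller is checked against the signature alone: because the return type carries only 'a, the compiler accepts a caller that drops the prefix argument while the result is still in use.

```rust
// Hypothetical helper: the signature promises that the result borrows
// only from `text` (lifetime 'a), never from `prefix` (lifetime 'b).
fn trim_prefix<'a, 'b>(text: &'a str, prefix: &'b str) -> &'a str {
    text.strip_prefix(prefix).unwrap_or(text)
}

// The caller is checked using only the signature above, not the body.
fn demo() -> String {
    let text = String::from("hello world");
    let rest;
    {
        let prefix = String::from("hello ");
        rest = trim_prefix(&text, &prefix);
    } // `prefix` is dropped here; `rest` is still valid because it borrows from `text`
    rest.to_string()
}
```

If the signature instead tied the result to both arguments with a single lifetime, the same caller would be rejected, even though the body never returns anything borrowed from prefix.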
If I have an opaque algorithm in the body of my function, the checker can’t know which input was copied to which output.
That’s precisely what the borrow checker checks. Any function called within the body also has any relevant lifetimes as part of its signature, so it’s not opaque to Rust’s type system. The lifetime of a reference is part of its type, and a lifetime error is a type error.
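A small sketch of what that check looks like in practice, using made-up function names. The first function matches its annotations and compiles. The commented-out variant has a mismatched annotation, and the compiler rejects it rather than trusting the programmer.

```rust
// Accepted: the body returns `x`, and the signature says the result lives as long as 'a.
fn pick_x<'a, 'b>(x: &'a str, _y: &'b str) -> &'a str {
    x
}

// Rejected if uncommented: the body returns `y` (lifetime 'b), but the
// signature claims the result has lifetime 'a. This is a compile-time error.
//
// fn pick_y_wrong<'a, 'b>(_x: &'a str, y: &'b str) -> &'a str {
//     y
// }

// Accepted: when either input may be returned, both must share one lifetime.
fn pick_longer<'a>(x: &'a str, y: &'a str) -> &'a str {
    if y.len() > x.len() { y } else { x }
}
```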
Thus lifetime annotations are inherently unsafe (uncheckable) code. The program could crash accessing memory that has already been freed, which can’t happen with GC.
If you can compile a program with a use-after-free bug in safe Rust, this is a serious bug. It should not be possible.
Yeah I thought of that, but I was thinking of for example some complex loop over arrays mutably referencing arrays which mutably reference back to those first mentioned arrays, e.g. some memory-hard proof-of-work hash function. I guess what I should have realized is if I can’t reasonably write that in safe code, then I will need to shift to unsafe code.
I understand that global inference places some restrictions on what sort of type system you can have, so that might be another hindrance.
Normally I argue that type annotations on function signatures are better for readability, but in the case of 'a, I am of the opinion that they hurt readability. The programmer doesn’t want to reason about the lifetimes; he just wants them to be checked. He will only reason about them if there is a checking error he needs to fix.
IDEA: Crates has this nifty feature where you don’t need to tell it explicitly the Github changeset because Crates saves it in a configuration file automatically. I am thinking perhaps Rust’s compiler could do something analogous for lifetimes and infer the lifetime annotations on function signatures. They wouldn’t be global, because we compile incrementally as we edit, so the prior inferred annotations would be saved already by the compiler (which fixes the problem of separate compilation, etc). On error, the compiler/IDE could display the inferred annotations. Perhaps this would be much less noisy and might make Rust appear to be much more automatic and friendly. The point is I am trying to find ways to make the lifetimes feature get out of the way most of the time so we can focus on our code, as we always have. So it would have the light feel of GC, but with the benefits of needing less memory to attain the same performance and/or losing less performance to attain the same low pause latency.
I have found I can write a lot of code without using lifetime annotations at all; the compiler infers the right lifetime automatically. Normally when I have to write lifetime annotations it is either because I have made a mistake, or because I am trying to do something genuinely complicated (or at least dangerous, like returning a reference to something that may go out of scope). The only place I have needed them in my translation of EoP so far is when I am returning a closure from a function that captures a closure from the outer function arguments. I think if there is support for returning unboxed closures even this might not be needed.
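For readers who have not hit the closure case, here is a rough sketch of it. The function names are invented and this is not the EoP code itself. The boxed version needs a named lifetime because a Box of a trait object defaults to 'static, while the unboxed version, written with impl Trait in return position, needs no annotation.

```rust
// Boxed: without `+ 'a`, the returned trait object would default to 'static,
// and a closure `f` that borrows local data could not be passed in.
fn twice_boxed<'a, F>(f: F) -> Box<dyn Fn(i32) -> i32 + 'a>
where
    F: Fn(i32) -> i32 + 'a,
{
    Box::new(move |x| f(f(x)))
}

// Unboxed: the opaque return type already captures `F`, so no named lifetime is written.
fn twice_unboxed<F>(f: F) -> impl Fn(i32) -> i32
where
    F: Fn(i32) -> i32,
{
    move |x| f(f(x))
}

// Both accept a closure that borrows a local variable.
fn demo() -> (i32, i32) {
    let k = 3;
    let boxed = twice_boxed(|x| x + k);
    let unboxed = twice_unboxed(|x| x + k);
    (boxed(1), unboxed(1))
}
```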
If it needs a lifetime annotation, maybe it’s time to see if there is a simpler way to do it?
Strictly speaking, at a function boundary, the compiler is not inferring the right lifetime – it’s “defaulting”, or maybe you might just call it “guessing”. Basically, within a fn signature, if you omit a lifetime, the compiler uses a series of relatively simple elision rules to fill it in. These rules make local decisions that don’t consider the fn body at all:
- in fn arguments, missing lifetimes are replaced with an anonymous, fresh parameter, so that fn foo(&u32) is equivalent to fn foo<'a>(&'a u32)
- in the return type, missing lifetimes are filled in by looking at the arguments, which makes the assumption that the returned reference will be a reference into one of the arguments:
  - if there is just one lifetime found in the argument types, use that
  - otherwise, if this is an &self or &mut self method, use the lifetime of the self reference
(one exception to the above rules is for trait objects, where the lifetime is chosen based on the innermost enclosing pointer. So Box<Trait> becomes Box<Trait+'static>, but &'a Trait becomes &'a (Trait+'a), essentially.)
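As a sketch of the argument and return-type rules above, each pair below shows an elided signature next to an explicit form that should mean the same thing. All names here (read_*, first_*, Counter) are made up for illustration.

```rust
// Argument rule: each elided lifetime in an argument becomes a fresh parameter.
fn read_elided(x: &u32) -> u32 { *x }
fn read_explicit<'a>(x: &'a u32) -> u32 { *x }

// Return rule, single input lifetime: the output reuses it.
fn first_elided(v: &[u32]) -> &u32 { &v[0] }
fn first_explicit<'a>(v: &'a [u32]) -> &'a u32 { &v[0] }

struct Counter {
    n: u32,
}

impl Counter {
    // Return rule, method case: with several input lifetimes, the output
    // takes the lifetime of the `&self` reference.
    fn get_elided(&self, _other: &u32) -> &u32 { &self.n }
    fn get_explicit<'s, 'o>(&'s self, _other: &'o u32) -> &'s u32 { &self.n }
}
```

None of these rules look at the function body. If get_elided tried to return _other instead, the compiler would report an error rather than pick a different lifetime.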
In contrast, within the fn body, we will use a much more complex inference procedure that considers the entire fn body.
The reason we draw this distinction is primarily, as @mbrubeck indicated, separate compilation. It’s also because inferring across fn boundaries is actually very, very hard to even do – there are fundamental limitations of how much you can do in that space. Moreover, it makes error reporting even harder, since without any lifetimes written down, it’s really hard to report conflicts (though we have to work more on that front in any case).
I don’t think that changes the substance of my post, which is that I find if I am having to add lifetime annotations by hand it is because I am doing something ‘odd’, and I would probably be better off implementing it another way. It would be good if the ‘guessing’ rules corresponded to best practice.
I didn’t mean to disagree with your post, just to point out a subtle distinction that is often overlooked (which can lead to confusion) between elision and inference. I agree that if you find you are using named lifetimes, it does indicate that you may be doing something a bit more complex, though I wouldn’t go so far as to call it odd or a code smell. For example, a signature like this:
fn foo<'b>(&self, foo: &'b Bar) -> &'b Baz
doesn’t set off any alarm bells for me, it just indicates that the return value is going to be some value borrowed from foo.
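Filling that signature in as a runnable sketch, with Bar, Baz and the Owner type all invented for illustration: the named lifetime is needed because elision would otherwise tie the result to &self, and it tells callers that the result outlives the receiver.

```rust
// Hypothetical types; only the shape of the signature comes from the post above.
struct Baz(u32);

struct Bar {
    baz: Baz,
}

struct Owner;

impl Owner {
    // The result borrows from `foo`, not from `self`, so 'b must be named:
    // elision would have picked the lifetime of `&self` instead.
    fn foo<'b>(&self, foo: &'b Bar) -> &'b Baz {
        &foo.baz
    }
}

fn demo() -> u32 {
    let bar = Bar { baz: Baz(7) };
    let baz_ref;
    {
        let owner = Owner;
        baz_ref = owner.foo(&bar);
    } // `owner` is gone, but `baz_ref` only borrows from `bar`
    baz_ref.0
}
```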
That is indeed an interesting thought. We’ve also been thinking that rustw might provide a great, user-friendly way for the compiler to give feedback about the results of its inference.
In any case, the fact that an incorrect fn signature can lead to a lot of confusing errors is very much on my mind lately. In the work on new-style error messages, we’ve been looking at lots of ways to improve this. Some things are as simple as just reformatting how we display the same information, but in a lot of cases we can find better information to display – or else reorder the messages, to give priority to messages that are more likely to be the “root cause” of the problem. Still working on the details here but I think there’s lots of room for improvement even without making any major changes.
If you use Rcs or the like, you don’t have to mess with these annotations at all. If you are passing around borrowed values/interior references, you should indicate where the references are coming from - I don’t like playing the “will this invalidate that” game at all.
Good practice in C in these cases is to write the “lifetime annotations” in a comment, so having to write them in the code is actually easier.
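A minimal sketch of the Rc route mentioned above, with made-up Config and Worker types. Shared ownership is tracked by a reference count at run time, so no lifetime parameter appears on the struct or on the function that builds it.

```rust
use std::rc::Rc;

// Hypothetical types for illustration.
struct Config {
    name: String,
}

// No lifetime parameter: each Worker co-owns the Config through an Rc.
struct Worker {
    config: Rc<Config>,
}

fn make_workers() -> (Worker, Worker) {
    let config = Rc::new(Config { name: "shared".to_string() });
    let first = Worker { config: Rc::clone(&config) };
    let second = Worker { config };
    (first, second)
}
```

The trade-off is a small run-time cost for the count, and the Config is freed only when the last Worker holding it is dropped.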