Why doesn't the compiler use max lifetime automatically?

Given code like

fn main() {
    let string1 = String::from("abcd");
    let string2 = "xyz";

    let result = longest(string1.as_str(), string2);
    println!("The longest string is {}", result);
}

fn longest(x: &str, y: &str) -> &str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

(The same example at playground)

This obviously fails, as explained in the documentation, but I have to wonder: is there some reason the compiler doesn't simply use the obvious interpretation that the lifetime is the minimum lifetime of all the inputs and outputs?

I'm new to Rust but it would seem obvious to me that given a signature like

fn foo(a: &t1, b: &t2, c: &t3, ...) -> &t4

it would always be safe to automatically cover everything with a single lifetime:

fn foo<'a>(a: &'a t1, b: &'a t2, c: &'a t3, ...) -> &'a t4

The compiler already suggests this in the error message. Is there some simple reason, which I'm failing to understand, that prevents rustc from defaulting to this? The programmer would then need to declare lifetimes manually only when the default (minimal) interpretation would be impractical for the given function.

Simply because the minimum is not always what you want. Sometimes you want the maximum. Sometimes you want something in between. The point is that you want an output lifetime tied to the lifetime of some input, and any assumption about which one is probably going to be wrong.
This question reminds me of a comment I had once seen on Reddit: "If the compiler knows that I'm missing a semi-colon, why doesn't it insert it and move on?". The answer is the same - it's not something that can be guessed and hence it doesn't do that.
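
To make "the minimum is not always what you want" concrete, here is a minimal sketch. The function `first_word` and its parameter names are made up for illustration; its result borrows only from the first argument.

```rust
// Hypothetical helper: the result borrows from `text` only.
// The second parameter gets its own, unrelated (elided) lifetime.
fn first_word<'a>(text: &'a str, _hint: &str) -> &'a str {
    text.split_whitespace().next().unwrap_or("")
}

fn main() {
    let text = String::from("hello world");
    let word;
    {
        let hint = String::from("short-lived");
        // `hint` is dropped at the end of this block, but `word`
        // only borrows from `text`, so it stays usable afterwards.
        word = first_word(&text, &hint);
    }
    println!("{}", word);
}
```

If both parameters were forced to share one lifetime, I would expect the borrow checker to reject this `main`, because `hint` does not live as long as `word` is used.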

5 Likes

When it comes to the function signature, the compiler never guesses. It's the contract point between the function body and the function's callers. Since everything is fully specified at this point, compile errors never propagate across it.

For lifetimes, the compiler mechanically follows a few simple rules.

https://doc.rust-lang.org/reference/lifetime-elision.html
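
As a rough sketch of those rules in action (the function and type names below are invented for illustration), each elided signature is shown next to what I understand it to expand to:

```rust
// Rule 1: every elided input lifetime gets its own parameter.
// Expands to: fn print_both<'a, 'b>(a: &'a str, b: &'b str)
fn print_both(a: &str, b: &str) {
    println!("{} {}", a, b);
}

// Rule 2: with exactly one input lifetime, it is used for all elided outputs.
// Expands to: fn strip_dashes<'a>(s: &'a str) -> &'a str
fn strip_dashes(s: &str) -> &str {
    s.trim_start_matches('-')
}

struct Names {
    items: Vec<String>,
}

impl Names {
    // Rule 3: with a `&self` receiver, its lifetime is used for elided outputs.
    // Expands to: fn find<'a, 'b>(&'a self, prefix: &'b str) -> Option<&'a str>
    fn find(&self, prefix: &str) -> Option<&str> {
        self.items
            .iter()
            .map(String::as_str)
            .find(|name| name.starts_with(prefix))
    }
}

fn main() {
    print_both("a", "b");
    println!("{}", strip_dashes("--flag"));
    let names = Names { items: vec![String::from("alice"), String::from("bob")] };
    println!("{:?}", names.find("bo"));
}
```

The `longest` signature from the question matches none of these: it has two input lifetimes, no `self`, and an elided output, so the compiler stops and asks.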

7 Likes

As a concrete example, consider these three trait methods, which differ only in their output lifetime:

trait Trait<'a> {
    fn a<'b, 'c>(&'b self, x: &'c str) -> &'a str;
    fn b<'b, 'c>(&'b self, x: &'c str) -> &'b str;
    fn c<'b, 'c>(&'b self, x: &'c str) -> &'c str;
}

They're all perfectly reasonable in different situations:

  • a returns some reference to external data, which can live even after both self and x are gone, which means that it can be returned from a function where those are local variables.
  • In b, x is not used beyond the end of the call, but the return value is dependent on self remaining alive and unchanged. This is the sort of signature you'd see for a lookup method in a container.
  • c returns a substring of the argument, but doesn't need self to stick around. Here, self is likely some sort of string searcher, like a Regex engine.
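
A small sketch of one type implementing all three methods may help. The struct `Lookup` and its fields are hypothetical and only chosen so that each method has something sensible to return:

```rust
trait Trait<'a> {
    fn a<'b, 'c>(&'b self, x: &'c str) -> &'a str;
    fn b<'b, 'c>(&'b self, x: &'c str) -> &'b str;
    fn c<'b, 'c>(&'b self, x: &'c str) -> &'c str;
}

// Hypothetical implementor: borrows some external data and owns some of its own.
struct Lookup<'a> {
    external: &'a str,
    owned: String,
}

impl<'a> Trait<'a> for Lookup<'a> {
    // Tied to the external data, not to `self` or `x`.
    fn a<'b, 'c>(&'b self, _x: &'c str) -> &'a str {
        self.external
    }
    // Tied to `self`: points into data the struct owns.
    fn b<'b, 'c>(&'b self, _x: &'c str) -> &'b str {
        &self.owned
    }
    // Tied to the argument: a substring of `x`.
    fn c<'b, 'c>(&'b self, x: &'c str) -> &'c str {
        x.trim()
    }
}

fn main() {
    let external = String::from("external");
    let input = String::from("  padded  ");
    let from_a;
    let from_c;
    {
        let lookup = Lookup { external: &external, owned: String::from("owned") };
        from_a = lookup.a(&input);
        from_c = lookup.c(&input);
        println!("{}", lookup.b(&input)); // must be used while `lookup` is alive
    }
    // `lookup` is gone, but these two results are still valid.
    println!("{} {}", from_a, from_c);
}
```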
11 Likes

I understand that there's no one-size-fits-all option available for all cases. However, currently the compiler has no default at all and it practically always requires explicit lifetime declarations. I'm trying to figure out why there's no default even though the compiler is perfectly capable of suggesting a suitable value that could be used by default.

If the answer is that the compiler-suggested solution is basically a guess and the implementation is not stable enough to be used as a long-term default, that's fine, too. Implicit defaults are next to impossible to change in the future, so specifying an implicit default is obviously a very hard task.

Note that there's already a rule to choose the lifetime of the &self or &mut self in methods. I.e., fn foo(&self, x: &i32) -> &str works and is desugared to mean something like fn foo<'a, 'b>(&'a self, x: &'b i32) -> &'a str. This rule seems somewhat incompatible with the rule to “make all lifetimes the same” that you're suggesting. Additionally, consider that if there is no (elided) lifetime in the return value, then all the elided input lifetimes are different / independent. Even if we used your proposed new rule only for cases that don't compile at all right now, this would still lead to IMO quite surprising differences between very similar-looking kinds of code.

Especially as long as lifetime arguments can still be fully elided: e. g. the signature fn foo(x: &Foo, y: &Bar) -> Baz would require x and y to have the same lifetime if the type Baz has a lifetime parameter (e.g. some struct Baz<'a>), but the two lifetimes of x and y would be independent (each getting their own fresh generic lifetime) if Baz doesn't have a lifetime parameter.
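
A small sketch of how similar the two situations look under today's rules. The types `Baz` and `Count` and both functions are hypothetical; the return type is spelled `Baz<'_>` here, which I take to be the same signature as a plain `Baz` with the lifetime left implicit.

```rust
struct Baz<'a> {
    name: &'a str,
}

struct Count(usize);

// The return type carries a lifetime, so elision ties it to the single input `x`.
fn wrap(x: &str) -> Baz<'_> {
    Baz { name: x }
}

// The return type carries no lifetime, so `x` and `y` each get an
// independent lifetime and nothing ties them together.
fn total_len(x: &str, y: &str) -> Count {
    Count(x.len() + y.len())
}

fn main() {
    let a = String::from("abcd");
    let baz = wrap(&a);
    let count = {
        let b = String::from("xyz"); // shorter-lived than `a`
        total_len(&a, &b)
    };
    println!("{} {}", baz.name, count.0);
}
```

Under the proposed rule, a two-argument function returning `Baz` would unify the lifetimes of its arguments, while the near-identical `total_len` would not, and the only visible difference would be whether the return type happens to have a lifetime parameter.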

4 Likes

This isn't the case when you have invariant lifetimes.

TL;DR: fn f<'a, 'b>(_: T<'a>, _: U<'b>) and fn f<'a>(_: T<'a>, _: U<'a>) can often be used identically when the lifetimes are covariant (i.e. the caller can freely manufacture a shorter lifetime to unify lifetimes), but when the lifetimes are invariant (i.e. the caller cannot do that), these signatures are meaningfully different.
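
A minimal sketch of the invariant case, using `&mut &str` as the invariant type (the function name `total_len` is made up for illustration). `&mut T` is invariant in `T`, so the lifetime inside the inner reference cannot be shortened by the caller:

```rust
// Two independent lifetimes: the inner references may live for different spans.
fn total_len<'a, 'b>(a: &mut &'a str, b: &mut &'b str) -> usize {
    a.len() + b.len()
}

fn main() {
    // Explicitly typed as 'static, so the inner lifetime is fixed.
    let mut long: &'static str = "static text";
    let owned = String::from("short-lived");
    let mut short: &str = owned.as_str();

    let n = total_len(&mut long, &mut short);
    println!("{}", n);

    // Assumption: with the single-lifetime signature
    //     fn total_len<'a>(a: &mut &'a str, b: &mut &'a str) -> usize
    // this call should be rejected, because 'a would have to equal 'static
    // for `long`, and `owned` does not live that long.
}
```

With plain `&str` arguments (covariant), the single-lifetime version would accept the same call, which is why the difference is easy to miss.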

That said, there is a fully general way of spelling this that doesn't run into variance issues: more lifetimes:

fn longest<'x: 'out, 'y: 'out, 'out>(x: &'x str, y: &'y str) -> &'out str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

That is, given two input lifetimes 'x and 'y, return a lifetime 'out which does not outlive 'x nor 'y. ('a: 'b can be thought of roughly as 'a outlives 'b or 'a ≥ 'b.)
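
For completeness, a short usage sketch of that signature, reusing the variable names from the original question. The two inputs deliberately live for different scopes:

```rust
fn longest<'x: 'out, 'y: 'out, 'out>(x: &'x str, y: &'y str) -> &'out str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

fn main() {
    let string1 = String::from("abcd");
    {
        let string2 = String::from("xyz"); // shorter-lived than string1
        // 'out is chosen by the caller; here it cannot extend past this
        // block, since it must not outlive the borrow of string2.
        let result = longest(&string1, &string2);
        println!("The longest string is {}", result);
    }
}
```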

As for why this isn't an inference rule? The existing single-lt-to-single-lt and self-to-lt elision rules were added when approximately 90% of lifetime annotations were for exactly that pattern. In the single lifetime case, it's the only (real) option; for the self case, it's the clear majority option.

Unfortunately, the "all input lifetimes" rule would clash with the self rule, as methods are just normal functions with an existing lifetime elision rule.

But this "all input lifetimes" rule is now used in one place: async functions. So it might be worth reconsidering if adding this is reasonable.

The way to advocate for it is to show what percentage of functions where this elision would be applicable (have multiple input lifetimes, output lifetime(s), and no self parameter) would be correctly elided by this rule. My guesstimate is that the percentage would need to be a decent amount over 50% to be considered, and wouldn't really get close.[1]

Ultimately, the function signature is a communication tool; not just from the developer to the compiler, but also to the future reader. For that purpose, it's useful to be clear about intent. More elision rules are more convenient when authoring, but also mean more rules that have to be understood to identify what a signature means.


  1. Intuition suggests inheriting one of the input lifetimes is more common. Perhaps when those can use 'arg (I think the feature name is in-band lifetimes?), a greater percentage of currently-requires-explicitly-introduced-lifetimes would be served by this elision, and it'd be more justifiable? I'm unsure. ↩︎

8 Likes

There is a default (i.e. lifetime elision), but only when there's no ambiguity about which lifetime should be used.

When such ambiguity exists there's no default because any default at all is likely to lead to hard to debug errors for some fraction of the community, depending on the situation. No default is better than a wrong default.

That's the thing though, it can't. To take a page from the mechanism/policy separation: in terms of mechanism it can be done yes, but on the policy level it cannot i.e. the decision regarding what should be the default cannot be made by the compiler, and its developers can't teach it that either.

4 Likes

I'm marking this as accepted because it explains the whole picture the best in my opinion. It's obvious that my idea wouldn't work with self and it wouldn't be nice to have one magical implicit rule for self and another for other stuff.

I really liked the example with the lifetime 'out, too.

1 Like