Which part of the Rust compiler handles lifetime specifiers?

Hi, I want to clear up the confusion about lifetime specifiers. Everybody says "lifetime specifiers are for the compiler, the compiler needs them to optimize references, blah blah blah", but nobody talks about how the compiler optimizes, what would happen if lifetime specifiers didn't exist, and so on. I want to clarify this blurry situation. I want to find the exact place in the Rust compiler source code where lifetime specifiers are parsed. Does anyone know the exact line?

Edit: Please keep in mind that I already know how to use lifetime specifiers. I just want to dig into the source code of Rust's compiler and understand how the lifetime algorithm works.

Edit 2: Please don't answer questions like "what is a lifetime specifier" or "how do I use lifetime specifiers in Rust". We already had that discussion here: I don't able to understand lifetime specifiers :(

The compiler doesn't do any optimization based on lifetime specifiers. They're purely there to restrict which programs are allowed to compile, which is how Rust can claim memory safety without garbage collection or reference counting.

You know as much about the code as the parser does, so I don't know how much help it will be, but lifetimes are parsed here: Parser::expect_lifetime

If you want to understand why lifetimes exist, a good way is to learn C++, since they're partially a reaction to languages that don't have them.

You can find instructional material about lifetimes in the Book here, here, and scattered throughout.

Indeed. However, it is my understanding that lifetimes can allow some optimisations that would otherwise not be allowed, thanks to the compiler knowing that multiple references passed as parameters to a function cannot point to the same thing.

The noalias optimization comes from knowing &mut is unique. You can't make two &mut point to the same thing no matter what the lifetimes are, and different lifetimes on & don't mean they point to different things.
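
To make the distinction concrete, here is a minimal sketch (both function names are made up for illustration). Two shared references with distinct lifetime parameters may still point at the same value, while a &mut and a & can never overlap. That exclusivity, rather than the lifetimes, is what a noalias-style assumption rests on.

```rust
// Two shared references with *different* lifetime parameters.
// Nothing stops them from pointing at the same value.
fn same_target<'a, 'b>(x: &'a i32, y: &'b i32) -> bool {
    std::ptr::eq(x, y)
}

// `total` is exclusive, so it cannot overlap with `step`, whatever the
// lifetimes are. The compiler is therefore allowed to keep `*step` in a
// register across both additions instead of reloading it.
fn add_twice(total: &mut i32, step: &i32) {
    *total += *step;
    *total += *step;
}

fn main() {
    let n = 5;
    assert!(same_target(&n, &n)); // aliasing shared references are fine

    let mut total = 1;
    add_twice(&mut total, &n);
    assert_eq!(total, 11);

    // Rejected by the borrow checker: `total` would be borrowed mutably
    // and immutably at the same time.
    // add_twice(&mut total, &total);
}
```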

The only way they interact with performance is by changing the code that humans write: when in need of a safe, guaranteed-valid pointer, they can use a cheap reference instead of smart pointers.

Ah yes. Makes sense.

Lifetimes are one of the mechanisms[1] that allow the compiler to prove safe code is free of various types of errors, such as use-after-free and data races. The desire for such a compiler is definitely human motivated (history has shown we do a poor job of avoiding such bugs manually).

The main thing that's disallowed is the combination of aliasing and mutability, which is why a &mut _ is an exclusive reference for example. The lack of aliasing is used by compilers to do things like perform non-overlapping memcpys instead of checking for overlap, to reuse values (e.g. in registers) when it knows their values could not have changed, etc.
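
As a small illustration of the memcpy point: the standard library's copy_from_slice takes an exclusive destination and a shared source, so the two slices cannot overlap and a non-overlapping copy is sound. Copying within a single buffer, where overlap is possible, goes through copy_within instead. The helper name fill_from is hypothetical.

```rust
// `dst` is exclusive and `src` is shared, so the two slices cannot overlap.
fn fill_from(dst: &mut [u8], src: &[u8]) {
    dst.copy_from_slice(src); // panics if the lengths differ
}

fn main() {
    let src = [1u8, 2, 3];
    let mut dst = [0u8; 3];
    fill_from(&mut dst, &src);
    assert_eq!(dst, [1, 2, 3]);

    // Source and destination inside the same buffer may overlap, so this
    // needs the overlap-tolerant `copy_within`.
    let mut buf = [1u8, 2, 3, 4];
    buf.copy_within(0..3, 1);
    assert_eq!(buf, [1, 1, 2, 3]);
}
```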

These types of optimizations don't happen based on lifetimes, but the lifetimes are part of the analysis that proves they're correct.

Lifetime annotations also do other things such as enforce a particular API. That enables, for example, the ability to change a function body without breaking downstream code, another human-motivated property.

Changing the specifiers changes the semantics of the code; for example, it changes function API contracts and how you've told the compiler it should borrow check your code. If the code itself still compiles, changing the annotations may break downstream code, or may allow different code to compile.

But if you have a complete program which compiles with rustc, and then give it (unaltered) to a compiler that ignores lifetime specifiers, the compiled results should be semantically the same (provided the other compiler was otherwise implemented correctly).

If lifetime specifiers didn't exist at all (e.g. if you always used a compiler that ignores lifetimes), you wouldn't have the memory safety guarantees that borrow checking provides anymore. I.e. you'd compile a lot of programs with undefined behavior.[2]


  1. along with other things like the restrictions on reference types, the Send and Sync traits, etc ↩︎

  2. Arguably more than you would with other languages, even, because Rust is designed around being able to exploit proven-correct code more aggressively, e.g. by relying on the lack of aliasing whenever there's a &mut _. ↩︎
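
As a sketch of the kind of program this is about (the function first is made up for illustration): the commented-out lines are rejected by rustc, and a compiler that ignored lifetimes would accept them and read freed memory.

```rust
// Returns a reference that borrows from `v`.
fn first<'a>(v: &'a [String]) -> &'a str {
    &v[0]
}

fn main() {
    let names = vec![String::from("fox")];
    let head = first(&names);
    assert_eq!(head, "fox");

    // Rejected with E0505: `names` cannot be moved out (and freed) while
    // `head` still borrows from it. Without borrow checking, the
    // `println!` would be a use-after-free.
    // drop(names);
    // println!("{head}");
}
```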

Can you provide more detailed information about this?

Edit: Rust has the borrow checker, and because of it all references are already valid. So why do we set lifetimes? All references are valid already.

Lifetime annotations guide the borrow checker. Consider the following function (which does not compile, because the borrow checker cannot work out what I mean):

use std::str::FromStr;

fn choose_a_string(which_one: &str, str1: &str, str2: &str) -> &str {
    let choice = bool::from_str(which_one).unwrap_or_else(|_| {
        let num = i32::from_str(which_one).unwrap_or(0);
        num == 0
    });
    if choice {
        str1
    } else {
        str2
    }
}

Without lifetime annotations, it's unclear to the borrow checker what constraints apply to the lifetimes of the references. By adding lifetime annotations, I can make it clear:

fn choose_a_string<'retval>(
    which_one: &'_ str,
    str1: &'retval str,
    str2: &'retval str,
) -> &'retval str {

(Rust Playground with lifetime annotations).

With the added annotations, the borrow checker can see that which_one's lifetime is unrelated to the lifetime of the return value, and that both str1 and str2 need to be valid while the return value is alive.
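
For completeness, a compiling version with the annotated signature and the same body as above, plus a caller I've added to show what the signature buys: the selector string can be dropped before the result is used. The variable names in main are only illustrative.

```rust
use std::str::FromStr;

fn choose_a_string<'retval>(
    which_one: &'_ str,
    str1: &'retval str,
    str2: &'retval str,
) -> &'retval str {
    // "true"/"false" are taken literally; otherwise a number is parsed,
    // and 0 (or anything unparseable) selects `str1`.
    let choice = bool::from_str(which_one).unwrap_or_else(|_| {
        let num = i32::from_str(which_one).unwrap_or(0);
        num == 0
    });
    if choice { str1 } else { str2 }
}

fn main() {
    let first = String::from("first");
    let second = String::from("second");
    let chosen = {
        // The selector lives only inside this block...
        let selector = String::from("1");
        choose_a_string(&selector, &first, &second)
    };
    // ...yet the result is still usable here, because the signature says
    // it borrows only from `str1` and `str2`.
    assert_eq!(chosen, "second");
}
```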

The borrow checker relies on lifetimes and their constraints. The compiler can't guarantee the references are valid without lifetimes.

Some examples.

That's the exact same thing that restrict does in C. Only Rust does it consistently, and thus it initially exposed plenty of bugs that also existed in C (just nobody used them and thus nobody cared about them). This tells you everything you need to know about the possibility and feasibility of that feature in C/C++.

@farnz and @quinedot :clap: :clap: :clap: :clap: :clap: :clap: :clap: :pray: :pray: :pray: :pray: :pray: :pray: :pray: :pray: :pray: :pray:

Bros, I really respect both of you. You f*cked the problem that has been stuck in my mind like a splinter for a year. Ownership, moving, borrowing and references: I think I now understand all of these, and the relations between them, much better.

@khimru Thanks to you too, bro. I didn't know anything about the restrict feature in C. Actually I only know bits of C, and this is the first time I'm seeing this feature. Thanks.

Technically, the compiler could see the whole program and thus deduce everything.

But that would be a really miserable experience: you change return a to return b in some low-level crate and, suddenly, you get a million errors and need to fix thousands of lines of code in one go! In hundreds of crates, many of which you didn't even write!

Who would be able to use such a language and why?

That's literally the direct opposite of the normal “divide and conquer” approach that people usually apply to programming.

C++ templates have that issue, and people don't like that. That's why C++20 introduced constraints and concepts, which play the same role for types as lifetimes play for references.
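
Here is a minimal sketch of how an annotated signature keeps that change local (the function pick is hypothetical). The signature is the contract, so swapping which argument is returned fails inside the function itself and callers are never re-checked.

```rust
// The signature promises callers that the result borrows only from `a`.
fn pick<'a>(a: &'a str, _b: &str) -> &'a str {
    a
    // Changing the body to return `_b` is rejected right here, inside
    // `pick`, because `_b` is not known to live for `'a`. No caller has
    // to be re-checked or edited.
}

fn main() {
    let a = String::from("kept");
    let result = {
        let b = String::from("temporary");
        pick(&a, &b)
    }; // `b` is dropped here; callers rely on the signature allowing that
    assert_eq!(result, "kept");
}
```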

If you know just a tiny bit of C then it's easy to understand why one would need lifetimes to turn it into a safe language.

Consider the following standard C function:

char* strstr(const char*, const char*);

As you can see, it receives two pointers and it's entirely symmetric (at least as far as the compiler is concerned).

But consider this function:

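#include <stdio.h>
#include <stdlib.h>
#include <string.h>  /* strdup is POSIX (and C23) */

void foo() {
    const char* a = strdup("fox");  /* heap copy: writable, must be freed */
    const char* b = "fox";          /* string literal: read-only, static */
    char* c = strstr(a, b);         /* c points into a's buffer */
    *c = 'p';
    printf("%s\n", a);
    free(c);                        /* fine only because c == a */
}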

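It's entirely correct and 100% works. But what if we swap a and b in the call to strstr? It crashes now! (c then points into the read-only string literal b, so neither the write through c nor free(c) is valid.)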

And the next question is: how would we prevent that?

If you start thinking about just that silly example, you'll find out that you need to entirely revamp the language design, turn it inside out and redo the standard library, too!

Because here I freed c, but I can't access a after that! Why? How would the compiler know if that's allowed or not?

And I passed const char* in, then got char* out… which is identical to that const char*, and yet it's OK. How can the compiler ever untangle that mess? Not even a human can do that!

Lifetimes exist even in C… only they exist in the documentation. To ensure that all that information is available to the compiler they have to be represented in the code, too.

The actual rules of lifetimes and borrowing are complicated, sure, but conceptually? It's a very simple thing.
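
To show what "represented in the code" could look like, here is a rough Rust counterpart of strstr. It is my own illustrative sketch and not a standard library function. The signature states what the C documentation only implies: the result points into the first argument and has nothing to do with the second.

```rust
// A rough Rust counterpart of C's `strstr` (illustrative, not a std API).
// The result borrows from `haystack` only; `needle` is unrelated to it.
fn strstr<'h>(haystack: &'h str, needle: &str) -> Option<&'h str> {
    haystack.find(needle).map(|start| &haystack[start..])
}

fn main() {
    let a = String::from("a quick fox");
    let tail = {
        let b = String::from("fox"); // short-lived needle
        strstr(&a, &b)
    };
    assert_eq!(tail, Some("fox"));

    // Swapping the arguments would tie the result to `b`, and the borrow
    // checker rejects that at compile time (`b` does not live long enough):
    // let tail = { let b = String::from("fox"); strstr(&b, &a) };
}
```

The write through c in the C example would additionally need a &mut borrow in Rust, which is where the const char* in, char* out trick gets ruled out.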

That's an interesting point and, I think, a very nice explanation.

As a historical matter, note that when the borrow checker was first implemented, you had to use lifetime annotations everywhere; lifetime elision was not available.

So, where in modern Rust, you can write something like fn find_thing(&mut self, name: &str) -> &Thing, in early borrow checker Rust, you'd have to write fn find_thing<'me, 'name>(&'me mut self, name: &'name str) -> &'me Thing. Similarly, you couldn't write fn string_len(string: &str) -> usize, but had to write something like fn string_len<'s>(string: &'s str) -> usize.

The lifetime elision rules came about because the people working on the language discovered that there's a small number of rules (just 2 for free functions, 3 for methods) that allow the majority of code to be written without explicit lifetime annotations, without accepting code that's ambiguous in the absence of annotations (like my choose_a_string example above).

Notably, though, because the lifetime elision rules specify a completely mechanical transform from unannotated code to code where every lifetime has an explicit annotation, the borrow checker rules haven't changed with the addition of lifetime elision; the borrow checker works as-if you wrote explicit annotations for everything, it's just that lifetime elision allows it to infer lifetimes.

So, taking my annotated (but using elision) version of choose_a_string, I wrote the signature:

fn choose_a_string<'retval>(
    which_one: &'_ str,
    str1: &'retval str,
    str2: &'retval str,
) -> &'retval str {

But the borrow checker looks at it as-if I wrote:

fn choose_a_string<'which, 'retval>(
    which_one: &'which str,
    str1: &'retval str,
    str2: &'retval str,
) -> &'retval str {

and has every single lifetime fully annotated at this point.

One thing I recommend to people struggling with lifetimes is spending a bit of time taking Rust that has elided lifetimes, and carefully adding lifetime annotations matching the elision rules until you have no elided lifetimes. Then you can see what the borrow checker sees when it looks at your code - it's not the same as what you wrote because the lifetimes are all annotated.
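
As a sketch of that exercise, here are find_thing and string_len in both forms. The Registry and Thing types are placeholders I've invented, and I return Option<&Thing> only so that the example has a sensible body. Each pair compiles to the same thing; the explicit version is what the borrow checker works from.

```rust
struct Thing {
    name: String,
}

struct Registry {
    things: Vec<Thing>,
}

impl Registry {
    // As written, with elided lifetimes.
    fn find_thing(&mut self, name: &str) -> Option<&Thing> {
        self.things.iter().find(|t| t.name == name)
    }

    // The same signature after applying the elision rules by hand: each
    // elided input gets its own lifetime, and the output takes `self`'s.
    fn find_thing_explicit<'me, 'name>(
        &'me mut self,
        name: &'name str,
    ) -> Option<&'me Thing> {
        self.things.iter().find(|t| t.name == name)
    }
}

fn string_len(string: &str) -> usize {
    string.len()
}

fn string_len_explicit<'s>(string: &'s str) -> usize {
    string.len()
}

fn main() {
    let mut registry = Registry {
        things: vec![Thing { name: String::from("fox") }],
    };
    assert!(registry.find_thing("fox").is_some());
    assert!(registry.find_thing_explicit("dog").is_none());
    assert_eq!(string_len("fox"), string_len_explicit("fox"));
}
```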

This is additional and so valuable information, thanks a lot.

Thanks, everybody. If anyone has more information to add, please go ahead, bro. Thanks.

Bro, what's the difference between a and b variables?

    const char* a = strdup("fox");
    const char* b = "fox";

I'm not sure whether you are joking or reaching enlightenment.

There is no difference between a and b, just like there is “no difference” between x and y:

    const char* x = "fox";
    size_t y = 42;

On the assembler level these two functions produce identical code, after all:

#include <stddef.h>

/* Returns size_t so that both functions return a full machine word. */
size_t foo(size_t x, size_t y) {
    return x + y;
}

char* bar(char x[], size_t index) {
    return &x[index];
}

Just like C was created from B by splitting things that are identical on the machine code level (so now C has integers and pointers, even though on the machine code level they are one and the same), Rust was created from C by splitting things further. Your a, in Rust, would become Box<str> [1], while b would become &'static str [2].

And types with lifetimes are similar: they are all different on the language level (and the compiler uses that difference to guarantee your program's memory safety), but after compilation they all turn into the same machine code (just like size_t and const char* become one and the same on most architectures).


  1. Except Rust doesn't believe in null-terminated strings, so that would be a “fat pointer”, but for types like i32 or f64 it wouldn't be any different from a pointer in C. ↩︎

  2. Again: for str, a reference is a “fat pointer” with the str size and a pointer to the actual memory, while &i32 or &f64 is a normal pointer inside. ↩︎
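
A small sketch to check this from inside Rust (the helper ref_size is made up, and the fat-pointer sizes are what current rustc produces rather than something I'd call a formal guarantee): the lifetime leaves no trace in the layout, and a reference to str is two words where a reference to i32 is one.

```rust
use std::mem::size_of;

// The lifetime parameter does not affect the layout of a reference.
fn ref_size<'a>(_r: &'a i32) -> usize {
    size_of::<&'a i32>()
}

fn main() {
    let a: Box<str> = "fox".into(); // owned heap allocation, like strdup
    let b: &'static str = "fox"; // borrowed from static memory
    assert_eq!(&*a, b); // same contents, different types

    let local = 42;
    // A reference to a local and a `'static` reference have the same size.
    assert_eq!(ref_size(&local), size_of::<&'static i32>());
    // Thin pointer: one machine word.
    assert_eq!(size_of::<&i32>(), size_of::<usize>());
    // Fat pointers: address plus length.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<Box<str>>(), 2 * size_of::<usize>());
}
```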

Of course I'm trying to learn; I said before that I don't have much experience with C. I want to understand why strstr(b, a); returns NULL although a and b are the same.