About borrowing and lifetimes

Hello, I am a Rust newbie here. This has probably been answered multiple times, but I am struggling to understand most of the answers on the internet and in "The Rust Programming Language" book. I am currently working with polars and am getting an error along the lines of this

        |
    96  |         let dataframe = CoolStruct::helper().with_columns(
        |             --------- binding `dataframe` declared here
    ...
    100 |         let groups = dataframe.groupby(["symbol"]).unwrap();
        |                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        |                      |
        |                      borrowed value does not live long enough
        |                      argument requires that `dataframe` is borrowed for `'static`
    ...
    108 |     }
        |     - `dataframe` dropped here while still borrowed

So what's happening here is that I have a helper function that returns a dataframe. Instead of returning the dataframe itself though, it appears that rust is returning a reference to the dataframe or letting our main function "borrow" the dataframe. The issue is that when the function ends, the borrowed value gets wiped off of the memory. What exactly is the correct way to "transfer" the ownership of the object. The solution it proposes is to give the borrow a lifetime of 'static, but I am not sure exactly how to do that. Also, about lifetimes, why do certain objects require lifetimes and others don't? My struct CoolStruct looks like this

pub(crate) struct CoolStruct<'a> {
    columns: Vec<String>,
    symbols: Vec<String>,
    frame: DataFrame,
    symbol_groups: GroupBy<'a>
}

Why does GroupBy specifically need a lifetime specifier while the other objects do not? If it was heap-allocated memory wouldn't it be easier to let the programmer deallocate the memory when appropriate? If I use the 'static lifetime, am I leaking memory by keeping the object in memory until the program finishes executing?

Here is a small sample of the code that caused the error above:

pub(crate) struct CoolStruct<'a> {
    columns: Vec<String>,
    symbols: Vec<String>,
    frame: DataFrame,
    symbol_groups: GroupBy<'a>
}

impl CoolStruct<'_> {
    fn helper(symbols: &Vec<String>) -> DataFrame {
        let mut df = DataFrame::default();

        for symbol in symbols {
            let json_tree = serde_json::to_string("{some json}").unwrap();
            let cursor = Cursor::new(json_tree);

            let mut tmp_df = JsonReader::new(cursor).finish().unwrap().clone().lazy().with_columns(
                [
                    polars::prelude::lit(symbol.as_str()).alias("symbol")
                ]
            ).collect().unwrap();

            tmp_df.set_column_names(vec!["symbol", "a", "b", "c"].as_slice()).expect("Column number mismatch");
            df = df.vstack(&tmp_df).unwrap();
        }

        return df;
    }

    pub(crate) fn new(mut symbols: Option<Vec<String>>) -> CoolStruct<'static> {
        if symbols.is_none() {
            symbols = Some(vec!["aaa"].iter().map(|s| String::from(*s)).collect());
        }

        assert!(symbols.is_some());

        let columns_list: Vec<String> = vec!["symbol", "a", "b", "c"].iter().map(|s| String::from(*s)).collect();
        let symbols_list = symbols.unwrap();

        let dataframe = CoolStruct::helper(&symbols_list);

        let groups = dataframe.groupby(["symbol"]).unwrap();

        CoolStruct {
            columns: columns_list,
            symbols: symbols_list,
            symbol_groups: groups,
            frame: dataframe
        }
    }
}

Here's the immediate cause of your problem: your constructor CoolStruct::new() returns type CoolStruct<'static>. If you substitute that into the lifetime parameter, then you can immediately see that symbol_groups has type GroupBy<'static>. The signature of DataFrame::groupby() is:

pub fn groupby<I, S>(&self, by: I) -> Result<GroupBy<'_>, PolarsError>

which, due to lifetime elision rules, means that the lifetime parameter of the returned GroupBy is the lifetime parameter of &self.

Your constructor wanting to return CoolStruct<'static> is probably wrong. It should likely be returning Self, which is CoolStruct<'_>, i.e., the lifetime parameter of the returned type is exactly the same as the (currently elided) lifetime of the Self type of the impl block. Or you can also be explicit, and impl<'a> CoolStruct<'a> and return CoolStruct<'a> from the constructor.

Lifetimes don't do anything. They don't keep alive values. They don't influence when things are dropped. They only constrain how long borrows are considered valid. (But again, you shouldn't be using 'static here. 99% of the time, when the compiler suggests adding 'static all over the place, it's wrong.)

Types need a lifetime parameter when they potentially borrow from other values. The groupby implementation is likely lazy, so it needs to keep a reference to the data frame it's operating over.
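
To make that concrete, here is a minimal sketch using only std types. `Table` and `SymbolView` are hypothetical stand-ins I made up for illustration, not polars types: the owning type has no lifetime parameter, while the type that holds a reference needs one.

```rust
// Hypothetical owning type: it holds its data directly, so it needs no
// lifetime parameter.
struct Table {
    symbols: Vec<String>,
}

// Hypothetical borrowing type: it holds a reference into a `Table`, so it
// carries a lifetime parameter tying it to the borrow it was created from.
struct SymbolView<'a> {
    first: &'a str,
}

impl Table {
    // With elision this could also be written as
    // `fn first_symbol(&self) -> Option<SymbolView<'_>>`.
    fn first_symbol<'a>(&'a self) -> Option<SymbolView<'a>> {
        self.symbols.first().map(|s| SymbolView { first: s.as_str() })
    }
}

fn main() {
    let table = Table {
        symbols: vec![String::from("aaa"), String::from("bbb")],
    };
    // `view` borrows from `table`, so `table` has to stay in place while
    // `view` is in use.
    let view = table.first_symbol().expect("table is not empty");
    println!("{}", view.first);
}
```

The relationship between `Table` and `SymbolView<'a>` is the same kind of relationship as between `DataFrame` and `GroupBy<'a>`.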

Thank you for your help, how do I explicitly set the lifetime of a borrow, say the borrow that occurs when my helper function returns a value? Also, I am still getting an error along the lines of this

    |
100 |           let groups = dataframe.groupby(["symbol"]).unwrap();
    |                        ----------------------------- `dataframe` is borrowed here
101 |
102 | /         CoolStruct {
103 | |             columns: columns_list,
104 | |             symbol: symbols_list,
105 | |             symbol_groups: groups,
106 | |             frame: dataframe
107 | |         }
    | |_________^ returns a value referencing data owned by the current function

Lifetimes are used to track borrows and the validity of types. Practically always a lifetime parameter for a type means that there's a reference within, somewhere. Said reference can't outlive the value it points to, for example; ensuring that doesn't happen at compile time is what the lifetime is there for.

GroupBy contains a reference to a DataFrame, for example.

"Lifetime" is an arguably poor choice of terminology for what we have in Rust, but here we are. In Rust, lifetimes like 'a aren't describing liveness of values (from creation to destruction). They're primarily describing borrows, the use of said borrows, and relationships between borrows.

A 'static lifetime on a reference means that the value pointed to will last the entire runtime -- maybe it's in static memory (part of the executable) or maybe it has been leaked.[1]
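
A small sketch of those two sources of `&'static` references, using only std. The function names are made up for illustration.

```rust
// A string literal is stored in the executable itself, so a reference to it
// is valid for `'static`.
fn from_static_memory() -> &'static str {
    "baked into the binary"
}

// `Box::leak` deliberately never frees the allocation, which is what makes
// the returned reference valid for `'static`. This really does leak memory.
fn from_leak(s: String) -> &'static str {
    Box::leak(s.into_boxed_str())
}

fn main() {
    let a = from_static_memory();
    let b = from_leak(String::from("leaked at runtime"));
    println!("{a} / {b}");
}
```

Neither option helps with the original error: a local variable like `dataframe` is neither in static memory nor leaked, so it can't be borrowed for `'static`.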

But the end of some lifetime doesn't mean something was deallocated, because Rust lifetimes aren't describing the liveness scope of values.[2]

Rust is RAII based, so most allocation and deallocation happens behind the scenes. Instead, values are dropped -- occasionally explicitly, but most often just by going out of lexical scope at the end of a function or other block. Rust does not have garbage collection at the language level.


Here's groupby. If we desugar the lifetimes, we get

pub fn groupby<'s, I, S>(&'s self, by: I) 
    -> Result<GroupBy<'s>, PolarsError>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,

The same lifetime for &'s self and GroupBy<'s> means that the latter is in some way a reborrow of the former.

So here:

        let dataframe = CoolStruct::helper(&symbols_list);
        let groups = dataframe.groupby(["symbol"]).unwrap();

groups is borrowing dataframe. If you try to move (or drop) dataframe, that's going to conflict with the borrow.

When you attempt to return this:

        CoolStruct {
            columns: columns_list,
            symbols: symbols_list,
            symbol_groups: groups,
            frame: dataframe
        }

You're trying to return a data structure that's holding on to a borrow of something also contained in itself.[3] We say it's a self-referential struct.[4] There are no generally useful ways to construct self-referential structs in safe Rust, and almost all unsafe attempts are unsound or just plain UB.[5]

You can't return just groups either, because dataframe is dropped at the end of the function.

You can try mirroring DataFrame's approach: Have an owning type that holds the data (with no lifetime) and a borrowing type which refers to it.
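
Here is a rough sketch of that split, using std types only so it runs without polars. `CoolData`, `SymbolGroups`, and the `rows` field are hypothetical names standing in for your struct, the `GroupBy<'a>` wrapper, and the `DataFrame`.

```rust
use std::collections::BTreeMap;

// Hypothetical owning type: stands in for a struct that holds the
// `DataFrame`. No lifetime parameter, so it can be returned and moved freely.
struct CoolData {
    rows: Vec<(String, f64)>, // (symbol, value)
}

// Hypothetical borrowing type: stands in for something that holds a
// `GroupBy<'a>`. It refers to data owned by a `CoolData`.
struct SymbolGroups<'a> {
    groups: BTreeMap<&'a str, Vec<f64>>,
}

impl CoolData {
    fn new(rows: Vec<(String, f64)>) -> Self {
        CoolData { rows }
    }

    // Same shape as `DataFrame::groupby`: the result borrows from `&self`.
    fn groups(&self) -> SymbolGroups<'_> {
        let mut groups: BTreeMap<&str, Vec<f64>> = BTreeMap::new();
        for (symbol, value) in &self.rows {
            groups.entry(symbol.as_str()).or_default().push(*value);
        }
        SymbolGroups { groups }
    }
}

fn main() {
    // The caller owns the data...
    let data = CoolData::new(vec![
        (String::from("aaa"), 1.0),
        (String::from("bbb"), 2.0),
        (String::from("aaa"), 3.0),
    ]);
    // ...and creates the borrowing view only where it is needed.
    let by_symbol = data.groups();
    for (symbol, values) in &by_symbol.groups {
        println!("{symbol}: {values:?}");
    }
}
```

The constructor returns only the owning type, and the grouping is created on demand by whoever holds the owner, so nothing ever has to borrow from a sibling field.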


  1. 'static bounds like T: 'static mean something different -- that's describing a quality of types and does not mean that the values of such types last "forever". ↩︎

  2. Naturally a reference to deallocated memory can't be allowed, so a lifetime may be limited by a deallocation somewhere, but you can think of deallocating a value as "just" a use of that value that conflicts with having outstanding references, like moving the value would be. ↩︎

  3. And you're moving dataframe in the process. ↩︎

  4. You can search this forum for many examples. ↩︎

  5. Rust doesn't have copy/move constructors either. ↩︎

I'm not sure what you are asking. I've shown you how you can change your impl's and function's signature to use an explicit lifetime above. If you want to do something different, please clarify.

You are constructing the data frame inside your function. You can't therefore return anything that references it, because its place will be invalidated:

  1. either it's dropped, and it's completely destroyed;
  2. or it's returned, which means that it's moved, so its address is invalidated anyway.

Now that I look at it better, you are trying to achieve scenario #2 above, and construct a self-referential type (the GroupBy references the DataFrame, which is in the same struct as the GroupBy itself). That's not possible in safe Rust, because once again, moving the struct that contains both would invalidate the address of its frame field, leading to a dangling reference.

You should just keep your DataFrame and GroupBy apart.

Lifetimes in function bodies are inferred from the code. It's the output of a compile-time analysis, not something you set.[1] Even if they were settable, there is no lifetime that lets you take a reference to dataframe, move dataframe, and then use the reference. Once you move dataframe, the reference dangles and using it would be UB.


  1. You can influence the analysis by annotating references and the like with named lifetimes, but it's rare to need to do so in practice. The only named lifetime available in your code is 'static, and you can't borrow local variables for &'static. So it still isn't helpful for you. ↩︎

Ahh I see the issue, thanks for the help

What I meant was how could I explicitly set the lifetime of the borrow that occurs when the dataframe is returned from helper(), it doesn't appear that I can set it in the return value of the function header of helper()

62  |     fn helper() -> DataFrame<'frame> {
    |                                               ^^^^^^^^^-------- help: remove these generics
    |                                               |
    |                                               expected 0 lifetime arguments

DataFrame doesn't have any lifetime annotations (it's an owning type), and so there's no lifetime to specify in helper().

The time during which the dataframe variable is alive is not influenced by where you obtained its contained value from. You can only "set" the validity of a variable to something longer by moving its declaration to a bigger scope.
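
As a small illustration of "moving the declaration to a bigger scope", here is a sketch with std types only. `make_symbols` and `first_symbol` are hypothetical helpers invented for this example.

```rust
// Hypothetical helper that returns an owned value, like `helper()` does.
fn make_symbols() -> Vec<String> {
    vec![String::from("aaa"), String::from("bbb")]
}

// Returns a reference borrowed from the argument.
fn first_symbol(symbols: &[String]) -> &str {
    symbols.first().map(String::as_str).unwrap_or("")
}

fn main() {
    // Does not compile: `symbols` is dropped at the inner `}` while `name`
    // still borrows from it. No lifetime annotation can change that.
    //
    // let name;
    // {
    //     let symbols = make_symbols();
    //     name = first_symbol(&symbols);
    // }
    // println!("{name}");

    // Compiles: the owner is declared in the outer scope, so it is still
    // alive wherever `name` is used.
    let symbols = make_symbols();
    let name;
    {
        name = first_symbol(&symbols);
    }
    println!("{name}");
}
```

The only difference between the two versions is where `symbols` is declared.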

It's perfect terminology for what they are doing. The problem newbies have with lifetimes comes not from what they are called but from a critical, absolutely fatal misunderstanding of the whole process.

Newbies (especially ones coming from GC languages) often imagine that lifetime markups are instructions for the compiler. That's not true at all! Except for some corner cases in HRTB-related function definitions, they are ignored by the compiler when it generates code, and mrustc is an existence proof: a compiler which ignores lifetime markups yet can still compile Rust code just fine.

Why do they exist, then? The answer is here:

In 50 years of C usage it was found that while humans can write programs which allocate and deallocate memory correctly, once a program grows beyond a certain threshold it invariably has some warts and omissions, and then the program breaks.

One solution (which IMNSHO caused more harm than good, but that's a different story) is a garbage collector: let's allow the human not to think about lifetimes at all and make the computer track them!

It kinda-sorta works, but Rust uses a different approach: it makes the compiler verify that the lifetimes in your program all make sense. That you don't try to use an object which has been destroyed. That you don't, by accident, modify the same object from two different places, and so on.

Lifetime markups tell the compiler the “story of life” of the data in your program. They are only needed because the compiler couldn't otherwise understand that your program is valid.

This makes questions like these completely useless and pointless:

What may that phrase even mean? Compiler knows how to deal with lifetimes and borrows inside of your function. It can see everything there and it's very good at local reasoning.

You don't need to assign local lifetimes, you just need to write your program correctly!

Oh, so the compiler automatically knows when to drop a value from the scope?

When your variable leaves the scope, drop is called automatically, yes. A normal variable which you create inside a function or block lives precisely from the moment it's declared to the moment the scope ends (if it's not returned from the function or block, of course).

You cannot change that. No matter how many lifetime marks you add to your program, it will behave like that.

Lifetimes are needed when your program calls other functions or receives data from outside.

Consider the following function:

fn foo(x: &str, y: &str) -> &str {
   …
}

And you use it like that:

fn main() {
  let s1 = String::from("xxx");
  let r1: &str = &s1;
  let r3: &str;
  {
    let s2 = String::from("yyy");
    let r2: &str = &s2;
    r3 = foo(r1, r2);
  }
  println!("{r3}");
}

Is this program valid or not? The answer is… “who knows”! We have no idea whether this function returns something borrowed from r1 (which points to s1) or from r2 (which points to s2). s2 goes away before r3 is printed while s1 goes away after thus if r3 borrows from s1 program is valid and if r3 borrows from s2 it's invalid.

Now, if we would add names, something like this:

fn FindSubstringOrPanic(haystack: &str, needle: &str) -> &str {
   …
}

Then a human would immediately understand that the result lives as long as haystack, and that means FindSubstringOrPanic may be called like this: FindSubstringOrPanic(r1, r2) and couldn't be called like this: FindSubstringOrPanic(r2, r1).

But the compiler is not human. It doesn't know what the words “haystack” or “needle” mean, it doesn't know why “FindSubstringOrPanic” would use these names, it doesn't know their significance… it needs additional guidance! And lifetimes make it possible to provide that:

fn FindSubstringOrPanic<'a>(haystack: &'a str, needle: &str) -> &'a str {
   …
}

Compiler still doesn't know what “haystack” or “needle” mean but it, now, knows that we are returning something related to “haystack”. That means that FindSubstringOrPanic(r1, r2) is valid while FindSubstringOrPanic(r2, r1) is not valid.
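
For anyone who wants to try it, here is a runnable sketch of that idea. The body is my own guess at what such a function might do, and I renamed it to `find_substring_or_panic` to follow Rust naming conventions.

```rust
// The returned slice is tied to `haystack` only; `needle` may be borrowed
// for a shorter time.
fn find_substring_or_panic<'a>(haystack: &'a str, needle: &str) -> &'a str {
    let start = haystack
        .find(needle)
        .expect("needle not found in haystack");
    &haystack[start..start + needle.len()]
}

fn main() {
    let s1 = String::from("xxx yyy zzz");
    let r3: &str;
    {
        let s2 = String::from("yyy");
        // OK: the result borrows only from `s1`, which outlives `r3`.
        r3 = find_substring_or_panic(&s1, &s2);

        // Rejected by the compiler if uncommented: the result would borrow
        // from `s2`, which is dropped at the end of this block.
        // r3 = find_substring_or_panic(&s2, &s1);
    }
    println!("{r3}");
}
```

Note that the annotation changes nothing about when `s1` or `s2` are dropped. It only tells the compiler which argument the result borrows from.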

And the story goes from there: we need to tell the compiler something about how our functions and data structures behave.

That is why lifetimes exist and that is why we have to specify them on function borders.

There is no need to specify them inside the function itself: the compiler can see everything that is happening there.

Again: except for some HRTB-related corner cases (which you would learn much later), lifetimes don't affect the code generation of the compiler at all. They just explain to the compiler why you believe that your program is valid.

And then the compiler verifies that your story is consistent and makes sense (according to the rules which are part of the Rust specification, anyway; the compiler doesn't have common sense and can't understand what your program does, but it's very good at verifying formal rules).

Yes, that is the way for managing memory (and resources in general) in Rust. Lifetime annotations don't affect when values are dropped. Scope and passing around values by-value does.
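
If you want to see this for yourself, here is a small sketch that records the order in which values are dropped. `Noisy`, `consume`, and `drop_order` are made-up names for the example.

```rust
use std::cell::RefCell;

// Hypothetical type that records its name in a shared log when dropped.
struct Noisy<'log> {
    name: &'static str,
    log: &'log RefCell<Vec<&'static str>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Takes ownership, so the value is dropped when this function returns.
fn consume(_value: Noisy<'_>) {}

fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _outer = Noisy { name: "outer", log: &log };
        {
            let _inner = Noisy { name: "inner", log: &log };
        } // `_inner` goes out of scope and is dropped here
        let moved = Noisy { name: "moved", log: &log };
        consume(moved); // moved by value, dropped at the end of `consume`
    } // `_outer` goes out of scope and is dropped here
    log.into_inner()
}

fn main() {
    println!("{:?}", drop_order());
}
```

The `'log` lifetime on `Noisy` plays no part in the result. The order is decided entirely by scopes and by the move into `consume`.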

I don't disagree with the meat of what you're saying, but I do feel the terminology adds an unnecessary speedbump / invitation to get the wrong idea. "Lifetime" is a common term for variables and values,[1] and I've seen people confused by this countless times. I'm not alone in recognizing this.[2] And NLL further separated the connection between the two meanings.[3]


  1. e.g. all over these RAII and related pages I almost linked to above ↩︎

  2. And again; and once more. ↩︎

  3. I'm glad that pre-NLL label-to-lifetime RFC didn't go through; walking through NLL examples is gnarly enough without them. ↩︎

I'm curious to know what terminology you would use instead?

Data does have a time when it comes into existence and it does have a time when it ceases to exist. "Lifetime" seems like a good way to describe that interval. Indeed, "lifetime" is what the C++ standard uses to describe this, even though C++ does not have any way of expressing lifetimes in source code.

Those are the lifetimes which do not correspond to Rust lifetimes -- the span from an object's initialization to its destruction at runtime. Hence the retrospective desire to call Rust lifetimes something else.

As for what best to call them, I don't have any ideas beyond the suggestions in the links already provided.[1] I also strongly doubt the terminology will actually change at this point.


  1. I'm completely indoctrinated to the Rust meaning at this point, personally, but have found need to explain and clarify the distinction countless times. ↩︎

Hmmm...Could you show me an example?

Rust lifetimes are compile-time constructs which are erased before runtime for one, so they literally can't represent the runtime scope of a value in the general case.[1] The current rustc borrow checker implementation is NLL;[2] the RFC is dense but the distinction between (Rust) lifetimes and value scope is mentioned in this section.

The compiling portion of the OP has many examples of references with lifetimes shorter than the liveness scope of the referent, as explained in the rest of the discussion in this thread.

Any time you put a &'static _ in a local, non-returned variable is an example of a reference with a lifetime longer than the liveness scope of the reference (the local variable).

Confusion around T: 'static meaning values live forever is one of the top lifetime misconceptions.[3]

(More) code examples of those three:

// Trait bound example: `t` is destructed even though its type satisfies a
// `'static` bound
fn f<T: 'static>(t: T) {
    drop(t);

    // Error
    // let t = t;
}

// Reference lifetime example: Lifetimes of references don't reflect the
// liveness scope of the value of the reference, nor of the referent
fn g<'not_necessarily_static>() {
    let s: &'static str = "xyz";

    // Error (`s` doesn't last for `'static`)
    // let ss: &'static &str = &s;
    
    // Compiles (even though `*s` lasts for `'static`)
    let briefly_borrowed: &'not_necessarily_static str = &*s;
}

(Playground.)


  1. (halting problem) ↩︎

  2. non-lexical lifetimes ↩︎

  3. warning, that link uses "lifetime" for both (Rust) lifetimes and value liveness scopes... ↩︎

Indeed, people are confused about lifetimes. But I don't believe any other word would have cleared up the confusion, because the main confusion newbies have is about the number of lifetimes involved. And you can't fix that by picking a different word.

Consider a simple &str reference. Newbies often think there's only one lifetime related to it and try to understand how it works. But there are at least half a dozen of them!

Take the r3 reference in my example above. How many lifetimes does it touch?

  1. The type str itself has a lifetime (here it's not very interesting, 'static, but other types may have different lifetimes).
  2. The object which holds said str, s1, has not one but two lifetimes:
    1. The object's storage has a lifetime during which it exists.
    2. The object's content has a lifetime during which it's useful.
  3. Then, of course, r3 has both of those and more:
    1. The object's storage has a lifetime during which it exists.
    2. The object's content has a lifetime during which it's useful (it goes from the foo call to the println! call).
    3. The actual referenced content which said object content refers to (it's valid as long as the content of s1 is valid).
    4. And last but not least, the one single lifetime that such a reference usually includes, the lifetime of the type which the reference refers to.

Note that all these lifetimes may be different, but often they are the same. That is what causes confusion and makes newbies question their sanity. They try to make sense of the two or three lifetimes which exist in their mind, but in reality there are half a dozen or more (I'm not sure I listed them all), and when what they perceived as one single lifetime suddenly splits… it's yet another crisis in the newbie's head.

How would a different word solve that problem? It would just make it worse.

Which is perfectly fine. That is the main source of confusion for newbies: they start with the assumption that only named lifetimes exist, and that when two variables have the same marks (like the x: &'a str, y: &'a str parameters) the involved lifetimes are identical… which is not true, of course.

Once you realize that there are lots of [related] lifetimes involved and [most of them] are unmarked… the confusion mostly goes away.

Yes. And it has the same root cause: newbies don't realize just how many different lifetimes there are in their program!

They try to merge different things into one conglomerate in their head “for simplicity”, but that's a clear violation of principle five from RFC 1925 and, as usual, leads not to clarity but to a headache.

They do correspond to Rust program lifetimes. One of them, #2.1 in my list above. And it's also a critical lifetime: the lifetimes of all possible borrows must be shorter than that lifetime!

Thank god that won't happen. The last thing we need is an explanation that there are nameable regions, unnameable regions, and that one of these unnameable regions is called a lifetime “for clarity”.

It added one more meaning to the existing half a dozen. That's why it's a non-event: if you realize that in addition to marked lifetimes there are dozens of different lifetimes which cannot be explicitly marked in a Rust program, then NLL doesn't add that much complexity, and if you don't realize that, then you may try to understand Rust lifetimes forever with zero success.

I think you are right that it does help to know this, and also probably right that changing the terminology won't help a lot, especially at this point in time. I can certainly imagine that other types of confusion could be created by the use of some other term.

But for me at least, a lot of my confusion was actually cleared up by the material from pretzelhammer and more recently by the material from quinedot. The way these two documents explain things is extremely helpful to me.

What would have really helped me is having this material somewhere in the official docs, and then also having it mentioned and referenced from "the book". I realize that not everything can or should be explained in the book, but I think some hints about these topics and a link to further explanation is needed. But I also realize that adding this is quite a bit of work!
