Can I fix this "borrowed value does not live long enough" error with lifetime parameters?


#1

I defined the following structs and methods:

struct DataVar<'a> {
    name: &'a str,
    val: &'a [f64],
}

impl<'a> DataVar<'a> {
    fn new(name: &'a str, val: &'a [f64]) -> Self {
        Self { name, val }
    }
}

struct DataSet<'a> {
    name: &'a str,
    datavars: &'a [&'a DataVar<'a>],
}

impl<'a> DataSet<'a> {
    pub fn new(name: &'a str, datavars: &'a [&DataVar]) -> Self {
        Self { name, datavars }
    }
}

struct DataGroup<'a> {
    name: &'a str,
    datasets: &'a [&'a DataSet<'a>],
}

impl<'a> DataGroup<'a> {
    fn new(name: &'a str, datasets: &'a [&DataSet]) -> Self {
        Self { name, datasets }
    }
}

Now the following code works:

let x = DataVar {
    name: "X",
    val: &[1., 2., 3.],
};
let dataset = DataSet {
    name: "dataset",
    datavars: &[&x, &x, &x],
};
let dataplot = DataGroup {
    name: "datagroup",
    datasets: &[&dataset, &dataset],
};

while the following does not work (I get the error “borrowed value does not live long enough” when defining dataplot):

let x = DataVar::new("X", &[1., 2., 3.]);
let dataset = DataSet::new("dataset", &[&x, &x, &x]);
let dataplot = DataGroup::new("datagroup", &[&dataset, &dataset]);

Can this error be fixed with lifetime parameters, or am I forced to define intermediate variables like let y = &[&x, &x, &x]? I would like all references to live as long as x, but I cannot figure out how to express that with lifetime parameters.


#2

The error message I see is

error[E0716]: temporary value dropped while borrowed
  --> src/main.rs:36:40
   |
36 | let dataset = DataSet::new("dataset", &[&x, &x, &x]);
   |                                        ^^^^^^^^^^^^ - temporary value is freed at the end of this statement
   |                                        |
   |                                        creates a temporary which is freed while still in use
37 | let dataplot = DataGroup::new("datagroup", &[&dataset, &dataset]);
   |                                              -------- borrow later used here
   |
   = note: consider using a `let` binding to create a longer lived value

This can’t be fixed by adding lifetimes. It would have to be changed in the compiler so that these temporaries are not dropped. (To be honest, I’ve often felt that the compiler drops temporaries too early in certain cases, and the precise circumstances in which it does so are difficult to understand.)
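To make the rule concrete: a borrow written directly in a let initializer, including one inside a struct-literal field, has its temporary extended to the end of the enclosing block, whereas a temporary passed as a function argument is dropped at the end of the statement. That is why the struct-literal version in #1 compiles and the constructor version does not. A minimal sketch, where Wrapper and pass_through are hypothetical stand-ins for DataSet and DataSet::new:

```rust
// Hypothetical stand-in for a constructor such as DataSet::new.
fn pass_through<'a>(items: &'a [&'a i32]) -> &'a [&'a i32] {
    items
}

// Hypothetical stand-in for DataSet, built with a struct literal.
struct Wrapper<'a> {
    items: &'a [&'a i32],
}

fn build_and_sum(x: i32) -> i32 {
    // Extended: the temporary array is borrowed inside a struct literal that is
    // the let initializer, so it lives until the end of this function body.
    let w = Wrapper { items: &[&x, &x] };

    // NOT extended: a temporary passed as a function argument is dropped at the
    // end of the statement, so this would fail with error[E0716] if `bad`
    // were used afterwards:
    // let bad = pass_through(&[&x, &x]);

    // Workaround: bind the array to a variable first.
    let tmp = [&x, &x, &x];
    let ok = pass_through(&tmp);

    w.items.iter().map(|v| **v).sum::<i32>() + ok.iter().map(|v| **v).sum::<i32>()
}

fn main() {
    println!("{}", build_and_sum(2));
}
```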


#3

Lifetime annotations don’t do anything by themselves and can’t change program behavior or the generated code. They only describe what the program is doing anyway.

And in this case the program is dropping a temporary value early.

It wouldn’t be a problem if you used owned values in the structs, since then your code could keep the values for as long as it needs, instead of being attached to temporary borrows from the calling environment.
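A sketch of that suggestion, assuming the containers own their children in a Vec while the raw data and names stay borrowed. Because nothing borrows a temporary slice any more, the three-line construction compiles. DataVar is made Copy here (an assumption) so it can be repeated in vec![x, x, x]:

```rust
// Only the names and the raw data stay borrowed; the lists are owned.
#[derive(Debug, Clone, Copy)]
struct DataVar<'a> {
    name: &'a str,
    val: &'a [f64],
}

#[derive(Debug, Clone)]
struct DataSet<'a> {
    name: &'a str,
    datavars: Vec<DataVar<'a>>, // owned list of small DataVar values
}

#[derive(Debug)]
struct DataGroup<'a> {
    name: &'a str,
    datasets: Vec<DataSet<'a>>, // owned list of owned datasets
}

impl<'a> DataVar<'a> {
    fn new(name: &'a str, val: &'a [f64]) -> Self {
        Self { name, val }
    }
}

impl<'a> DataSet<'a> {
    fn new(name: &'a str, datavars: Vec<DataVar<'a>>) -> Self {
        Self { name, datavars }
    }
}

impl<'a> DataGroup<'a> {
    fn new(name: &'a str, datasets: Vec<DataSet<'a>>) -> Self {
        Self { name, datasets }
    }
}

fn main() {
    let x = DataVar::new("X", &[1., 2., 3.]);
    let dataset = DataSet::new("dataset", vec![x, x, x]);
    // Cloning copies the small Vec of DataVar, not the underlying f64 data.
    let group = DataGroup::new("datagroup", vec![dataset.clone(), dataset]);
    println!("{} datasets", group.datasets.len());
}
```

Note that the Vec itself still lives on the heap; whether that matters is discussed further down the thread.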


#4

You can fix this by storing the temporaries in variables:

let x = DataVar::new("X", &[1., 2., 3.]);
let dataset = &[&x, &x, &x];
let dataset = DataSet::new("dataset", dataset);
let dataplot = &[&dataset, &dataset];
let dataplot = DataGroup::new("datagroup", dataplot);

#5

Yes, I understand that I can avoid this error by using either owned values or temporaries, but I hoped there was a better solution. I am implementing a plotting library (which I hope to publish on crates.io one day), and I think those two solutions have the following shortcomings:

  1. Owned values imply allocations. Since I am using those structs just to organize data before passing it to the lib, and that data could in general be large, I would like to avoid copying all values to the heap.

  2. Using temporaries means that all users of the plotting lib would have to write 5 lines of code for each plot instead of 3. However, if you confirm that there is no better solution to this error (including implementing the structs and/or constructors differently), I will follow this path.

I agree with @ExpHP: I also feel that the compiler often drops temporaries too early (the code above is just one case I hit recently, which I reported because I hoped there was a better solution). Maybe references created as function arguments could be assigned to the parent’s scope? (After all, they could be considered as created there and passed on.) Or maybe there could be a way to instruct the compiler to extend the lifetime of a reference to the parent’s scope, something like a 'super lifetime parameter or a super() method, which would allow writing, for instance:

let dataset = DataSet::new("dataset", &'super [&x, &x, &x]);

or:

let dataset = DataSet::new("dataset", &[&x, &x, &x].super());

If there is any agreement on this, should I open a discussion on Rust Internals?

Thanks


#6

The lifetime of temporaries (and the ensuing “too early drop”) is a known issue - e.g. see this comment and the linked document there.

AIUI, the core issue seems to be around the place where the temp would end up being dropped, and that being somewhat invisible/non-explicit in code; this can have ramifications for unsafe code, for example. Given the fix is to insert an explicit let binding, I suspect this hasn’t been considered too big of a problem - it’s a bit of an ergonomic hit in some cases, but there’s an argument to be made that being explicit with such things is desirable.


#7

Owned values are not on the heap. Ownership and allocation are separate concepts. Owned values can exist on the stack and be temporary too.

References can’t exist without a corresponding owned value. They’re not a way to avoid allocations, but to cheaply share access to the existing allocations.

It makes sense to borrow [f64] slices and strings, but the rest are types from your library, so users will have to construct them specifically for you anyway.
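To illustrate that ownership does not imply a heap allocation, here is a hypothetical variant of DataSet that owns a fixed number of DataVars inline in an array, using const generics. The struct owns its array, and it lives wherever the struct lives (here, on the stack). The const parameter N and the total_len method are assumptions made for this sketch:

```rust
#[derive(Debug, Clone, Copy)]
struct DataVar<'a> {
    name: &'a str,
    val: &'a [f64], // borrowed: the data itself is never copied
}

// Hypothetical: N, the number of variables, is fixed at compile time.
// The array is owned by the struct; nothing here allocates on the heap.
#[derive(Debug, Clone)]
struct DataSet<'a, const N: usize> {
    name: &'a str,
    datavars: [DataVar<'a>; N],
}

impl<'a, const N: usize> DataSet<'a, N> {
    fn new(name: &'a str, datavars: [DataVar<'a>; N]) -> Self {
        Self { name, datavars }
    }

    // Total number of data points across all variables.
    fn total_len(&self) -> usize {
        self.datavars.iter().map(|v| v.val.len()).sum()
    }
}

fn main() {
    let data = [1.0, 2.0, 3.0];
    let x = DataVar { name: "X", val: &data };
    // The array [x, x, x] is moved into `dataset`; no temporary borrow is involved.
    let dataset = DataSet::new("dataset", [x, x, x]);
    println!("{} points", dataset.total_len());
}
```

The trade-off is that the number of variables must be known at compile time; a Vec lifts that restriction at the cost of one allocation for the list (not for the data).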


#8

Interesting reading! So now that we have NLL, I’m really looking forward to “Better Temporary Lifetimes”, as this is one of the issues I hit most often, and I think it is a real obstacle for newcomers to Rust.


#9

Thanks for clarifying. If I understand correctly, you suggest changing my code as follows:

#[derive(Debug, Default)]
struct DataVar<'a> {
    name: &'a str,
    val: &'a [f64],
}

impl<'a> DataVar<'a> {
    fn new(name: &'a str, val: &'a [f64]) -> Self {
        Self { name, val }
    }
}

#[derive(Debug, Default)]
struct DataSet<'a> {
    name: &'a str,
    datavars: Vec<&'a DataVar<'a>>,
}

impl<'a> DataSet<'a> {
    pub fn new(name: &'a str, datavars: Vec<&'a DataVar>) -> Self {
        Self { name, datavars }
    }
}

#[derive(Debug, Default)]
struct DataPlot<'a> {
    name: &'a str,
    datasets: Vec<&'a DataSet<'a>>,
}

impl<'a> DataPlot<'a> {
    fn new(name: &'a str, datasets: Vec<&'a DataSet>) -> Self {
        Self { name, datasets }
    }
}

Now the following works:

let x = DataVar::new("X", &[1., 2., 3.]);
let dataset = DataSet::new("dataset", vec![&x, &x, &x]);
let dataplot = DataPlot::new("dataplot", vec![&dataset, &dataset]);

Can you please confirm?

Thanks


#10

Not quite. Vec<&Foo> usually still requires the caller to have another Vec or array of Foo somewhere to borrow from.

Also, on modern architectures references are relatively expensive: they may reduce cache locality, indirect access is costly when the CPU can’t predict or speculate it, and code that goes through them doesn’t get autovectorized.

Don’t use a Vec of references to such tiny objects unless you need polymorphism. You can even make DataVar a Copy type, because this type itself is nothing more than a couple of references. Use the faster, more efficient Vec<DataVar> instead.

If you expect users to use one dataset multiple times, then Vec<&DataSet> might be OK (since cloning a DataSet would duplicate the heap data of the Vec inside it). But if users would typically use each dataset once, then Vec<DataSet> is fine too, and it saves a layer of indirection.
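A sketch of that layout, assuming (as in the last paragraph) that datasets may be reused across plots: DataVar is Copy and stored by value in a Vec<DataVar>, while DataPlot borrows its datasets through Vec<&DataSet>. Construction stays at three lines and needs no extra bindings:

```rust
// Just two references, so Copy is cheap and convenient.
#[derive(Debug, Clone, Copy)]
struct DataVar<'a> {
    name: &'a str,
    val: &'a [f64],
}

// Stored by value: the small DataVar structs sit contiguously in the Vec.
#[derive(Debug, Clone)]
struct DataSet<'a> {
    name: &'a str,
    datavars: Vec<DataVar<'a>>,
}

// Borrowed datasets, assuming one dataset is reused in several plots.
#[derive(Debug)]
struct DataPlot<'a> {
    name: &'a str,
    datasets: Vec<&'a DataSet<'a>>,
}

fn main() {
    let t = [0.0, 1.0, 2.0];
    let y = [5.0, 6.0, 7.0];
    let time = DataVar { name: "t", val: &t };
    let temp = DataVar { name: "y", val: &y };

    // `time` and `temp` are copied into the Vec: no `&time`, no clone() call.
    let dataset = DataSet { name: "ds", datavars: vec![time, temp] };
    let plot = DataPlot { name: "plot", datasets: vec![&dataset, &dataset] };
    println!("{} datasets", plot.datasets.len());
}
```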


#11

This is the data model that I envision for the plotting library I’m implementing:

  • Users first define several DataVars, each one referencing a series of data.
  • Then they define one or more DataSets, which are collections of DataVars, with the requirement that they have the same length.
  • Finally they define a DataPlot, which contains at least one DataSet, or more DataSets for instance when not all DataVars have the same length.

To create several plots, a user has to define a DataPlot for each plot, and in doing so I expect they would typically reuse several DataVars (for instance the time series), and maybe also some DataSets.

I used references to avoid asking the user to clone DataVars when reusing them. Now that you know the use case scenario, do you still think it is acceptable/advisable to use Vec<DataVar> and Vec<DataSet>? In this case, do you think I should make both DataVar and DataSet Copy types, or only DataVar, or neither one and ask the user to clone them when reusing?

Thanks


#12

DataVar can be Copy, because it only contains shared references. I’d make it a Copy type, because it’s small and Copy is convenient.

DataSet can’t be Copy because of the Vec in it. Cloning it will clone the Vec’s contents. Cloning could be made cheaper by storing an Arc<Vec<_>> instead.

If DataSet is supposed to be reusable (same instance in multiple places) then you could take it by reference.
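A sketch of the Arc<Vec<_>> suggestion: cloning a DataSet then copies the name reference and increments the Arc’s reference count, while the Vec of DataVars is shared rather than duplicated. Arc uses an atomic counter; std::rc::Rc would be the non-atomic alternative if the data never crosses threads. The field layout is an assumption carried over from the earlier posts:

```rust
use std::sync::Arc;

#[derive(Debug, Clone, Copy)]
struct DataVar<'a> {
    name: &'a str,
    val: &'a [f64],
}

// Cloning this struct copies `name` and bumps the Arc's counter;
// the Vec of DataVars itself is shared, not duplicated.
#[derive(Debug, Clone)]
struct DataSet<'a> {
    name: &'a str,
    datavars: Arc<Vec<DataVar<'a>>>,
}

impl<'a> DataSet<'a> {
    fn new(name: &'a str, datavars: Vec<DataVar<'a>>) -> Self {
        Self { name, datavars: Arc::new(datavars) }
    }
}

fn main() {
    let t = [0.0, 1.0, 2.0];
    let time = DataVar { name: "t", val: &t };
    let ds = DataSet::new("ds", vec![time, time]);
    let ds2 = ds.clone(); // no new Vec is allocated here
    // Both handles point to the same Vec.
    println!("shared: {}", Arc::ptr_eq(&ds.datavars, &ds2.datavars));
}
```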


#13

Thanks to your hints I made good progress, but now I’m facing the following issue.

Considering the following data model:

use serde::ser::Serialize;

#[derive(Debug, PartialEq, Clone, Copy, Default)]
struct DataVar<'a, T: Serialize> {
    name: &'a str,
    val: &'a [T],
}

#[derive(Debug, PartialEq, Clone, Default)]
struct DataSet<'a, T: Serialize> {
    name: &'a str,
    datavars: Vec<DataVar<'a, T>>,
}

it clearly doesn’t work when the values have different types. For instance, the following does not compile:

let x = DataVar {
    name: "x",
    val: &[1, 2, 3],
};
let y = DataVar {
    name: "y",
    val: &[1., 2., 3.],
};
let z = DataSet {
    name: "z",
    datavars: vec![x, y],
};

I understand why, but I’m struggling to find a good solution which satisfies the following three requirements:

  • The slices could point to very large data, so I should avoid copying all values.
  • I need to efficiently iterate “by column” (one slice at a time). This is required to serialize data to JSON (one full vector at a time).
  • I need to efficiently iterate “by rows” (all first elements of the DataSet variables, then all second elements, and so on). This is required to serialize data to CSV (one row record at a time).

I tried enums, macros, and the Any trait, but as far as I understand none of them satisfies all the requirements above… do you have any suggestions?

Thanks


#14

In this case you can’t use the generic type T, because it stands for a single type, but you want each element of the Vec to have its own type (like a hypothetical vec![T, U, V, W…], but there are not enough alphabets in the world to give letters to all possible Vec elements ;))

  • If the set of types that can be allowed in datavars is known and limited (e.g. either string or number, and nothing else), then you can use datavars: Vec<EnumOfAllowedTypes> + enum EnumOfAllowedTypes { String(DataVar<String>), Number(DataVar<f64>) }. See serde_json::Value for example.

  • If the set of types is open (technically, infinitely large), you’ll need to use polymorphism. datavars: Vec<Box<dyn DataVar>>. Note that you’ll have to make one universal DataVar trait that works for all types at the same time.
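A minimal sketch of the trait-object option. The trait is given the hypothetical name Column (to avoid clashing with the DataVar struct), and its methods are assumptions covering what row-wise output needs: a name, a length, and one cell rendered as text. One generic impl over borrowed slices covers every element type that implements Display, so no data is copied:

```rust
use std::fmt::Display;

// Hypothetical trait: what every column must provide, whatever its element type.
trait Column {
    fn name(&self) -> &str;
    fn len(&self) -> usize;
    // One value rendered as text, e.g. a CSV field.
    fn cell(&self, row: usize) -> String;
}

// A single generic implementation covers every printable element type.
struct SliceColumn<'a, T> {
    name: &'a str,
    val: &'a [T], // borrowed: the data is not copied
}

impl<T: Display> Column for SliceColumn<'_, T> {
    fn name(&self) -> &str {
        self.name
    }
    fn len(&self) -> usize {
        self.val.len()
    }
    fn cell(&self, row: usize) -> String {
        self.val[row].to_string()
    }
}

// Cells of row `i` across columns of different types (panics if `i` is out of range).
fn row(columns: &[Box<dyn Column + '_>], i: usize) -> Vec<String> {
    columns.iter().map(|c| c.cell(i)).collect()
}

fn main() {
    let x = [1_i32, 2, 3];
    let y = [1.5_f64, 2.5, 3.5];
    // `+ '_` because the columns borrow local arrays (the default is 'static).
    let columns: Vec<Box<dyn Column + '_>> = vec![
        Box::new(SliceColumn { name: "x", val: &x }),
        Box::new(SliceColumn { name: "y", val: &y }),
    ];
    let names: Vec<&str> = columns.iter().map(|c| c.name()).collect();
    println!("{}", names.join(","));
    for i in 0..columns[0].len() {
        println!("{}", row(&columns, i).join(","));
    }
}
```

Each cell call goes through dynamic dispatch and allocates a String, which is the price of an open set of types.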


#15

I considered using serde_json::Value, but isn’t serde_json::to_value(datavar.val).unwrap() copying all slice values into the new Value? Moreover, I thought that calling .as_array().unwrap() on every Value at each row iteration would be inefficient. Do you think it is a good solution instead?


#16

I don’t recommend literally using serde_json’s type, but using it as an inspiration how to make a multi-type enum.

An enum of vectors, enum { Vec<String>, Vec<f64> }, is faster than a vector of enums, Vec<enum { String, f64 }>.
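A small illustration of the two layouts, with hypothetical names Column and Cell. In the enum-of-vectors layout the match runs once per column and the inner loop is over a plain &[f64], which the compiler can typically optimize well; in the vector-of-enums layout every value carries its own tag and the match runs once per element:

```rust
// Enum of vectors: one tag per column, values stored contiguously.
enum Column {
    Str(Vec<String>),
    F64(Vec<f64>),
}

// Vector of enums: one tag per value.
enum Cell {
    Str(String),
    F64(f64),
}

// One match, then a tight loop over &[f64].
fn sum_column(col: &Column) -> f64 {
    match col {
        Column::F64(v) => v.iter().sum(),
        Column::Str(_) => 0.0,
    }
}

// Same kind of result, but the match runs once per element.
fn sum_cells(cells: &[Cell]) -> f64 {
    cells
        .iter()
        .map(|c| match c {
            Cell::F64(x) => *x,
            Cell::Str(_) => 0.0,
        })
        .sum()
}

fn main() {
    let col = Column::F64(vec![1.0, 2.0, 3.0]);
    let cells = vec![Cell::F64(1.0), Cell::F64(2.0), Cell::F64(3.0)];
    println!("{} {}", sum_column(&col), sum_cells(&cells));
}
```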


#17

OK, I made new progress… but now I’m stuck with a new issue. Sorry if I’m asking many questions, but things are definitely getting more complex than I would have liked.

Considering for now only i32 and f64 as an example, I came up with the following code:

#[derive(Debug, PartialEq, Clone, Copy)]
enum Data<'a> {
    I32(&'a [i32]),
    F64(&'a [f64]),
}

impl<'a> From<&'a [i32]> for Data<'a> {
    fn from(v: &'a [i32]) -> Data<'a> {
        Data::I32(v)
    }
}

impl<'a> From<&'a [f64]> for Data<'a> {
    fn from(v: &'a [f64]) -> Data<'a> {
        Data::F64(v)
    }
}

#[derive(Debug, PartialEq, Clone, Copy)]
struct DataVar<'a> {
    name: &'a str,
    data: Data<'a>,
}

impl<'a> DataVar<'a> {
    fn new<T>(name: &'a str, data: T) -> Self
    where
        T: Into<Data<'a>>,
    {
        Self {
            name,
            data: data.into(),
        }
    }
}

#[derive(Debug, PartialEq, Clone)]
struct DataSet<'a> {
    name: &'a str,
    datavars: Vec<DataVar<'a>>,
}

impl<'a> DataSet<'a> {
    fn new(name: &'a str, datavars: Vec<DataVar<'a>>) -> Self {
        Self { name, datavars }
    }
}

In the future I will need to support most primitive types, so this will become quite verbose. Can you please confirm that I’m taking the correct approach?
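One common way to tame that verbosity is a declarative macro. The sketch below assumes a hypothetical macro named define_data: each Variant => type pair produces one enum variant holding a borrowed slice plus the matching From<&[type]> impl, and a len method is generated over all variants. Adding a primitive type then takes one line:

```rust
// Hypothetical helper macro: generates the Data enum, its From impls,
// and a len() method from a list of `Variant => type` pairs.
macro_rules! define_data {
    ($($variant:ident => $ty:ty),* $(,)?) => {
        #[derive(Debug, PartialEq, Clone, Copy)]
        enum Data<'a> {
            $($variant(&'a [$ty]),)*
        }

        $(
            impl<'a> From<&'a [$ty]> for Data<'a> {
                fn from(v: &'a [$ty]) -> Data<'a> {
                    Data::$variant(v)
                }
            }
        )*

        impl<'a> Data<'a> {
            // Length of the underlying slice, whatever its element type.
            fn len(&self) -> usize {
                match self {
                    $(Data::$variant(s) => s.len(),)*
                }
            }
        }
    };
}

// One line per supported element type.
define_data! {
    I32 => i32,
    F64 => f64,
    U8 => u8,
}

fn main() {
    let x = [1_i32, 2, 3];
    let d = Data::from(&x[..]);
    println!("{:?} has {} values", d, d.len());
}
```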

Here is the issue I’m facing now. Once I create a new DataSet:

let x = [1, 2, 3];
let xvar = DataVar::new("x", &x[..]);
let y = [1., 2., 3.];
let yvar = DataVar::new("y", &y[..]);
let zset = DataSet::new("z", vec![xvar, yvar]);

how can I get the slices pointing to x and y out of zset?

Thank you again for your precious help.
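One way to get the slices back out is to pattern-match on the data field of each DataVar. A sketch reusing the Data, DataVar and DataSet definitions from the post (trimmed to what is needed), plus two hypothetical accessors, as_i32 and as_f64, that return None when the variant does not match:

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Data<'a> {
    I32(&'a [i32]),
    F64(&'a [f64]),
}

#[derive(Debug, PartialEq, Clone, Copy)]
struct DataVar<'a> {
    name: &'a str,
    data: Data<'a>,
}

#[derive(Debug, PartialEq, Clone)]
struct DataSet<'a> {
    name: &'a str,
    datavars: Vec<DataVar<'a>>,
}

impl<'a> Data<'a> {
    // Hypothetical accessor: Some(slice) only for the I32 variant.
    fn as_i32(&self) -> Option<&'a [i32]> {
        match *self {
            Data::I32(s) => Some(s),
            _ => None,
        }
    }

    // Hypothetical accessor: Some(slice) only for the F64 variant.
    fn as_f64(&self) -> Option<&'a [f64]> {
        match *self {
            Data::F64(s) => Some(s),
            _ => None,
        }
    }
}

fn main() {
    let x = [1, 2, 3];
    let y = [1., 2., 3.];
    let zset = DataSet {
        name: "z",
        datavars: vec![
            DataVar { name: "x", data: Data::I32(&x) },
            DataVar { name: "y", data: Data::F64(&y) },
        ],
    };

    // Handle every variant explicitly...
    println!("dataset {}", zset.name);
    for var in &zset.datavars {
        match var.data {
            Data::I32(s) => println!("{}: i32 slice {:?}", var.name, s),
            Data::F64(s) => println!("{}: f64 slice {:?}", var.name, s),
        }
    }
    // ...or ask for one specific type when it is known.
    let x_back: &[i32] = zset.datavars[0].data.as_i32().unwrap();
    println!("{:?}", x_back);
}
```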


#18

OK, I think I’m finding my way… Every time I have to do something with zset data, I need to pattern-match on all variants and process the extracted data so that it always produces the same output type (e.g. String). I hope it will work…
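The approach described above can be sketched for the CSV case: iterate over row indices and, for each DataVar, match on the variant to format that row’s value as a String. The to_csv and cell names, and the choice to stop at the shortest column, are assumptions for illustration:

```rust
#[derive(Debug, Clone, Copy)]
enum Data<'a> {
    I32(&'a [i32]),
    F64(&'a [f64]),
}

#[derive(Debug, Clone, Copy)]
struct DataVar<'a> {
    name: &'a str,
    data: Data<'a>,
}

impl<'a> Data<'a> {
    fn len(&self) -> usize {
        match self {
            Data::I32(s) => s.len(),
            Data::F64(s) => s.len(),
        }
    }

    // Every variant is turned into the same output type: a String.
    fn cell(&self, row: usize) -> String {
        match self {
            Data::I32(s) => s[row].to_string(),
            Data::F64(s) => s[row].to_string(),
        }
    }
}

// Hypothetical helper: header line, then one line per row ("by row" iteration).
fn to_csv(vars: &[DataVar]) -> String {
    let header: Vec<&str> = vars.iter().map(|v| v.name).collect();
    let mut out = header.join(",");
    out.push('\n');
    // Assumption: stop at the shortest column (a DataSet requires equal lengths anyway).
    let n_rows = vars.iter().map(|v| v.data.len()).min().unwrap_or(0);
    for row in 0..n_rows {
        let cells: Vec<String> = vars.iter().map(|v| v.data.cell(row)).collect();
        out.push_str(&cells.join(","));
        out.push('\n');
    }
    out
}

fn main() {
    let x = [1, 2, 3];
    let y = [0.5, 1.5, 2.5];
    let vars = [
        DataVar { name: "x", data: Data::I32(&x) },
        DataVar { name: "y", data: Data::F64(&y) },
    ];
    print!("{}", to_csv(&vars));
}
```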

However, I have another question. With the code above I need to pass slices explicitly, for instance:

let xvar = DataVar::new("x", &x[..]);

How can I make it work when passing either slices or references to arrays? In other words, I would like both of the following to work:

let xvar = DataVar::new("x", &x[0..2]);
let xvar = DataVar::new("x", &x);

I tried the following:

impl<'a, T> From<&'a T> for Data<'a>
where
    T: AsRef<[i32]> + ?Sized,
{
    fn from(v: &'a T) -> Data<'a> {
        Data::I32(v.as_ref())
    }
}

impl<'a, T> From<&'a T> for Data<'a>
where
    T: AsRef<[f64]> + ?Sized,
{
    fn from(v: &'a T) -> Data<'a> {
        Data::F64(v.as_ref())
    }
}

but it fails to compile with the error:

conflicting implementations of trait `std::convert::From<&_>` for type `main::Data<'_>`
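The error comes from Rust’s coherence check: it has to assume that some type T might implement both AsRef<[i32]> and AsRef<[f64]>, so the two blanket impls over &T overlap. One workaround (a sketch, not necessarily what the answer linked in #19 proposes) is to implement From only for concrete borrowed containers: slices, arrays of any length via const generics, and, as an extra assumption, &Vec<_>. These impls can never overlap across element types. A small hypothetical macro, impl_from_for_data, keeps it to one line per type:

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Data<'a> {
    I32(&'a [i32]),
    F64(&'a [f64]),
}

// For each element type, implement From only for concrete borrowed containers,
// so impls for different element types cannot conflict.
macro_rules! impl_from_for_data {
    ($variant:ident, $ty:ty) => {
        impl<'a> From<&'a [$ty]> for Data<'a> {
            fn from(v: &'a [$ty]) -> Self {
                Data::$variant(v)
            }
        }
        // Arrays of any length N; &[T; N] coerces to &[T] in the variant constructor.
        impl<'a, const N: usize> From<&'a [$ty; N]> for Data<'a> {
            fn from(v: &'a [$ty; N]) -> Self {
                Data::$variant(v)
            }
        }
        // Assumption: accepting &Vec<T> is convenient too; it deref-coerces to &[T].
        impl<'a> From<&'a Vec<$ty>> for Data<'a> {
            fn from(v: &'a Vec<$ty>) -> Self {
                Data::$variant(v)
            }
        }
    };
}

impl_from_for_data!(I32, i32);
impl_from_for_data!(F64, f64);

#[derive(Debug, PartialEq, Clone, Copy)]
struct DataVar<'a> {
    name: &'a str,
    data: Data<'a>,
}

impl<'a> DataVar<'a> {
    fn new<T: Into<Data<'a>>>(name: &'a str, data: T) -> Self {
        Self { name, data: data.into() }
    }
}

fn main() {
    let x: [i32; 3] = [1, 2, 3];
    let slice_var = DataVar::new("x", &x[0..2]); // &[i32]
    let array_var = DataVar::new("x", &x); // &[i32; 3]
    println!("{:?}\n{:?}", slice_var, array_var);
}
```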

#19

See answer here. Thanks anyway @kornel for all your hints!