New user trying to grapple with ownership

I have a piece of code here which basically tries to read in data from a csv and creates a struct out of the data.

extern crate csv;

use csv::Reader;
use csv::StringRecord;
use std::error::Error;



struct BusinessControl<'a> {
    country: &'a str
}

impl<'a> BusinessControl<'a> {
    fn new(record: &'a StringRecord) -> Self {
        let countryCode = record.get(1).expect("country data not found for business control");
        BusinessControl {
            country: countryCode
        }
    }
}

fn read_csv(path: &str) -> Result<Vec<BusinessControl>, Box<Error>> {
    let mut rdr = Reader::from_path(path)?;
    let mut result = Vec::new();
    for recordRes in rdr.records() {
        let record = recordRes?;
        result.push(BusinessControl::new(&record));
    }
    Ok(result)

}

#[cfg(test)]
mod test {
    use super::*;

    #[test]
    fn test_read_csv() {
        assert_eq!(read_csv(&"data.csv").unwrap().first().unwrap().country, "XX")
    }
}

Check is complaining

  --> src/lib.rs:29:5
   |
27 |         result.push(BusinessControl::new(&record));
   |                                          ------- `record` is borrowed here
28 |     }
29 |     Ok(result)
   |     ^^^^^^^^^^ returns a value referencing data owned by the current function

read_csv reads in the csv and will definitely own the data. Returning result will definitely be returning something that references data owned by read_csv. Is there any other way to do this? Am I missing anything or doing something wrong here?

  1. The easiest way to fix this is to change:
struct BusinessControl<'a> {
    country: &'a str
}

to

struct BusinessControl {
    country: String
}

and then fix the compiler errors that pop up.

  2. As for what’s wrong with the code above:

Your current BusinessControl<'a> object holds a reference to a string. This means it cannot outlive the underlying string. In this particular case, the strings are created at let record = recordRes? and die when record goes out of scope at the end of each loop iteration.

However, you are trying to return a Vec of structs holding references to data that vanishes before the function returns.

The solution to this is … instead of having BusinessControl store refs, have it store String.
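
As a rough sketch of that fix, here is an owned version that compiles with the standard library alone. To keep it self-contained it does not use the csv crate: read_records is a hypothetical stand-in for read_csv that splits plain lines on commas, and each line plays the role of the record from the question.

```rust
use std::error::Error;
use std::io::BufRead;

// Owned version: the struct holds its own String, so it needs no lifetime parameter.
struct BusinessControl {
    country: String,
}

impl BusinessControl {
    // `record` is one comma-separated line (an assumed stand-in for csv::StringRecord).
    fn new(record: &str) -> Self {
        let country_code = record
            .split(',')
            .nth(1)
            .expect("country data not found for business control");
        BusinessControl {
            // Copy the text out of the record so the struct owns it.
            country: country_code.to_owned(),
        }
    }
}

// Hypothetical std-only counterpart of read_csv from the question.
fn read_records<R: BufRead>(reader: R) -> Result<Vec<BusinessControl>, Box<dyn Error>> {
    let mut result = Vec::new();
    for line in reader.lines() {
        let record = line?; // owned String, dropped at the end of this iteration
        result.push(BusinessControl::new(&record));
    }
    Ok(result)
}
```

Because every BusinessControl owns its country, nothing in the returned Vec points back into a record that has already been dropped.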


To expand on @zeroexcuses’s answer, references are temporary views into another structure, and it looks like you want a permanent value, so you shouldn’t be using a reference. Only use references in types when you know that the type is only for temporary use, for example iterators over some other structure.
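
For contrast, a small sketch of a case where a reference inside a type is fine, because the value is only a temporary view. CountryView and count_country are made-up names for illustration, not part of the original code.

```rust
// A borrowed view: fine as long as it never outlives the line it points into.
struct CountryView<'a> {
    country: &'a str,
}

// Counts the lines whose second column equals `wanted` without copying any text.
fn count_country(lines: &[String], wanted: &str) -> usize {
    let mut count = 0;
    for line in lines {
        // The view is created and dropped inside one iteration,
        // while `line` is still alive.
        let view = CountryView {
            country: line.split(',').nth(1).unwrap_or(""),
        };
        if view.country == wanted {
            count += 1;
        }
    }
    count
}
```

The view never escapes the loop body, so the borrow checker has nothing to complain about.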


I would like to ask about the lifetime of countryCode in the method BusinessControl<'a>::new(record: &'a StringRecord) -> Self. I don’t understand why it is 'a rather than the lifetime associated with the body of the method. And in the initialization country: countryCode, the lifetime of country also seems to be 'a instead of the one associated with the method.

Think of lifetime parameters like type parameters,

fn foo<T>(value: T) {}

T is not decided by the function foo, so foo must work for any T.

Similarly

fn bar<'a>(value: &'a i32) {}

'a is not decided by the function bar, so bar must work for any 'a

Now, because the caller must decide the lifetime 'a, it can’t pick any lifetimes from inside bar, in the very same way that anyone who calls foo can’t pick any types that are defined inside of foo.

fn foo<T>(value: T) {
    struct HelperType(u32);
}

No-one can pick HelperType to pass into foo other than foo. Similarly, no one can pick a lifetime inside of bar other than bar, because those lifetimes don’t exist outside of bar*.

* Note: Lifetimes don’t actually exist as values in Rust; they are just used to solve a very complex constraint problem with logic solvers (such as NLL and Polonius). These solvers are, in a way, similar to Prolog.
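
A minimal sketch of "the caller picks 'a", using invented names (first_word, caller_picks). The same function is called once with a 'static string and once with a borrow of a local String, and it has to work for both.

```rust
// 'a is chosen by whoever calls first_word, never by first_word itself.
fn first_word<'a>(text: &'a str) -> &'a str {
    text.split_whitespace().next().unwrap_or("")
}

fn caller_picks() -> (&'static str, usize) {
    // Here the caller picks 'a = 'static, because the literal lives forever.
    let from_static: &'static str = first_word("hello world");

    // Here the caller picks a much shorter 'a: the borrow of `owned`.
    let owned = String::from("short lived");
    let local_len = first_word(&owned).len();

    (from_static, local_len)
}
```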

I don’t know whether this is a response to my post. I wanted to ask why the initialization country: countryCode is valid. I look at the Rust book, listing 10-18, and it indicates that the lifetime of the reference r there is 'a. Here the local reference countryCode has a similar lifetime, and I wonder whether this lifetime will be propagated to country (which, in my eyes, results in a compilation error there)?

About your last paragraph: I don’t know what it means for lifetimes not to exist “as values” in Rust. It seems you want to say that lifetimes are just used to describe a set of constraints, and then the compiler checks whether these constraints are “formally” consistent (like 3-SAT, etc). However, this does not exclude that we could find a model in the program (lifetimes corresponding to, maybe, substrings of the program), and possibly the constraints are consistent if and only if a model exists, something similar to, say, Gödel’s completeness theorem, or a much simpler fact of linear algebra: a system of linear equations has a solution if and only if there is no conflict, i.e., no linear combination leads to things like 1=0.

I think that to model this idea, try to picture your entire program if you were to copy/paste each function call. It would result in a very {} heavy code, which would allow you to visualize the lifetime for everything if you assign a lifetime to each occurrence of a block. Then the question becomes, does anything outside of a given lifetime 'a try to access or use or reference something that only lives for 'a?

This is essentially just a big logic problem with the following rules:

  • Variables are created near a {
  • They live for a lifetime 'lifetime, during which they are used
  • Variables are destroyed at the occurrence of their respective }
  • Nothing can use the destroyed variable

And lifetimes are used to denote the span between the first rule and the third, that is, between creation and destruction. Checking that these rules are upheld is essentially just one big logic problem, which the compiler has to solve alongside the rest of the things it needs to do.
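
The rules above can be sketched with two nested blocks. The function name scope_demo is made up, and the commented-out line marks what the compiler would reject.

```rust
fn scope_demo() -> i32 {
    let outer = 10; // created near the outer `{`
    let sum;
    {
        let inner = 5; // created near the inner `{`
        let r = &inner; // `r` may only be used while `inner` is alive
        sum = outer + *r; // copies the value out, so no borrow survives the block
    } // `inner` (and `r`) are destroyed here

    // let dangling = r; // rejected: `r` does not exist outside the inner block
    sum
}
```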

This seems to be interesting. How do we expand things like foo(bar1(bar2()),bar3())? Or a recursive function (which cannot be fully expanded at compile time without a nontrivial non-syntactically equivalent transform)? By the way, seemingly it is also necessary to consider moves.

Yes, this is what I meant: lifetime solving is a rather constrained version of 3-SAT, so that we don’t run into the performance problems of complete 3-SAT solving. Of course this means that NLL and Polonius cannot accept every sound program, but they will reject all unsound programs (ignoring bugs).

foo(bar1(bar2()), bar3()) would be desugared to

let __temp_0 = bar2();
let __temp_1 = bar1(__temp_0);
let __temp_2 = bar3();
let __temp_3 = foo(__temp_1, __temp_2);

A recursive function like

/// Note the lifetimes,
/// On the inputs, slice is bound to 'a
/// And find is unbound
/// Our output is bound to 'a
/// This means that the output must come from the input, because it is
/// guaranteed to live at least as long as the slice
/// But it can't be find, because find has no relationship to 'a
/// So 'a is not guaranteed to live at least as long as find
fn binary_search<'a, T: Ord>(slice: &'a [T], find: &T) -> Option<&'a T> {
    use std::cmp::Ordering;

    // assert!(slice.is_sorted()); // currently is_sorted is on nightly
    
    // check if the slice is empty or has a single element
    match slice {
        // if empty, nothing to return
        [] => return None, 
        
        // if it has a single element, check if we have found the value
        // and return it, if we have not found the value, return None
        // because there are no other elements
        //
        // With lifetimes, val comes from slice, so it also is tied to
        // 'a, and so it is fine to return it
        [val] => return if val == find { Some(val) } else { None },
        
        // otherwise continue to the rest of the function
        _ => (),
    };
    
    // at this point we have multiple elements
    assert!(slice.len() >= 2);
    
    let middle = slice.len() / 2;
    
    // compare find to middle of the slice to see which side of the slice
    // to go to next
    match find.cmp(&slice[middle]) {
        // if it is equal to the middle of the slice, then return
        // because &slice[middle] comes from slice, we know that it must be tied to
        // 'a, so it is fine to return it
        Ordering::Equal => Some(&slice[middle]),
        
        // For these next two, we know that the sub-slice is tied to 'a, as before
        // we also know that the output of this step must be 'a
        // so when we call binary_search, we look at its signature
        // and we see     binary_search<'b, T: ...>(&'b [T], &T) -> Option<&'b T>
        // Oh, so because we know that the output of binary_search must
        // be tied to its first input, cool
        // from there we can infer that the sub-slice must be 'a because the output
        // must be 'a, this does not conflict with anything, so we are fine
        // at no point during this checking do we look at the body of binary_search
        // again, so the compiler doesn't need to know about recursion to do checking
        // it just needs to register all function signatures before hand and refer to them while checking
        Ordering::Greater => binary_search(&slice[middle..], find),
        Ordering::Less => binary_search(&slice[..middle], find),
    }
}
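
One way to see what that signature buys the caller, as a sketch: find_in below is a hypothetical helper with the same signature, built on the standard library's slice binary_search rather than the hand-written one above. Because the output is tied only to the slice, the value being searched for can be dropped before the result is used.

```rust
// Same signature as above: the result borrows from `slice`, never from `find`.
fn find_in<'a, T: Ord>(slice: &'a [T], find: &T) -> Option<&'a T> {
    match slice.binary_search(find) {
        Ok(index) => Some(&slice[index]),
        Err(_) => None,
    }
}

fn needle_dropped_first() -> Option<i32> {
    let data = vec![1, 3, 5, 7]; // must be sorted for binary search
    let found;
    {
        let needle = 5; // lives for a shorter time than `data`
        found = find_in(&data, &needle);
    } // `needle` is gone, but `found` only depends on `data`
    found.copied()
}
```

If the return type were tied to find as well, the use of found after the inner block would be rejected.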

Ok, let’s break it down.

For reference, I will simplify the problem so that only the essential parts are left then build up from there

struct BusinessControl<'cnty> {
    country: &'cnty str
}

impl<'a> BusinessControl<'a> {
    fn new(country: &'a str) -> Self {
        BusinessControl { country }
    }
}

First thing here: we see the lifetime 'a declared on the impl. This is just like bar, where the lifetime comes from the user of BusinessControl<'a>, not from anything we define here.

Next, in new we have a parameter &'a str. This parameter references some str that is guaranteed to outlive 'a (outlive in this context means: live at least as long as). This is the same 'a as in the impl declaration, so we are now binding the lifetime of BusinessControl to 'a.

Note, however, that the lifetime of the reference &'a str itself, let’s call it 'reference, is not the same as 'a. This means that the reference does not live for as long as what it points to, which is true in most cases. The relationship between 'reference and 'a is that 'a outlives 'reference; notationally, this is 'a: 'reference.

Now when we put country inside of BusinessControl we finish tying the lifetimes together, and BusinessControl<'a> is sent away to the caller.
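
The outlives notation can also be written out in real code. In this sketch, shorten and country_len are invented names; the bound 'long: 'short reads "'long outlives 'short" and lets a longer-lived reference be handed out as a shorter-lived one.

```rust
struct BusinessControl<'a> {
    country: &'a str,
}

impl<'a> BusinessControl<'a> {
    fn new(country: &'a str) -> Self {
        BusinessControl { country }
    }
}

// 'long: 'short means 'long lives at least as long as 'short,
// so a &'long str may be used wherever a &'short str is expected.
fn shorten<'long: 'short, 'short>(value: &'long str) -> &'short str {
    value
}

fn country_len() -> usize {
    let country = String::from("Algeria");
    // `control` borrows from `country`, which is declared first and dropped last.
    let control = BusinessControl::new(&country);
    shorten(control.country).len()
}
```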

When we use this like so,

let country = String::from("Algeria");

let control = BusinessControl::new(&country);

...

Note how country lives longer than control.


Here is a case where the reference lives just as long as what it points to. This is because lifetimes are tied to variable binding declarations, not the values.

let (x, y): (i32, &i32);

x = 0;
y = &x;

Sure, here’s what I mean:

fn foo(x: usize, y: usize) -> usize {
    x * y
}
fn bar1(string: &'static str) -> usize {
    string.len()
}
fn bar2() -> &'static str {
    match 0 {
        0 => "abc",
        _ => "ab",
    }
}
const fn bar3() -> usize {
    (std::f32::constants::PI * 232.213) as usize
}

fn main() {
    let val = foo(bar1(bar2()), bar3());
    println!("{}", val);
}

Turns into the following:

fn main() {
    let val = {
        let x = {
            let string = {
                match 0 {
                    0 => "abc",
                    _ => "ab",
                }
            };
            string.len()
        };
        let y = {
            (std::f32::constants::PI * 232.213) as usize
        };
        x * y
    };
    std::io::_print(std::format_args_nl!("{}", val));
}

With the last bit replacing the println being derived from the standard library println.

Thanks for this comprehensive answer. I first want to clarify that I was confused about the (compile-time, but all concepts we are discussing now in the language per se are only compile-time) concept of the lifetime - it is a concept associated to variable bindings, references and slices, but not values (the terminology “value” seems an analogue of “object” in Scheme standards). The lifetime of a reference should outlive the lifetime of the variable binding to the reference in question.

I wonder whether there is an (informal) reference giving a complete specification of ownership and lifetimes.

For example, in your binary search code, if we have a slice p: &'a [T], then the lifetime of the sub-slice &p[st..ed] is also 'a. I am looking for a reference where this kind of rule is spelled out.

In general, for a statement or an expression E, I want to see the precise set of constraints of lifetime and ownership. Say {pre-conditions} E {post-conditions}, under what kind of pre-conditions is E valid? And what is changed from pre-conditions to post-conditions? (I look for informal descriptions)

I also want to understand when conditions are “checked”? For example, you clarified that lifetime generics should be determined at the caller side. Suppose we have a code like:

fn foo<'a>(...) -> &'a String {
    ...
}

...
{
    ...
    y = foo(...); // line A
    ...
} // Scope B

I wonder whether the generic 'a has to be determined right at line A, where the function application takes place, or whether it only needs to be determined when exiting Scope B (there is a difference if the code following line A could be used in a sophisticated way to infer 'a, while 'a cannot be inferred from line A alone).

I also want to check whether my understanding is correct. Take your binary search as an example:

fn binary_search<'a, T: Ord>(slice: &'a [T], find: &T) -> Option<&'a T> {
    ...
    binary_search(&slice[..middle], find)
}

For simplicity I ignored the match block. As you said, we need to determine 'b here. The argument &slice[..middle] in the function application implies that 'a outlives 'b, and the type of the value of the function application is Option<&'b T>. Since this value is the result of the function, whose type is Option<&'a T>, Option<&'b T> should be a subtype of Option<&'a T>, and therefore 'b outlives 'a. Since each outlives the other (antisymmetry), we conclude that 'b is identical to 'a. Is this more detailed argument correct? I need to make it detailed enough because I want to understand how the precise set of constraints is (at least, naively) propagated.

This is defined by the Index trait.

To do lifetime checking, Rust builds up a bunch of subtyping relationships and then adds in some constraints like &mut must be unique, then checks if these constraints are sound.
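
A small sketch of the two facts used in the argument above, with made-up function names. shrink relies on Option<&T> being covariant in the reference's lifetime, and left_half shows that a sub-slice borrowed through a &'a [i32] still carries 'a.

```rust
// Subtyping: an Option<&'long i32> can be used as an Option<&'short i32>
// whenever 'long outlives 'short. No cast is needed.
fn shrink<'short, 'long: 'short>(value: Option<&'long i32>) -> Option<&'short i32> {
    value
}

// Indexing with a range reborrows part of the slice, and the result
// is still tied to the original lifetime 'a.
fn left_half<'a>(slice: &'a [i32], middle: usize) -> &'a [i32] {
    &slice[..middle]
}
```

Both functions type-check from their signatures alone, which is all the compiler looks at when checking callers.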

You can read more about this on nikomatsakis’s blog:

https://smallcultfollowing.com/babysteps/blog/2017/02/21/non-lexical-lifetimes-using-liveness-and-location/

https://smallcultfollowing.com/babysteps/blog/2018/04/27/an-alias-based-formulation-of-the-borrow-checker/

https://smallcultfollowing.com/babysteps/blog/2019/01/21/hereditary-harrop-region-constraints/

Yes, this looks right