Lifetime annotations, cannot borrow as mutable more than once

So I have this struct that holds a set of data values, and a struct that contains references to that data in two different categories (here is a simplified example in the Rust Playground).

use std::collections::HashSet;

struct DataClassifier<'a> {
    data: HashSet<String>,
    classification: Classification<'a>
}

impl <'a> DataClassifier<'a> {
    pub fn new<I>(data: I) -> Self where I: IntoIterator<Item = String> {
        Self {
            data: data.into_iter().collect(),
            classification: Classification::new(),
        }
    }
    
    pub fn classify_data_as_a(&'a mut self, data: &String) {
        if let Some(d) = self.data.get(data) {
            self.classification.a.insert(&d);
        } else {
            println!("Data not found, can't classify");
        }
    }
    
    pub fn classify_data_as_b(&'a mut self, data: &String) {
        if let Some(d) = self.data.get(data) {
            self.classification.b.insert(&d);
        } else {
            println!("Data not found, can't classify");
        }
    }
}

struct Classification<'a> {
    a: HashSet<&'a String>,
    b: HashSet<&'a String>
}

impl <'a> Classification<'a> {
    pub fn new() -> Self {
        Self {
            a: HashSet::new(),
            b: HashSet::new(),
        }
    }
}

fn main() {
    let one = "one".to_string();
    let two = "two".to_string();
    let three = "three".to_string();
    let mut dc = DataClassifier::new(vec![one, two, three]);
    
    // We classify
    dc.classify_data_as_b(&"one".to_string());
    dc.classify_data_as_a(&"two".to_string()); // <== Here is when it fails
    dc.classify_data_as_b(&"three".to_string());
    
    println!("DONE");
}

I understand that to hold references to the data in a struct I need a lifetime specifier. The problem comes with the classify_data_as methods. The compiler seems to expect an explicit lifetime there, so I added one, but then the mut borrow of self is bound to that lifetime, so in the main function I can't call the method more than once, because it borrows self for its own entire lifetime.

I've seen some related questions on Stack Overflow, and I roughly understand what's wrong, but I can't figure out how I should restructure the code instead.

First, it may be worth answering this question: why do you have borrows in your struct at all? Doing this will not have the same effect as storing references would in, e.g., C/C++.

Structs with borrows (and hence explicit lifetime annotations) have instances that are quite limited in their usage, especially in how long they (can) live, and hence this is an advanced feature.

In general, if someone does not know the precise semantics of this feature, it is better to not utilize it at all lest they run into issues like these.

So what can you do instead? Just put the values (rather than borrows to the values) in the struct:

struct DataClassifier {
    data: HashSet<String>,
    classification: Classification
}

And since the classification type has a lifetime annotation as well, this would apply recursively.

If you really, really need two pointers to the data, you can wrap it in a std::rc::Rc or a std::sync::Arc, which provide shared ownership through reference counting (atomic in the case of Arc), hence the names.
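As a rough sketch of the Rc suggestion, assuming the types from the question are reworked so that every set holds an Rc<String> instead of a borrow (the method names and the bool return value are illustrative choices, not part of the original code):

```rust
use std::collections::HashSet;
use std::rc::Rc;

// Hypothetical reworking of the question's types: no lifetime parameters,
// because every set owns a share of its values.
struct Classification {
    a: HashSet<Rc<String>>,
    b: HashSet<Rc<String>>,
}

struct DataClassifier {
    data: HashSet<Rc<String>>,
    classification: Classification,
}

impl DataClassifier {
    fn new<I: IntoIterator<Item = String>>(data: I) -> Self {
        Self {
            data: data.into_iter().map(Rc::new).collect(),
            classification: Classification {
                a: HashSet::new(),
                b: HashSet::new(),
            },
        }
    }

    // Plain `&mut self`: the borrow ends when the call returns.
    // Returns false when the value is not present in `data`.
    fn classify_as_a(&mut self, value: &String) -> bool {
        match self.data.get(value) {
            Some(shared) => {
                // Cloning the Rc bumps a counter; the String itself is not copied.
                self.classification.a.insert(Rc::clone(shared));
                true
            }
            None => false,
        }
    }

    fn classify_as_b(&mut self, value: &String) -> bool {
        match self.data.get(value) {
            Some(shared) => {
                self.classification.b.insert(Rc::clone(shared));
                true
            }
            None => false,
        }
    }
}
```

With this shape the calls from the question's main can be made one after another, since nothing inside the struct borrows from the struct itself. Arc would be the drop-in choice if the classifier has to cross threads.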

Thanks. I was hoping to use references to avoid cloning the data (I'm not actually storing Strings, but a more complex struct), keeping the values in the main HashSet and references in the classification ones. I guess I don't have many choices here, but you are right about the added complexity of borrows; I was getting into a lifetime definition mess.

I ran into a similar problem once and resolved it by using Box, if you don't mind storing the data on the heap.

 use std::boxed::Box;

Don't use &'a mut self. A lifetime annotation on self never does anything good. In your case it says that self can be borrowed only once, ever: the borrow is exclusive (that's what mut means), and it lasts for 'a, which has to be at least as long as the whole life of the object. The 'a in DataClassifier<'a> and the 'a in &'a mut self tie the two together, and structs can only refer to lifetimes that outlive them.
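To make the difference concrete, here is a minimal sketch using a hypothetical Counter type (not from the question) that has a lifetime parameter but takes self with an ordinary, elided borrow:

```rust
// Hypothetical type for illustration: it borrows a label for 'a
// and keeps a counter of its own.
struct Counter<'a> {
    label: &'a str,
    hits: u32,
}

impl<'a> Counter<'a> {
    // Elided lifetime on `self`: each call borrows `self` only for the
    // duration of that call, so it can be called repeatedly.
    //
    // Writing `&'a mut self` here instead would make the exclusive borrow
    // last for all of 'a, and a second call on the same value would be
    // rejected with "cannot borrow as mutable more than once at a time".
    fn hit(&mut self) {
        self.hits += 1;
    }

    fn summary(&self) -> String {
        format!("{}={}", self.label, self.hits)
    }
}

fn hit_three_times(label: &str) -> String {
    let mut counter = Counter { label, hits: 0 };
    counter.hit();
    counter.hit();
    counter.hit();
    counter.summary()
}
```

Note that dropping 'a from self is not enough on its own to make the classify methods in the question compile: they also try to store a borrow of self.data inside self, which needs a different design.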

2 points:

  1. There is no need to import Box, as it's one of Rust's most fundamental data types and is already in the prelude; it provides, as you remarked, the ability to store objects on the heap.
  2. Box provides exclusive ownership. Rc and Arc provide shared ownership, which means you can clone an Rc'd or Arc'd value and that clone is pretty cheap relative to cloning the value, assuming those values are fairly large.
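
A small sketch of the second point, using a hypothetical Record type as a stand-in for a large value; the function name and the returned tuple are only there to make the behaviour observable:

```rust
use std::rc::Rc;

// Hypothetical stand-in for a value that would be expensive to clone.
struct Record {
    payload: Vec<u8>,
}

fn ownership_demo() -> (usize, bool, usize) {
    // Box: exactly one owner. Assigning the box moves it,
    // after which `boxed` can no longer be used.
    let boxed: Box<Record> = Box::new(Record { payload: vec![0; 1024] });
    let moved: Box<Record> = boxed;

    // Rc: shared ownership. `Rc::clone` copies a pointer and bumps a
    // counter; the 1024-byte payload is not duplicated.
    let first: Rc<Record> = Rc::new(*moved);
    let second: Rc<Record> = Rc::clone(&first);

    (
        Rc::strong_count(&first),    // number of owners
        Rc::ptr_eq(&first, &second), // both point at the same allocation
        second.payload.len(),
    )
}
```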

It may have been suggested by others already, but the way I have seen this issue addressed is to split up your data type.

The issue seems to hinge on the design of classify_data_as_a and _b, and less on how you are calling them. That's because the function borrows from one part of self whilst trying to mutate another part of self. That's something that can be accomplished with indirection (a pointer like Box). The use of Box, while doable, may be a sign that the design could be reconsidered.

pub fn classify_data_as_a(&'a mut self, data: &String) {
     if let Some(d) = self.data.get(data) {    // borrow self here
         self.classification.a.insert(&d);     // mutate self here
     } else {
         println!("Data not found, can't classify");
     }
}

Generally, one pattern for solving this problem is to split up your data structure (self: DataClassifier in the above). Splitting up the data is another form of indirection, because the code has to reference each part separately.

Perhaps there is a way to see what you are doing as a sequence of tasks, in which case it would benefit from a more explicit separation of concerns.

Phase 1: Get the data
Phase 2: Read the data to generate a report (instantiate Classification)
Phase 3: Combine (if needed), the data with the report (move ownership to a single point of reference)

I might consider "transient" types, i.e., types that host your "work-in-progress". If this resonates, the Builder design pattern is worth looking up.

So, from:

struct DataClassifier<'a> {
    data: HashSet<String>,
    classification: Classification<'a>
}

to

// data-hosting structure
// includes a method `iter` that enables "read-access" (a borrow)
// ... where the iterator is the "interface" to your data
struct Data {
    data: HashSet<String>,
}

// report-hosting structure
// has some method that reads and interprets the Data hosting type
struct Classifier<'a> {
    classification: Classification<'a>
}

// some final struct created using a move semantic; it could be the structure you have now...
struct DataClassifier<'a> {
    data: HashSet<String>,
    classification: Classification<'a>
}

The design opens the door to customizing/configuring the Classifier.

To manage lifetimes whilst generating the report I might use a function such as the following. Note that it does not have to be a method; if you prefer one, it likely belongs in the Classification implementation.

fn report(data: &Data) -> Classification { /* create your report */ }
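
A fuller sketch of what such a function might look like. The rule that decides between a and b is not specified in this thread, so it is passed in here as a caller-supplied predicate named is_a, which is purely an assumption for illustration:

```rust
use std::collections::hash_set;
use std::collections::HashSet;

// Phase 1: the data lives in its own type.
struct Data {
    data: HashSet<String>,
}

impl Data {
    // Read-only access to the values.
    fn iter(&self) -> hash_set::Iter<'_, String> {
        self.data.iter()
    }
}

// Phase 2: the report only holds borrows into a `Data` that outlives it.
struct Classification<'a> {
    a: HashSet<&'a String>,
    b: HashSet<&'a String>,
}

// Assumption: `is_a` is a hypothetical predicate supplied by the caller;
// every value for which it returns false is classified as `b`.
fn report<'a>(data: &'a Data, is_a: impl Fn(&String) -> bool) -> Classification<'a> {
    let mut classification = Classification {
        a: HashSet::new(),
        b: HashSet::new(),
    };
    for value in data.iter() {
        if is_a(value) {
            classification.a.insert(value);
        } else {
            classification.b.insert(value);
        }
    }
    // Ownership of the finished report moves to the caller.
    classification
}
```

The only mutation happens on the local classification value, so the function needs nothing more than a shared borrow of Data, and it can be called as many times as needed.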

The classifying report creates new "read-only" pointers (borrows) to some subset of the data. The report also creates a collection to host the borrows (HashSet); this is where the mutation occurs by means of ownership. Once complete, transfer ownership to the caller.

The other nice thing about this function is that only the caller has to deal with the possibility of not having the data. So, this is a "pure and complete" function that needs neither Option nor Result (great tools, but more "where required"). Finally, the now "pure" quality of the function makes it easy to run in parallel on multiple threads.

One last observation regarding the iterator: instead of implementing IntoIterator (whose name suggests something different from the IntoIter type it produces), I might suggest implementing Iterator by means of a struct that has "read-access" to the data; I believe what I saw in your code was something with either a move or copy/move semantic.
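
One way that suggestion could be sketched, assuming a hypothetical DataIter struct that wraps the standard HashSet iterator (the IntoIterator impl for &Data at the end is an optional extra so that for loops also work by reference):

```rust
use std::collections::hash_set;
use std::collections::HashSet;

struct Data {
    data: HashSet<String>,
}

// Hypothetical iterator type: it borrows from `Data` and only hands out
// shared references, so the data is never moved or copied.
struct DataIter<'a> {
    inner: hash_set::Iter<'a, String>,
}

impl<'a> Iterator for DataIter<'a> {
    type Item = &'a String;

    fn next(&mut self) -> Option<Self::Item> {
        self.inner.next()
    }
}

impl Data {
    // The "read-access" interface to the data.
    fn iter(&self) -> DataIter<'_> {
        DataIter { inner: self.data.iter() }
    }
}

// Optional: lets callers write `for value in &data { ... }`.
impl<'a> IntoIterator for &'a Data {
    type Item = &'a String;
    type IntoIter = DataIter<'a>;

    fn into_iter(self) -> DataIter<'a> {
        self.iter()
    }
}
```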

Sweet @EdmundsEcho this is pretty insightful. What you mention makes a lot of sense, and basically moves the classification to a lazy approach by building it on request instead of updating it every time I add data.
