Storing an object as owned and reference


I want to store a reference to a String that is stored in another structure.

Obviously I do not need to; I could just use String in both places. But having two objects when I need only one offends me.

Can I make this work?

use std::collections::HashMap;

struct Datum<'a> {
    subject: &'a String,
}

fn main() -> std::io::Result<()> {
    let mut subject_store: HashMap<String, usize> = HashMap::new();
    let mut data: Vec<Datum> = vec![];
    let mut i = 0;

    let input_data = "subject1 \nsubject2\nsubject1 \nsubject3\n";
    for line in input_data.lines() {
        let mut split: Vec<&str> = line.split(' ').collect();
        let subject = split.remove(0).to_string();
        if subject_store.get(&subject).is_none() {
            subject_store.insert(subject.clone(), i);
            i += 1;
        }

        data.push(Datum { subject: &subject });
    }
    Ok(())
}

The compiler rejects it:

21 |         data.push(Datum { subject: &subject });
   |         ----                       ^^^^^^^^ borrowed value does not live long enough

It's not possible safely, without larger changes.

You could use Arc (or Rc).

Everything is possible; the question, of course, is whether it's feasible.

Let's start with the insert into subject_store:

That call may move every element of subject_store to another place in memory (when the hash table is full and has to grow and rehash).

And here we would keep references to these strings while they are shuffled around in memory. The question immediately arises: what do we expect to happen when these objects are moved?
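To make the "moved in memory" point concrete, here is a small sketch that records where one key sits inside a HashMap before and after the map is forced to grow. The helper names (key_address, addresses_around_growth) are made up for this illustration, and the addresses are cast to usize precisely so that no reference is held across the inserts. The two addresses will typically differ, but that is an implementation detail, so treat the printed values as an observation rather than a guarantee.

```rust
use std::collections::HashMap;

// Address of the String key "subject0" as it currently sits inside the map.
// (Hypothetical helper, only for this demonstration.)
fn key_address(store: &HashMap<String, usize>) -> usize {
    let (key, _) = store.get_key_value("subject0").expect("key was inserted");
    key as *const String as usize
}

// Returns the key's address before and after the map is forced to grow.
fn addresses_around_growth() -> (usize, usize) {
    let mut store: HashMap<String, usize> = HashMap::with_capacity(1);
    store.insert("subject0".to_string(), 0);
    let before = key_address(&store);

    // Enough inserts that the table has to grow several times.
    for i in 1..1000 {
        store.insert(format!("subject{}", i), i);
    }
    let after = key_address(&store);

    // Looking the key up by value still works, wherever it now lives.
    assert_eq!(store["subject0"], 0);
    (before, after)
}

fn main() {
    let (before, after) = addresses_around_growth();
    println!("before: {:#x}, after: {:#x}", before, after);
}
```

A `&String` taken at the "before" address would be dangling afterwards, which is exactly what the borrow checker refuses to allow.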

Offends you… in what sense? If you just want the warm fuzzy feeling of a program that simply works, even though it would be both slower and more memory-hungry than one that copies these strings, then there are crates that implement various GC-based schemes.

Or, if you just want to avoid duplicating the strings, there is shared ownership (Arc or Rc).
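As a rough sketch of the shared-ownership route, assuming a single-threaded program (so Rc rather than Arc) and assuming it is acceptable to change the key type of subject_store to Rc<str>: the load function is a hypothetical wrapper around the loop from the question, not something from the original code.

```rust
use std::collections::HashMap;
use std::rc::Rc;

// Each Datum shares ownership of its subject instead of borrowing it.
struct Datum {
    subject: Rc<str>,
}

// Hypothetical helper wrapping the loop from the question.
fn load(input: &str) -> (HashMap<Rc<str>, usize>, Vec<Datum>) {
    let mut subject_store: HashMap<Rc<str>, usize> = HashMap::new();
    let mut data = Vec::new();

    for line in input.lines() {
        let name = line.split(' ').next().unwrap_or("");

        // If the subject is already known, clone the Rc (a counter bump),
        // not the string itself.
        let existing = subject_store
            .get_key_value(name)
            .map(|(key, _)| Rc::clone(key));

        let subject = match existing {
            Some(key) => key,
            None => {
                // First time we see this subject: allocate it once.
                let key: Rc<str> = Rc::from(name);
                let id = subject_store.len();
                subject_store.insert(Rc::clone(&key), id);
                key
            }
        };
        data.push(Datum { subject });
    }
    (subject_store, data)
}

fn main() {
    let (store, data) = load("subject1 \nsubject2\nsubject1 \nsubject3\n");
    println!("{} distinct subjects, {} data", store.len(), data.len());
    // Both occurrences of "subject1" point at the same allocation.
    println!("{}", Rc::ptr_eq(&data[0].subject, &data[2].subject));
}
```

Each distinct subject is allocated once; the map can rehash as much as it likes because only the Rc handles move, not the string data they point to.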

Or you may use some data structure that doesn't move its keys when it grows (I'm not sure whether there is something like that in Rust's standard library, but such data structures are certainly possible).

Ultimately the question is what the "this" in "Can I make this work?" even means.

The compiler is absolutely correct to reject your code, but before you write something else you need to decide, at least for yourself, what that "this" is. Then it becomes possible to decide what to do.

There are simple changes that will let this compile, but whether they would work in a real scenario is a different question.

Ultimately, in this case the data you want to reference lives in the input_data variable, so it does last long enough. In borrow checker parlance, input_data owns the data you want to reference (strictly speaking it is a &'static str pointing at a string literal, so the text lives for the whole program).

However, you copy that data into its own allocation in the loop when you call to_string() on the reference. Now the data is owned by the subject variable, but that only exists for one iteration of the loop: it is dropped at the end of each iteration.

If you remove the copy into the subject variable and instead hold only a reference into the input data (with the field type changed from &'a String to &'a str), you can push that reference into the data vector, which can then borrow for as long as the input data lives. You still need a copy for subject_store to hold on to, so the string does get copied there, but your example can compile without an additional copy.
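A minimal sketch of that change, assuming the Datum field may become a &str and that wrapping the loop in a function (here called load, a name invented for this example) is acceptable:

```rust
use std::collections::HashMap;

// The reference now points into the input text,
// not into a String that only lives for one loop iteration.
struct Datum<'a> {
    subject: &'a str,
}

// Hypothetical helper wrapping the loop from the question.
fn load(input_data: &str) -> (HashMap<String, usize>, Vec<Datum<'_>>) {
    let mut subject_store: HashMap<String, usize> = HashMap::new();
    let mut data: Vec<Datum> = vec![];

    for line in input_data.lines() {
        // No to_string() here: `subject` is a slice of `input_data`.
        let subject = line.split(' ').next().unwrap_or("");

        if !subject_store.contains_key(subject) {
            let id = subject_store.len();
            // The store still needs its own owned copy of the key.
            subject_store.insert(subject.to_string(), id);
        }
        data.push(Datum { subject });
    }
    (subject_store, data)
}

fn main() {
    let input_data = "subject1 \nsubject2\nsubject1 \nsubject3\n";
    let (subject_store, data) = load(input_data);
    for datum in &data {
        println!("{} -> {}", datum.subject, subject_store[datum.subject]);
    }
}
```

The returned Vec borrows from input_data, so it cannot outlive whatever owns that text; that is the trade-off compared with owning a copy.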

In reality, of course, you may not be getting the input data from a string literal, so the source may not live as long as your data vector, in which case you need something else to own the data. You can't refer directly into the HashMap, as a previous reply pointed out, so either both get a copy, which guarantees the data lasts as long as they do, or they share ownership using Rc/Arc (see "Rc<T>, the Reference Counted Smart Pointer" in The Rust Programming Language).

In general, I find I regularly end up wanting structs to own their data so that they can be built in functions and returned, so I'll often take the hit of the second copy, but it depends on the size and on what is going to happen with the data next. If the data stays closely related I might try to make the references work, use Rc, or use a Cow (see Cow in the std::borrow documentation) if you want to borrow when you can but sometimes need to own the data.
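For the Cow option, here is a small sketch. The normalise function and its lower-casing rule are invented purely to give a case where the data sometimes has to be owned; nothing in the original question requires it.

```rust
use std::borrow::Cow;

// Borrows from the input when it can, owns a fresh String when it must.
struct Datum<'a> {
    subject: Cow<'a, str>,
}

// Hypothetical normalisation step: lower-case the subject only if needed.
fn normalise(raw: &str) -> Cow<'_, str> {
    if raw.chars().any(|c| c.is_uppercase()) {
        // Needs changing, so we have to allocate and own the result.
        Cow::Owned(raw.to_lowercase())
    } else {
        // Already fine, so just borrow the input.
        Cow::Borrowed(raw)
    }
}

impl<'a> Datum<'a> {
    // Detach from the input so the value can be returned or stored anywhere.
    fn into_owned(self) -> Datum<'static> {
        Datum {
            subject: Cow::Owned(self.subject.into_owned()),
        }
    }
}

fn main() {
    let input = String::from("Subject1");
    let datum = Datum { subject: normalise(&input) };
    let detached: Datum<'static> = datum.into_owned();
    drop(input); // fine: `detached` no longer borrows from `input`
    println!("{}", detached.subject);
}
```

The into_owned step is what makes it possible to build the struct in a function and return it even when it started out borrowing.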

That is the part I was missing.

I have been doing a lot of C, and I was forgetting that these structures reorganise themselves as they grow. Of course they do.


There is another important concept I missed, which coming from C I should have realised:

let foo = String::new();
let bar = &foo;
// `bar` is a reference to the memory where `foo` is
// `bar` is not a (magical) reference to `foo` itself

Hence when foo moves, bar is invalid; and Rust being Rust, if foo could move while bar is still in use, the code is rejected at compile time.
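A small sketch of how that plays out, with the offending line left commented out so the rest compiles (the demo function name is just for this example):

```rust
// Returns the string after it has been moved out of `foo`.
fn demo() -> String {
    let foo = String::from("subject1");
    let bar = &foo; // `bar` holds the address of the variable `foo`
    println!("{}", bar);
    // That was the last use of `bar`, so the borrow has ended here.

    let moved = foo; // moving `foo` is fine now

    // Uncommenting the next line keeps `bar` alive across the move
    // and the compiler refuses:
    // error[E0505]: cannot move out of `foo` because it is borrowed
    // println!("{}", bar);

    moved
}

fn main() {
    println!("{}", demo());
}
```

Note that moving a String only moves its pointer, length, and capacity; the heap buffer stays where it is. A `&String` points at that small header, which is why it is invalidated by the move.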