Need to understand correct lifetime & borrow in my program


#1

Hi,
Happy holidays to all!
I am reading data from an XML file (using the roxmltree crate) and building a data dictionary from it. Many strings are duplicated across XML blocks, so instead of duplicating them I was thinking of storing each string once and keeping references to it in multiple places. Below is some code, with comments explaining what I am trying to do:

```rust
use std::collections::{HashMap, HashSet};
use std::fs::File;
use std::io::Read;

use roxmltree::{Document, Node, NodeType};

// ComponentEntry, Group, FieldValueEntry and component_handler are defined elsewhere (not shown).

#[derive(Debug)]
pub struct DataDictionary<'a> {
    // As I read each field name, I store the owned String in this set;
    // everywhere else I want to use a reference to that String.
    field_set: HashSet<String>,
    fields_by_tag: HashMap<u32, FieldEntry<'a>>,
    tag_name_map: HashMap<&'a String, u32>,
    components: HashMap<String, ComponentEntry<'a>>,
    groups: HashMap<String, Group<'a>>,
}

impl<'a> DataDictionary<'a> {
    fn new() -> Self {
        Self {
            field_set: HashSet::new(),
            fields_by_tag: HashMap::new(),
            tag_name_map: HashMap::new(),
            components: HashMap::new(),
            groups: HashMap::new(),
        }
    }

    fn add_field(&mut self, fld: FieldEntry<'a>) {
        // TODO: check that the tag is not already present (a duplicate tag means invalid XML)
        self.fields_by_tag.insert(fld.number, fld);
    }

    fn get_field(&self, tag: &u32) -> Option<&FieldEntry> {
        self.fields_by_tag.get(tag)
    }

    fn add_field_tag(&mut self, name_ref: &'a String, tagnum: u32) {
        self.tag_name_map.insert(name_ref, tagnum);
    }

    // get_field_set() and insert_field_set() are not shown.
}

#[derive(Debug)]
struct FieldEntry<'a> {
    number: u32,
    name: &'a String,
    ftype: String,
    values: HashSet<FieldValueEntry>,
}

impl<'a> FieldEntry<'a> {
    fn new(number: u32, name: &'a String, ftype: &str) -> Self {
        Self {
            number,
            name,
            ftype: ftype.to_string(),
            values: HashSet::new(),
        }
    }

    fn set_valid_value(&mut self, val: FieldValueEntry) {
        self.values.insert(val);
    }
}

pub fn create_data_dict(fix_xml: &str) -> Option<DataDictionary> {
    let mut file_data = String::with_capacity(1024 * 64);
    let mut file = File::open(fix_xml).unwrap();
    file.read_to_string(&mut file_data).unwrap();
    let doc = Document::parse(&file_data).unwrap();
    let mut dictionary = DataDictionary::new();
    for root_child in doc.root_element().children().filter(|node| node.node_type() == NodeType::Element) {
        match root_child.tag_name().name() {
            "fields" => {
                field_handler(root_child, &mut dictionary);
            }
            "components" => {
                component_handler(root_child, &mut dictionary);
            }
            _ => {
                println!("Not processing this");
            }
        }
    }
    Some(dictionary)
}

fn field_handler(field_node: Node, dict: &mut DataDictionary) {
    for node in field_node.children().filter(|n| n.node_type() == NodeType::Element) {
        let fname = node.attribute("name").unwrap();
        let fnum = node.attribute("number").unwrap().parse::<u32>().unwrap();
        let ftype = node.attribute("type").unwrap();
        let mut f_entry: FieldEntry;
        match dict.get_field_set(fname) {
            Some(name_ref) => {
                dict.add_field_tag(name_ref, fnum);
                f_entry = FieldEntry::new(fnum, name_ref, ftype);
            }
            None => {
                dict.insert_field_set(fname);
                let name_ref = dict.get_field_set(fname).unwrap();
                dict.add_field_tag(name_ref, fnum);
                f_entry = FieldEntry::new(fnum, name_ref, ftype);
            }
        }
        for child in node.children().filter(|n| n.node_type() == NodeType::Element && n.has_tag_name("value")) {
            let fvalue_entry = FieldValueEntry::new(
                child.attribute("enum").unwrap(),
                child.attribute("description").unwrap(),
            );
            f_entry.set_valid_value(fvalue_entry);
        }
        dict.add_field(f_entry);
    }
}
```

When I compile this, I get the following error:

```text
error[E0623]: lifetime mismatch
   --> src/codegen.rs:189:22
    |
181 | fn field_handler(field_node: Node, dict: &mut DataDictionary) {
    |                                          -------------------
    |                                          |
    |                                          these two types are declared with different lifetimes...
...
189 |                 dict.add_field_tag(name_ref, fnum);
    |                      ^^^^^^^^^^^^^ ...but data from `dict` flows into `dict` here

error: aborting due to previous error

For more information about this error, try `rustc --explain E0623`.
```

I don't understand this lifetime requirement. I assumed that because the data dictionary's fields contain references to each other, they should share the same lifetime, but it seems I am wrong. Also, when I run
`rustc --explain E0623`
I get “error: no extended information for E0623”.

Could someone explain what is going on here? Also, is it a good idea to store these kinds of references, or should I just duplicate the strings to keep things simple?


#2

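Your error comes from elided lifetimes, the same pattern as in `get_field(&self, tag: &u32) -> Option<&FieldEntry>`. If you don't specify lifetimes, the returned reference is only guaranteed to live as long as the `&self` borrow, not as long as `'a`. Your `get_field_set` (not shown) presumably has the same kind of signature, so the `&String` it returns lives only as long as the borrow of `dict`, while `add_field_tag` requires a `&'a String`. That is the "data from `dict` flows into `dict`" in the message. Writing `&'a` in the return type won't fix it here, though: the strings are owned by `field_set` inside the same struct, so a `&'a String` would mean the dictionary borrows from itself, which safe Rust doesn't really support (you'd need `&'a mut self`, which keeps the dictionary mutably borrowed for the rest of its life).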
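A small self-contained sketch of what the elided default means. `Names`, `first_elided`, `first_explicit` and `name_after_struct_dropped` are hypothetical names for illustration only. In this sketch the strings are owned outside the struct, which is what makes an explicit `'a` return type possible:

```rust
// Hypothetical type: it only *borrows* names from an owner outside the struct.
struct Names<'a> {
    list: Vec<&'a str>,
}

impl<'a> Names<'a> {
    // Elided: the result is tied to the `&self` borrow, not to `'a`.
    fn first_elided(&self) -> &str {
        self.list[0]
    }

    // Explicit: the result is tied to `'a`, i.e. to the external owner,
    // so it may outlive the `Names` value itself.
    fn first_explicit(&self) -> &'a str {
        self.list[0]
    }
}

// Compiles: the returned `&str` borrows from `owner`, not from the local `names`.
// Using `names.first_elided()` here instead would be rejected, because that
// result borrows `names`, which is dropped at the end of the function.
fn name_after_struct_dropped(owner: &str) -> &str {
    let names = Names { list: vec![owner] };
    names.first_explicit()
}
```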

However, putting temporary borrows in structs that you plan to keep for longer than a single function call will spread through your whole program and make it very hard to work with. It only gets worse from here. Every method will need that `'a`. If you forget it, the elision defaults will give you a shorter lifetime than you need. And at some point you'll need to add a new string but won't have any place that lives for `'a` to put it.
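If you do want to keep borrowed names, the owned strings have to live in a container that is created before the dictionary and outlives it. A minimal sketch, assuming a hypothetical `build_tag_index` helper that receives the already-parsed `(name, tag)` pairs:

```rust
use std::collections::HashMap;

// Hypothetical helper: `fields` owns the (name, tag) pairs and must be created
// before, and outlive, the returned index, which stores only `&'a str` keys.
fn build_tag_index<'a>(fields: &'a [(String, u32)]) -> HashMap<&'a str, u32> {
    fields
        .iter()
        .map(|(name, tag)| (name.as_str(), *tag))
        .collect()
}
```

Because the index borrows `fields`, calling `fields.push(...)` while the index is still in use would be rejected by the borrow checker, which is exactly the "no place to put a new string" problem.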

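Use `Arc<str>` instead. It's an owned type, so you'll be free to pass the strings around and add or remove names even after the structs are created. Cloning an `Arc<str>` only increments a reference count, so all copies of a name share one allocation.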
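A rough sketch of what that could look like. `DataDictionary` and `FieldEntry` here are simplified, hypothetical versions of the original types (only the string-related fields kept), and `intern` is a hypothetical helper:

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Arc;

// Simplified, hypothetical versions of the original types: names are shared
// `Arc<str>` values, so no lifetime parameter is needed.
#[derive(Debug)]
struct FieldEntry {
    number: u32,
    name: Arc<str>,
}

#[derive(Debug, Default)]
struct DataDictionary {
    field_set: HashSet<Arc<str>>,
    fields_by_tag: HashMap<u32, FieldEntry>,
    tag_name_map: HashMap<Arc<str>, u32>,
}

impl DataDictionary {
    // Returns the shared copy of `name`, storing it first if it is new.
    // Cloning an `Arc` only increments a reference count; the text is not copied.
    fn intern(&mut self, name: &str) -> Arc<str> {
        if let Some(existing) = self.field_set.get(name) {
            return Arc::clone(existing);
        }
        let shared: Arc<str> = Arc::from(name);
        self.field_set.insert(Arc::clone(&shared));
        shared
    }

    // Every map that needs the name gets its own cheap `Arc` clone.
    fn add_field(&mut self, number: u32, name: &str) {
        let name = self.intern(name);
        self.tag_name_map.insert(Arc::clone(&name), number);
        self.fields_by_tag.insert(number, FieldEntry { number, name });
    }
}
```

No lifetime parameter appears anywhere, and `add_field` can be called at any time. If the dictionary never crosses threads, `Rc<str>` works the same way with non-atomic (slightly cheaper) reference counting.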

There are also plenty of string interning libraries on crates.io that will efficiently dedupe strings.
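For comparison, the core idea behind such libraries fits in a few lines. This is a hand-rolled sketch, not the API of any particular crate; `Interner` and `Symbol` are hypothetical names:

```rust
use std::collections::HashMap;

// Hypothetical handle for an interned string: a small `Copy` integer.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct Symbol(u32);

// Hypothetical minimal interner, not the API of any published crate.
#[derive(Debug, Default)]
struct Interner {
    ids: HashMap<String, Symbol>, // text -> symbol, used while interning
    strings: Vec<String>,         // symbol index -> text, used for resolving
}

impl Interner {
    // Returns the existing symbol for `s`, or assigns the next free one.
    fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&sym) = self.ids.get(s) {
            return sym;
        }
        let sym = Symbol(self.strings.len() as u32);
        self.strings.push(s.to_owned());
        self.ids.insert(s.to_owned(), sym);
        sym
    }

    // Looks up the text for a symbol produced by this interner.
    fn resolve(&self, sym: Symbol) -> &str {
        &self.strings[sym.0 as usize]
    }
}
```

Structs then store a `Symbol` instead of a string, so equality checks become integer comparisons, but you need the `Interner` at hand to get the text back. This simple version keeps two copies of each distinct string (the map key and the `Vec` entry); a production interner would avoid that.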