How can I have references into a lazily-populated hashmap cache without breaking aliasing?

I have a function which takes as input a slice of items and needs to output what amounts to another slice (the format of the output is determined by a library that I am using, so I can't make it not a slice) of references to values that are lazily generated with a cache based on the input slice's values. It's fine if the resulting slices are bound to not outlive the struct that contains the cache, as is the case in my minimal example. Although I understand why rust is rejecting my code, I do not know how to correct it.

It would not be ideal to loop over the input slice twice, once to populate and another time to build the output as the input slice is very long and most of the values are not of the String type in my real code. Furthermore, I'd rather avoid double hashmap lockups (I know that it's double in the example code, but that's a separate issue — doing it entirely separately makes this part of the problem harder to solve). I Do think that such double iteration would work, however.

Playground Link

use std::collections::HashMap;

enum InputType{
    String(String),
    NotString
}

struct CachedConverter{
    mapping: HashMap<String, Box<u32>>
}

impl CachedConverter{
    
    pub fn new() -> Self{
        Self{
            mapping: HashMap::new()
        }
    }
    
    pub fn convert<'s, 't>(&'s mut self, strings: &'t [InputType]) -> Vec<&'s u32>{
        let mut to_return = Vec::new();
        
        for string in strings{
            if let InputType::String(ref string) = string{
                let entry = self.mapping.entry(string.clone());
                entry.or_insert_with(|| Box::new(string.parse().unwrap()));
                to_return.push(self.mapping[string].as_ref());
            }
        }
        
        to_return
    }
    
}
error[E0502]: cannot borrow `self.mapping` as mutable because it is also borrowed as immutable
  --> src/lib.rs:25:29
   |
20 |     pub fn convert<'s, 't>(&'s mut self, strings: &'t [InputType]) -> Vec<&'s u32>{
   |                    -- lifetime `'s` defined here
...
25 |                 let entry = self.mapping.entry(string.clone());
   |                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ mutable borrow occurs here
26 |                 entry.or_insert_with(|| Box::new(string.parse().unwrap()));
27 |                 to_return.push(self.mapping[string].as_ref());
   |                                ------------ immutable borrow occurs here
...
31 |         to_return
   |         --------- returning this value requires that `self.mapping` is borrowed for `'s`

For more information about this error, try `rustc --explain E0502`.
error: could not compile `playground` due to previous error

You could use a reference-counted pointer for example

use std::sync::Arc;
use std::collections::HashMap;

enum InputType {
    String(String),
    NotString,
}

struct CachedConverter {
    mapping: HashMap<String, Arc<u32>>,
}

impl CachedConverter {
    pub fn new() -> Self {
        Self {
            mapping: HashMap::new(),
        }
    }

    pub fn convert(&mut self, strings: &[InputType]) -> Vec<Arc<u32>> {
        let mut to_return = Vec::new();

        for string in strings {
            if let InputType::String(ref string) = string {
                let entry = self.mapping.entry(string.clone());
                let entry = entry.or_insert_with(|| Arc::new(string.parse().unwrap()));
                to_return.push(Arc::clone(entry));
            }
        }

        to_return
    }
}

If you use hashbrown::HashMap instead of std::collections::HashMap, you can also avoid unnecessary cloning of the String when the entry already exists, by using the .entry_ref method, i.e.

- use std::collections::HashMap;
+ use hashbrown::HashMap;
-                let entry = self.mapping.entry(string.clone());
+                let entry = self.mapping.entry_ref(string)

Rust Playground

1 Like

You don't need reference counting for this; you can just separate the insertion from the lookup.

pub fn convert(&mut self, strings: &[InputType]) -> Vec<&u32> {
    let strings = strings.iter().filter_map(|s| match *s {
        InputType::String(ref string) => Some(string),
        _ => None,
    });
    
    for string in strings.clone() {
        self.mapping
            .entry(string.clone())
            .or_insert_with(|| Box::new(string.parse().unwrap()));
    }
    
    strings.map(|key| &*self.mapping[key]).collect()
}

Also, please use whitespace and read others' Rust code in order to get acquainted with the idiomatic coding style – your code is hard to read due to the non-conventional use of whitespace.

If you absolutely need a Vec<&u32> then – assuming that this cache is never deleting anything anyways – using an arena could be a reasonable approach, e.g. with the typed_arena crate.

use hashbrown::HashMap;
use typed_arena::Arena;

enum InputType {
    String(String),
    NotString,
}

struct CachedConverter<'c> {
    arena: &'c Arena<u32>,
    mapping: HashMap<String, &'c u32>,
}

impl<'c> CachedConverter<'c> {
    pub fn new(arena: &'c Arena<u32>) -> Self {
        Self {
            arena,
            mapping: HashMap::new(),
        }
    }

    pub fn convert(&mut self, strings: &[InputType]) -> Vec<&'c u32> {
        let mut to_return = Vec::new();

        for string in strings {
            if let InputType::String(ref string) = string {
                let entry = self.mapping.entry_ref(string);
                let entry = entry.or_insert_with(|| self.arena.alloc(string.parse().unwrap()));
                to_return.push(*entry);
            }
        }

        to_return
    }
}
[dependencies]
hashbrown = "0.12.0"
typed-arena = "2.0.1"

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.