Baffling lifetime-related error (map required to live *longer* than its keys?!)

The intention of the code below is to work around the longstanding language limitation where you can't talk about the lifetime of a field of a struct, by making the relevant field be a mutable reference supplied at construction time. Interner::buf is a reference with lifetime 'a, and Interner::map holds keys that point into that buffer, which also have lifetime 'a.

One of the methods throws a compile error that I don't know how to fix and which honestly seems wrong.

   Compiling playground v0.0.1 (/playground)
error: lifetime may not live long enough
  --> src/lib.rs:39:19
   |
15 | impl<'a> Interner<'a> {
   |      -- lifetime `'a` defined here
...
27 |     pub fn intern(&mut self, s: &str) -> Substr {
   |                   - let's call the lifetime of this reference `'1`
...
39 |         let old = self.map.insert(k, ss);
   |                   ^^^^^^^^^^^^^^^^^^^^^^ argument requires that `'1` must outlive `'a`
   |
   = note: requirement occurs because of a mutable reference to `HashMap<&str, Substr>`
   = note: mutable references are invariant over their type parameter
   = help: see <https://doc.rust-lang.org/nomicon/subtyping.html> for more information about variance

error: could not compile `playground` (lib) due to 1 previous error

To the extent I understand what this is trying to tell me (which I'm not sure I do) it seems to be backwards. k is a subslice of self.buf and therefore should have lifetime 'a; ss is reference-free; the lifetime of self and therefore self.map should, if anything, be required to be shorter than the lifetime 'a. Shouldn't it?

Code

use std::collections::HashMap;
use std::mem;

#[derive(Copy, Clone, Debug, Default, PartialEq, Eq, Hash)]
pub struct Substr {
    pub start: usize,
    pub end: usize,
}

pub struct Interner<'a> {
    buf: &'a mut String,
    map: HashMap<&'a str, Substr>,
}

impl<'a> Interner<'a> {
    pub fn new(buf: &'a mut String) -> Interner<'a> {
        Self {
            buf,
            map: HashMap::new(),
        }
    }
    
    pub fn into_buffer(self) -> String {
        mem::take(self.buf)
    }
    
    pub fn intern(&mut self, s: &str) -> Substr {
        if let Some(ss) = self.map.get(s) {
            return *ss;
        }

        let start = self.buf.len();
        self.buf.push_str(s);
        let end = self.buf.len();

        let ss = Substr { start, end };
        let k = &self.buf[start..end];

        let old = self.map.insert(k, ss);
        assert_eq!(old, None, "should have returned early with '{}' in map", s);    
        ss
    }
}

(Playground)

self.buf: &'a mut String, but you don’t have access to self (or self.buf) for the full 'a lifetime; you have access to it via a &mut self argument. Rust lets you turn a &'short mut &'long mut T into a &'short mut T, so you only have mutable access to self.buf for that arbitrary '1 lifetime mentioned in the error.

Edit: lmao, I’d written &'mut short &'mut long T.

Taking a step back, the capacity of self.buf may need to increase when you push another string into it. That could reallocate the backing data of the string, which would necessary invalidate previous references to its contents, such as the references you’re trying to store in the hashmap.

Regardless of the specific lifetime problems that arise, what you’re trying to do cannot work with a String as the backing data. Using something like the bumpalo crate could work, though. (I haven’t looked much into string interning, so there might be other high-level improvements possible. In particular, I suspect that you don’t need to record the end index.)

Ugh, yeah, I always forget about that possibility. Thanks for the reminder.

Pinging you since I added more info to my answer just now.

Are you thinking of Rust lifetimes in terms of "from construction to destruction"? Rust lifetimes are more about borrow durations than destruction points (though it is true that being borrowed conflicts with being destructed).

For example, self.buf is an exclusive borrow of some String (*self.buf) of duration 'a, and self.map contains references to str data which is shared-borrowed for 'a. 'a does not represent where the String or where the HashMap get destructed.

From that perspective, the error was trying to say

  • you need to borrow **self.buf for 'a because self.map holds &'a strs
  • but you can't borrow **self.buf for 'a unless you have a &'a mut self (a borrow of Self for duration 'a)[1][2]

The error was not trying to say that self.map has to be destructed after *self.buf.


k isn't allowed to have same borrow duration 'a because it ultimately prevents things like the danging pointer problem @robofinch pointed out. At a more fundamental level, &mut _ are exclusive borrows -- it's UB to be have both an active &_ and &mut _ to the same data for example. That would be trivial to achieve if k was allowed to have the same lifetime through your nested &mut self reference.

The exclusivity of &muts is enough to conclude this general approach can't work without thinking about lifetimes or dangling pointers at all. Just using self.buf asserts it's exclusivity -- which is incompatible with references into the String existing elsewhere, like in self.map.

And that's also related to why &'a mut self isn't an actual solution: the only way it can soundly compile is by making it impossible to use self.buf ever again after creating k with duration 'a.[3]

(If you start asking why this strong exclusivity guarantee exists, you circle back to it preventing things like dangling pointers when push reallocates, etc. For example, you pass &mut String to push, which is again incompatible with references into the String existing elsewhere -- which is good, because if they existed, they might dangle after reallocation.)


  1. "Outlives" is like "greater or equal", not "strictly greater". ↩︎

  2. If you made the method take &'a mut self, it would solve the immediate error, but you don't actually want that either -- things will stop working around where you call the method instead. ↩︎

  3. This also means *self as a whole must be unusable after the method returns. ↩︎