How to delete string in String interner

I want to compare string by using string Interner, and I used matklad's version. I think it's great implementation. But I got stucked when I want to delete something:

 interner.remove("hello") // I want to implement this

If I want to implement it, I need to remove the key in the self.map, I can just use self.map.remove(name);, but don't know how to remove the slice in the self.vec, when I remove it, all the item behind of this changes its index, and all of them need to move and re-interner too, what more, I don't know how to delete the corresponding slice in the self.full.

Is there any way to implement it ? Or maybe I'm going in the wrong direction, because it's not meant to delete strings. Any help is welcome!

use std::{collections::HashMap, mem};

fn main() {
    let mut interner = Interner::with_capacity(10);
    let id1 = interner.intern("hello");
    let id2 = interner.intern("hello");
    let s = interner.lookup(id1);
    println!("{}", s);
    assert_eq!(id1, id2);

    // interner.remove("hello") // I want to implement this
    // let id3 = interner.intern("hello");
    // assert_ne!(id1, id3)
}

pub struct Interner {
    map: HashMap<&'static str, u32>,
    vec: Vec<&'static str>,
    buf: String,
    full: Vec<String>,
}
impl Interner {
    pub fn with_capacity(cap: usize) -> Interner {
        let cap = cap.next_power_of_two();
        Interner {
            map: HashMap::default(),
            vec: Vec::new(),
            buf: String::with_capacity(cap),
            full: Vec::new(),
        }
    }
    pub fn intern(&mut self, name: &str) -> u32 {
        if let Some(&id) = self.map.get(name) {
            return id;
        }
        let name = unsafe { self.alloc(name) };
        let id = self.map.len() as u32;
        self.map.insert(name, id);
        self.vec.push(name);
        debug_assert!(self.lookup(id) == name);
        debug_assert!(self.intern(name) == id);
        id
    }
    pub fn delete(&mut self) {
        unimplemented!("I want implement this")
    }
    pub fn lookup(&self, id: u32) -> &str {
        self.vec[id as usize]
    }
    unsafe fn alloc(&mut self, name: &str) -> &'static str {
        let cap = self.buf.capacity();
        if cap < self.buf.len() + name.len() {
            let new_cap = (cap.max(name.len()) + 1).next_power_of_two();
            let new_buf = String::with_capacity(new_cap);
            let old_buf = mem::replace(&mut self.buf, new_buf);
            self.full.push(old_buf);
        }
        let interned = {
            let start = self.buf.len();
            self.buf.push_str(name);
            &self.buf[start..]
        };
        &*(interned as *const str)
    }
}

(Playground)

I believe it's not meant to delete strings. What happens when you delete "hello" and there's some interned "hello" floating else where? What happens when you re-insert "hello" again? You definitely won't get the same interned result.

1 Like

Thank you for your opinion, maybe I should change the implementation idea.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.