Implementing Hash for a Vec based interned string type

I'm implementing my own regular expression engine, but I have to store a HashMap that maps between values of my own string type:

use std::collections::HashMap;

#[derive(Clone, Debug)]
pub struct RegexMatch {
    m_index: i32,
    m_captures: HashMap<String, String>,
    // ...
}

impl RegexMatch {
    pub fn index(&self) -> i32 {
        self.m_index
    }

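    // HashMap::get takes a reference to the key and returns Option<&V>,
    // so the result is cloned out and may be absent.
    pub fn get_capture(&self, name: impl AnyStringType) -> Option<String> {
        self.m_captures.get(&name.convert()).cloned()
    }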

    // ...
}

The problem is that HashMap's lookup methods require the key type to implement Hash and Eq, so I can't use get to retrieve a capture by name. My string type is backed by a Vec<u8>, Vec<u16> or Vec<u32>.
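One common approach, sketched here under assumptions: implement Eq and Hash by hand so that both work on the code points rather than on the storage width. `Repr` is a hypothetical, simplified stand-in for StringRepr2 (no Arc, no slices). The essential rule is that two values comparing equal must feed the hasher exactly the same data, which holds because both impls see the same sequence of code points widened to u32.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Hypothetical, simplified stand-in for StringRepr2 (no Arc, no slices).
#[derive(Debug)]
enum Repr {
    Latin1(Vec<u8>),
    Ucs2(Vec<u16>),
    Ucs4(Vec<u32>),
}

impl Repr {
    fn len(&self) -> usize {
        match self {
            Repr::Latin1(v) => v.len(),
            Repr::Ucs2(v) => v.len(),
            Repr::Ucs4(v) => v.len(),
        }
    }

    // Code points widened to u32, whatever the storage width.
    fn code_points(&self) -> Box<dyn Iterator<Item = u32> + '_> {
        match self {
            Repr::Latin1(v) => Box::new(v.iter().map(|&c| u32::from(c))),
            Repr::Ucs2(v) => Box::new(v.iter().map(|&c| u32::from(c))),
            Repr::Ucs4(v) => Box::new(v.iter().copied()),
        }
    }
}

impl PartialEq for Repr {
    fn eq(&self, other: &Self) -> bool {
        // Equal length plus equal code points, regardless of storage width.
        self.len() == other.len() && self.code_points().eq(other.code_points())
    }
}

impl Eq for Repr {}

impl Hash for Repr {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Feed the hasher only width-independent data,
        // so a == b implies hash(a) == hash(b).
        state.write_usize(self.len());
        for cp in self.code_points() {
            state.write_u32(cp);
        }
    }
}

fn main() {
    let mut captures: HashMap<Repr, Repr> = HashMap::new();
    captures.insert(Repr::Latin1(b"year".to_vec()), Repr::Latin1(b"1999".to_vec()));
    // The same text stored as UCS-4 finds the entry inserted with a Latin-1 key.
    let key = Repr::Ucs4("year".chars().map(u32::from).collect());
    assert_eq!(captures.get(&key), Some(&Repr::Latin1(b"1999".to_vec())));
}
```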

Here's the representation:

use std::sync::Arc;

#[derive(Clone)]
pub struct String {
    m_repr: Arc<StringRepr0>,
}

#[derive(Clone)]
struct StringRepr0 {
    m_len: i32,
    m_repr: StringRepr1,
}

// A string either refers to a whole buffer or to a sub-range of one.
#[derive(Clone)]
enum StringRepr1 {
    Reference(Arc<StringRepr2>),
    Slice(Slice),
}

#[derive(Clone)]
struct Slice {
    container: Arc<StringRepr2>,
    start: usize,
    end: usize,
}

// Storage width is chosen from the largest Unicode code point in the string.
#[derive(Clone)]
enum StringRepr2 {
    Latin1(Vec<u8>),
    Ucs2(Vec<u16>),
    Ucs4(Vec<u32>),
}
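For reference, choosing the StringRepr2 variant from the largest code point can be sketched as below. `Repr` and `from_chars` are hypothetical names, and the thresholds (0xFF, 0xFFFF) follow the Latin-1/UCS-2/UCS-4 split used by Python's flexible string representation (PEP 393); the real code may differ.

```rust
// Hypothetical mirror of StringRepr2.
#[derive(Debug, PartialEq)]
enum Repr {
    Latin1(Vec<u8>),
    Ucs2(Vec<u16>),
    Ucs4(Vec<u32>),
}

// Picks the narrowest width that can hold every code point.
fn from_chars(chars: &[char]) -> Repr {
    let max = chars.iter().map(|&c| u32::from(c)).max().unwrap_or(0);
    if max <= 0xFF {
        // Every code point fits in one byte (Latin-1 range).
        Repr::Latin1(chars.iter().map(|&c| c as u8).collect())
    } else if max <= 0xFFFF {
        // Every code point is in the Basic Multilingual Plane.
        Repr::Ucs2(chars.iter().map(|&c| c as u16).collect())
    } else {
        Repr::Ucs4(chars.iter().map(|&c| u32::from(c)).collect())
    }
}

fn main() {
    let s: Vec<char> = "héllo".chars().collect();
    assert_eq!(from_chars(&s), Repr::Latin1(vec![b'h', 0xE9, b'l', b'l', b'o']));
    let s: Vec<char> = "a€".chars().collect();
    assert_eq!(from_chars(&s), Repr::Ucs2(vec![0x61, 0x20AC]));
    let s: Vec<char> = "a😀".chars().collect();
    assert_eq!(from_chars(&s), Repr::Ucs4(vec![0x61, 0x1F600]));
}
```

One consequence: two freshly built strings with the same text always land in the same variant, but a Slice keeps its container's width, so the variant alone can't decide equality.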

Although there is string interning, equality and ordering can't simply be decided with Arc::ptr_eq(), because a string can be either a direct reference to a buffer or a substring of one (StringRepr1::Slice). Pointer comparison would only be possible (I think?) if every time I constructed a string I iterated over every character and computed the hash of the Vec<char> (which is then converted to Vec<u8>, Vec<u16> or Vec<u32>, based on the largest Unicode code point). Python uses a similar representation (scalar values are stored at a width chosen by the largest code point).
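Because of the Slice variant, a view into a wide buffer can hold only narrow code points, so neither the Arc pointer nor the storage variant is a reliable identity. A sketch, assuming a hypothetical simplified `View` type in which a whole-buffer reference is just the range 0..len: equality and hashing walk container[start..end] as u32 code points, so a slice and an independently built string with the same text agree without interning every string up front.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

// Hypothetical, simplified mirror of StringRepr2.
enum Buffer {
    Latin1(Vec<u8>),
    Ucs2(Vec<u16>),
    Ucs4(Vec<u32>),
}

// Hypothetical view unifying Reference and Slice:
// a whole-buffer reference is just the range 0..len.
struct View {
    container: Arc<Buffer>,
    start: usize,
    end: usize,
}

impl View {
    fn whole(container: Arc<Buffer>) -> Self {
        let end = match &*container {
            Buffer::Latin1(v) => v.len(),
            Buffer::Ucs2(v) => v.len(),
            Buffer::Ucs4(v) => v.len(),
        };
        View { container, start: 0, end }
    }

    fn len(&self) -> usize {
        self.end - self.start
    }

    // Code points of container[start..end], widened to u32.
    fn code_points(&self) -> Box<dyn Iterator<Item = u32> + '_> {
        let range = self.start..self.end;
        match &*self.container {
            Buffer::Latin1(v) => Box::new(v[range].iter().map(|&c| u32::from(c))),
            Buffer::Ucs2(v) => Box::new(v[range].iter().map(|&c| u32::from(c))),
            Buffer::Ucs4(v) => Box::new(v[range].iter().copied()),
        }
    }
}

impl PartialEq for View {
    fn eq(&self, other: &Self) -> bool {
        self.len() == other.len() && self.code_points().eq(other.code_points())
    }
}

impl Eq for View {}

impl Hash for View {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Only the visible code points are hashed: not the container
        // pointer, not the offsets, not the storage width.
        state.write_usize(self.len());
        for cp in self.code_points() {
            state.write_u32(cp);
        }
    }
}

// Helper for the example: hash a view with the std default hasher.
fn hash_of(v: &View) -> u64 {
    let mut h = DefaultHasher::new();
    v.hash(&mut h);
    h.finish()
}

fn main() {
    // "€ab" needs UCS-2 storage, yet its slice 1..3 ("ab") is pure Latin-1.
    let wide = Arc::new(Buffer::Ucs2("€ab".encode_utf16().collect()));
    let slice = View { container: wide, start: 1, end: 3 };
    let plain = View::whole(Arc::new(Buffer::Latin1(b"ab".to_vec())));
    assert!(slice == plain);
    assert_eq!(hash_of(&slice), hash_of(&plain));
}
```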

For now I've just replaced HashMap with BTreeMap. If anyone knows of something better, here's the full code: https://github.com/matheusdiasdesouzads/rust-fb/tree/master/src
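The BTreeMap workaround needs Ord instead of Hash. A sketch, assuming the same kind of hypothetical simplified `Repr` (not the real StringRepr2): compare the widened code-point sequences lexicographically with Iterator::cmp, and derive PartialEq and PartialOrd from that single ordering so all comparison traits agree.

```rust
use std::cmp::Ordering;
use std::collections::BTreeMap;

// Hypothetical, simplified stand-in for StringRepr2.
#[derive(Debug)]
enum Repr {
    Latin1(Vec<u8>),
    Ucs2(Vec<u16>),
    Ucs4(Vec<u32>),
}

impl Repr {
    // Code points widened to u32, independent of storage width.
    fn code_points(&self) -> Box<dyn Iterator<Item = u32> + '_> {
        match self {
            Repr::Latin1(v) => Box::new(v.iter().map(|&c| u32::from(c))),
            Repr::Ucs2(v) => Box::new(v.iter().map(|&c| u32::from(c))),
            Repr::Ucs4(v) => Box::new(v.iter().copied()),
        }
    }
}

impl Ord for Repr {
    fn cmp(&self, other: &Self) -> Ordering {
        // Lexicographic order over code points; a prefix sorts first.
        self.code_points().cmp(other.code_points())
    }
}

impl PartialOrd for Repr {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for Repr {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for Repr {}

fn main() {
    let mut captures = BTreeMap::new();
    // "name" stored as UCS-2 code units.
    captures.insert(Repr::Ucs2(vec![0x6E, 0x61, 0x6D, 0x65]), "value");
    // A Latin-1 key holding the same text finds the entry.
    assert_eq!(captures.get(&Repr::Latin1(b"name".to_vec())), Some(&"value"));
    assert!(Repr::Latin1(b"ab".to_vec()) < Repr::Ucs4(vec![0x61, 0x62, 0x63]));
}
```

Ordering by code point matches the order Rust's own str uses, since UTF-8 byte order preserves code point order.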
