I'm writing my own regular expression implementation, and I need to store a HashMap
that maps between objects of my own string type:
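use std::collections::HashMap;

#[derive(Clone, Debug)]
pub struct RegexMatch {
    m_index: i32,
    m_captures: HashMap<String, String>,
    // ...
}

impl RegexMatch {
    pub fn index(&self) -> i32 {
        self.m_index
    }

    pub fn get_capture(&self, name: impl AnyStringType) -> Option<String> {
        // `get` takes the key by reference and returns Option<&String>,
        // so the value is cloned out; a missing capture yields None.
        self.m_captures.get(&name.convert()).cloned()
    }
    // ...
}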
The problem here is that HashMap requires its key type to implement Hash (and Eq), and my string type doesn't, so I can't use get to retrieve a capture by name. My string type is implemented using either a Vec<u8>, Vec<u16> or Vec<u32>.
Here's the representation:
use std::sync::Arc;

#[derive(Clone)]
pub struct String {
    m_repr: Arc<StringRepr0>,
}

#[derive(Clone)]
struct StringRepr0 {
    m_len: i32,
    m_repr: StringRepr1,
}

#[derive(Clone)]
enum StringRepr1 {
    Reference(Arc<StringRepr2>),
    Slice(Slice),
}

#[derive(Clone)]
struct Slice {
    container: Arc<StringRepr2>,
    start: usize,
    end: usize,
}

#[derive(Clone)]
enum StringRepr2 {
    Latin1(Vec<u8>),
    Ucs2(Vec<u16>),
    Ucs4(Vec<u32>),
}
Although there is string interning, equality and ordering between strings can't be decided by Arc::ptr_eq() alone, because a string can be either a substring (i.e. StringRepr1::Slice) or a direct reference. That would only be possible (I think?) if, every time I construct a string, I iterate over every character and compute the hash from the Vec<char> (which is then converted to either Vec<u8>, Vec<u16> or Vec<u32>, based on the largest Unicode ordinal). Python uses a similar representation (scalar values are stored in a width chosen by the largest Unicode ordinal).
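One possible way around this, as a minimal sketch only: implement Hash and Eq by hand over the code points, widened to u32, so that the same text hashes identically whatever the storage width. MyStr, code_points and lookup_demo are hypothetical names on a simplified stand-in type, not the types from the repository; for the real type, code_points would also have to honour the start and end of a Slice.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Hypothetical, simplified stand-in for the real string type.
#[derive(Clone, Debug)]
enum MyStr {
    Latin1(Vec<u8>),
    Ucs2(Vec<u16>),
    Ucs4(Vec<u32>),
}

impl MyStr {
    // Yields every code unit widened to u32, whatever the storage width.
    fn code_points(&self) -> Box<dyn Iterator<Item = u32> + '_> {
        match self {
            MyStr::Latin1(v) => Box::new(v.iter().map(|&c| u32::from(c))),
            MyStr::Ucs2(v) => Box::new(v.iter().map(|&c| u32::from(c))),
            MyStr::Ucs4(v) => Box::new(v.iter().copied()),
        }
    }
}

impl PartialEq for MyStr {
    // Two strings are equal when their code point sequences are equal.
    fn eq(&self, other: &Self) -> bool {
        Iterator::eq(self.code_points(), other.code_points())
    }
}

impl Eq for MyStr {}

impl Hash for MyStr {
    // Feeds the same values that `eq` compares, so equal strings hash equally.
    fn hash<H: Hasher>(&self, state: &mut H) {
        let mut len = 0usize;
        for cp in self.code_points() {
            state.write_u32(cp);
            len += 1;
        }
        state.write_usize(len);
    }
}

// Looks up a Latin-1 key using the same text stored as UCS-4.
fn lookup_demo() -> Option<i32> {
    let mut map = HashMap::new();
    map.insert(MyStr::Latin1(vec![b'a', b'b']), 1);
    map.get(&MyStr::Ucs4(vec![0x61, 0x62])).copied()
}
```

Hashing this way costs one pass over the string per lookup, but it needs no precomputed hash at construction time and does not depend on interning.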
I just replaced HashMap with BTreeMap for now. If anyone knows of something better, here's the full code: https://github.com/matheusdiasdesouzads/rust-fb/tree/master/src
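For reference, BTreeMap needs Ord on the key instead of Hash. A rough sketch of a representation-independent ordering, again on a hypothetical simplified type (MyStr, code_points and btree_demo are illustrative names, not taken from the repository), compares the code points lexicographically:

```rust
use std::cmp::Ordering;
use std::collections::BTreeMap;

// Hypothetical, simplified stand-in for the real string type.
#[derive(Clone, Debug)]
enum MyStr {
    Latin1(Vec<u8>),
    Ucs2(Vec<u16>),
    Ucs4(Vec<u32>),
}

impl MyStr {
    // Yields every code unit widened to u32, whatever the storage width.
    fn code_points(&self) -> Box<dyn Iterator<Item = u32> + '_> {
        match self {
            MyStr::Latin1(v) => Box::new(v.iter().map(|&c| u32::from(c))),
            MyStr::Ucs2(v) => Box::new(v.iter().map(|&c| u32::from(c))),
            MyStr::Ucs4(v) => Box::new(v.iter().copied()),
        }
    }
}

impl Ord for MyStr {
    // Lexicographic comparison over code points, independent of storage width.
    fn cmp(&self, other: &Self) -> Ordering {
        Iterator::cmp(self.code_points(), other.code_points())
    }
}

impl PartialOrd for MyStr {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for MyStr {
    // Kept consistent with `cmp`, as the Ord contract requires.
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for MyStr {}

// Looks up a UCS-2 key using the same text stored as Latin-1.
fn btree_demo() -> Option<i32> {
    let mut map = BTreeMap::new();
    map.insert(MyStr::Ucs2(vec![0x61]), 7);
    map.get(&MyStr::Latin1(vec![b'a'])).copied()
}
```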