Mapping HashMap values to a different type

I feel a bit stupid asking this, because I'm sure it's been asked dozens of times already, but somehow I can't find the right search terms.

Say I have a HashMap<K, V> and I want to consume it and turn it into a HashMap<K, W> by applying fn f(v: V) -> W to each value. If V = W, I can just do

fn convert(mut map: HashMap<K, V>) -> HashMap<K, W> {
    for val in map.values_mut() {
        *val = f(*val);
    }
    map
}

Now assume V ≠ W. Can I do this more efficiently than this?

fn convert(map: HashMap<K, V>) -> HashMap<K, W> {
    map.into_iter().map(|(k, v)| (k, f(v))).collect()
}

Basically, the question is whether it's possible to reuse the internal hash table of the original map instead of rebuilding it, since the keys have not changed.

Playground

What makes you think that the map can be reused just because the keys are the same? The values are stored in the map as well. Unless the new values are exactly the same size as the old ones, then this is trivially impossible (yes, even if the new type is smaller – the pointer arithmetic wouldn't add up).

1 Like

There are many designs for hash tables, but a simple one for example would be an array where the element at index n is a linked list of pairs (key, value) holding all associations for keys whose hash modulo the array size is n. If you want to transform the values, you need to rebuild the linked lists, but you don't need to rebuild the array itself. In particular, you don't need to recompute the hash of each key.

I believe it's possible with the raw table and/or raw entry api of the hashbrown crate (sorry for not confirming or providing an example).

There's no way with the map in std.

1 Like

I wonder if in the case where the size is the same if you could get away with an unsafe transmute. (yes, I know this would be dumb and probably no guarantee it would continue to work)

If your goal is simply to avoid recomputing the hashes, then that's definitely possible technically. I don't think the std map provides anything to that end, though. In particular, I don't think you can get the pre-computed hash for any key (I don't know wheter it's stored at all, but for sure there aren't getters).

What you're describing is commonly known as a separate chaining design, however Rust's HashMap (currently) uses open addressing instead, where there is a single array storing metadata, keys and values.

That said even with open addressing you could reuse the metadata and avoid rehashing the keys, but this is a highly specialized operation which is not implemented right now (neither in the stdlib nor in the hashbrown crate, which is used for implementing the stdlib HashMap)

1 Like

Thanks everyone! I don't actually have a pressing need for this where performance really matters, so I'll just rebuild my hash map. I just had the impression I had somehow missed an std API for this, but you're probably right that this use case isn't all that common.

If you have a critical need for this optimization, you could implement it yourself for your specific types by defining a union,

union VOrW {
    v: V,
    w: W,
}

which is used as HashMap<K, VOrW> in a wrapper type that casts &VOrWs to &V or &W, and so on, as appropriate for the current static type.

All this is unsafe, of course, but sound if written correctly, and independent of the implementation details of the HashMap.

3 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.