Semantic of `Map/Set::from_iter` with duplicated keys

I couldn't find much documentation on HashMap::from_iter (or BTreeMap, for that matter). What is the semantic around duplicates keys?

Based on my testing (code below), it seems that HashMap uses the same semantic as applying insert from the front, i.e., use the first duplicated keys and the last duplicated values. Is this a guaranteed behavior? BTreeMap seems to have a different semantic where the last duplicated keys are used instead.

let keys = [0, 1, 2, 3, 1, 2];
let values = [0, 1, 2, 3, 4, 5];
let hash_map = std::collections::HashMap::<_, _>::from_iter(std::iter::zip(&keys, values));
let mut used_keys_idx = Vec::from_iter(
    hash_map
        .keys()
        // SAFETY: well-aligned from the same allocated object (`keys`)
        .map(|k| unsafe { std::ptr::from_ref(*k).offset_from(keys.as_ptr()) }),
);
let mut used_values = Vec::from_iter(hash_map.values());
used_keys_idx.sort();
used_values.sort();
println!("{used_keys_idx:?} {used_values:?}"); // [0, 1, 2, 3] [0, 3, 4, 5]

HashMap::from_iter uses hashbrown::HashMap::extend under the hood, which does replace the old value for an existing key with the new value. If you look at the code of BTreeMap::from_iter, you'll see that the iterator is first collected into a vector, which is then stably sorted by key, before creating the BTreeMap from it. Given that the docs for both FromIterator implementations only state that:

If the iterator produces any pairs with equal keys, all but one of the corresponding values will be dropped.

I wouldn't consider their current behaviour as API-stable and treat it as an implementation detail.

4 Likes

Indeed, I wrote that text and I can tell you that the omission of ordering was on purpose.

2 Likes