Inverted index from HashMap<String, usize>

Suppose you have a HashMap<String, usize> where you know by construction that the values are unique integers in the range 0..n where n is the number of entries in the map. What is the most efficient way to invert this mapping, producing Box<[String]> where index i in the slice is the string that was mapped to value i by the original HashMap? You may consume the HashMap. What I have now is

fn invert_string_index(
    mut m: HashMap<String, usize>
) -> Box<[String]> {
    let mut v = m.drain().map(|(k, v)| (v, k)).collect::<Vec<_>>();
    v.sort_unstable();  // unstable sort is OK because indices are unique
    v.into_iter().map(|(_, s)| s).collect::<Vec<_>>().into_boxed_slice()
}

but this involves an intermediate vector that it feels like it ought to be possible to eliminate. The only alternative I can think of is to keep the original map around and use sort_unstable_by_key but I suspect that would actually be slower.

That should allow you to construct the result without any sorting, shouldn't it?

fn invert_string_index(
    m: HashMap<String, usize>
) -> Box<[String]> {
    let mut res = vec![String::new(); m.len()];
    
    for (s, i) in m {
        res[i] = s;
    }
    
    res.into_boxed_slice()
}

For a particularly hot path, you can also bypass the empty fill completely. Under the hood, the into_boxed_slice() relies on shrink_to_fit(), which is itself reallocation agnostic:

fn invert_string_index(m: HashMap<String, usize>) -> Box<[String]> {
    let mut uninit = Box::new_uninit_slice(m.len());
    for (s, i) in m {
        uninit[i].write(s);
    }
    unsafe { uninit.assume_init() }
}

Although if you must do so, at the very least consider:

fn invert_string_index(m: HashMap<String, usize>) -> Box<[String]> {
    let len = m.len();
    let mut check_left = len;
    let mut uninit = Box::new_uninit_slice(len);

    for (s, i) in m {
        uninit[i].write(s);
        #[cfg(debug_assertions)] 
        {
            check_left -= 1;
        }
    }
    // if the slice was, indeed, filled end-to-end
    debug_assert_eq!(check_left, 0);
    // SAFETY: then assuming it is initialized
    // is, indeed, safe; and is never UB
    unsafe { uninit.assume_init() }
}

(both functions should be unsafe because they rely on input precondition, that all indices are distinct; on the other hand, unchecked indexing should then be used too)

i am a bit confused why you'd need the hashmap to begin with, if you have one exact set of contiguos numeric keys wouldn't you be able to just start directly with the array?
i guess it might be a matter of ergonomics because you have some odd initialization order but if that's the case i don't think the conversion to an array at the end is that significant for performance

This is from a two-phase algorithm. The first phase needs random access string -> id-number queries but not id-number -> string queries; in the second phase it's just the opposite.

You're right that I could be building up both indexes at the same time, though.

If you can control the first phase, then an IndexSet<String> gives you this naturally.

I don't see IndexSet in the stdlib docs, which crate are you thinking of?

Sorry, I mean from the indexmap crate: