 # Sorting hashmap of vectors by sum, and some more

Hey!
I'm working on a project where I consolidate some data as `Hashmap<String, Vec<f32>>`. All of the vectors have the same number of elements in them. I need to sort this hashmap, based on sum of the elements in the vector, and later sum of elements of the vector excluding one of the elements iteratively.

``````trg_map.retain(|_, v| v.iter().filter(|&x| x > &sim_cut).count() > 4); // filter hashmap

for (_, val) in trg_map.iter_mut() {
let sum = val.iter().sum();
val.push(sum);
for lineage in 0..lineage_count {
val.push(sum-val[lineage]);
}
}
let mut map: Vec<(&String, &Vec<f32>)> = trg_map.iter().collect();
map.sort_by(|a,b| a.1[lineage_count].partial_cmp(&b.1[lineage_count]).unwrap());
// output top n entries
// Similarly sort based on sum calculated in the following entries to the vector.
``````

The first idea that struck me was to add these sums to the vector itself, and then sort based on that value, but I was wondering if there any better solutions to carrying this out? In specific,

• I was wondering if there is an idiomatic way to sort without adding those values to the vector
• Is it possible to refer a particular element in the vector as a part of the closure? i.e. aiding in calculation of sum without the whole `for` loop.

Thank you,
aa

Any time you want to sort by something that's non-trivial to calculate, the default answer is https://doc.rust-lang.org/std/primitive.slice.html#method.sort_by_cached_key.

That's a bit harder here, since `f32: !Ord`, but you can solve that with something like Noisy_float — Rust math library // Lib.rs :

``````.sort_by_cached_key(|p| NoisyFloat::from(p.1.sum::<f32>()))
``````
4 Likes

Thanks a lot , I hadn't looked at `sort_by_cached_key` in this context earlier, I'll keep this in mind! But I should be using `sort_by` in my case, since I need to calculate the sum of vectors to get the sum of vectors when I drop out one element at a time, right?

Sometimes it is easier to use a structure. Maybe it does not fully answer to your problem, but it might help.

``````use std::collections::HashMap;

// We need a lifetime for this struct because it holds some references.
// The lefetime of the struct can not be greater than the lifetime of the map
struct MapSum<'map> {
// You should prefer &str over &String
key: &'map str,
// Same for &Vec<_> and &[_]
val: &'map [f32],
// the current value of the sum
sum: f32,
// index to the next val to exclude
cursor: usize
}

impl<'map> MapSum<'map> {
pub fn new<'a>(key: &'a str, val: &'a [f32]) -> MapSum<'a> {
let sum = val.iter().sum();
MapSum{key, val, sum, cursor: 0usize}
}

pub fn has_next(&self) -> bool {
self.cursor < self.val.len()
}

// Here we update the sum by retrieving to the sum the val at the pos 'self.cursor'
// Should not be called more than self.val.len()
pub fn next(&mut self) {
assert!(self.has_next());
self.sum -= self.val[self.cursor];
self.cursor += 1;
// If you only need to remove one element at a time
/*
if self.cursor == 0 {
self.sum -= self.val[self.cursor];
} else {
self.sum += self.val[self.cursor - 1];
self.sum -= self.val[self.cursor];
}
self.cursor += 1;
*/
}
}

pub fn sort_map(trg_map: &mut HashMap<String,Vec<f32>>) {
//trg_map.retain(|_, v| v.iter().filter(|&x| x > &sim_cut).count() > 4); // filter hashmap
let mut map_sum: Vec<_> = trg_map.iter()
.map(|(key, val)| MapSum::new(key, &val[..]))
.collect();

// now we can sort by accessing the sum field of the structs
map_sum.sort_by(|a,b| a.sum.partial_cmp(&b.sum).unwrap() );

// we update just by calling next
map_sum.iter_mut().for_each(|m| m.next());
// sort again
map_sum.sort_by(|a,b| a.sum.partial_cmp(&b.sum).unwrap() );
}
``````

Here a link to a blog post that explain the differences between `String`, `&str`. The relation is similar for `Vec<_>` and `&[_]` (slice).

2 Likes

Hey Vincent, apologies for the delayed response. This approach is interesting, so you're suggesting I convert the hashmap to a struct after I complete all my operations with the data. Interesting, I'll give it a shot.

I was also facing issues with the values being f32::NaN, and I keep filtering them in place. I guess, using this struct, I could filter them at the initialisation stage, wonderful! Thanks a lot for taking your time for this detailed response .

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.