How to improve performance when using HashMap with huge data in Rust?

I have been trying to validate Rust for one of my projects for the past month. So far it is promising, except for this one instance.

I am trying to process a large amount of data: 5000 time series read from 5000 files. I am using a HashMap as the collection and iterating over it. Due to the nature of my code, I am not able to use par_iter to improve performance. The logic below takes around 15 minutes to complete. Any suggestions on how to improve the performance? Does HashMap have some bottleneck when processing huge data? Also, should I use ndarray's Array1 rather than Vec<f32> for Ts?

use std::collections::HashMap;

type Ts = Vec<f32>;
type FileName = String;
type AllTs = HashMap<FileName, Ts>;

fn main() {
    // The function below reads from files and returns a HashMap of length 5000 (that is, 5000 files).
    // Each Ts has length 30000.
    // Ts values range from 0 to 10.
    // I am using a HashMap because in some other location of my code I filter the Ts by file name.
    let all_ts: AllTs = load_ts_from_files(); 
    let histogram_bin: Vec<f32> = calculate_bins(all_ts);
    
}

fn load_ts_from_files() -> AllTs {
    // This function reads 5000 files and load the AllTs
    return AllTs::new();
}

fn calculate_bins(all_ts: AllTs) -> Vec<f32> {
    let mut bins = vec![0f32; 10];
    all_ts.iter().for_each(|(file_name, ts)| { // This loops through 5000 times
        for d in ts { //This loops through 30000 times
            let bin = (d/10 as f32) as usize;
            bins[bin] += 1.0;
        }
    });
    return bins;
}

Playground

You can use integers here:

    let scale = 1.0; // adjust this
    let mut bins = vec![0u32; 11]; // adding one extra bin for overflows
    all_ts.iter().for_each(|(file_name, ts)| { // This loops through 5000 times
        for d in ts { //This loops through 30000 times
            let bin = ((d * scale) as usize).min(10); // now bin is at most 10
            bins[bin] += 1;
        }
    });

And don't forget to compile in --release mode.

Thank you. I missed one detail: my bins hold f32 values. The calculate function can add fractional amounts; sometimes it might be += 1.5 or += 1.1. I gave minimal code here.
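For what it's worth, a weighted variant could look like the minimal sketch below. The `weight_for` helper is purely hypothetical, a stand-in for whatever fractional value the real calculation adds:

```rust
// Minimal sketch of binning with f32 bins and fractional increments.
// `weight_for` is a hypothetical stand-in for the real weight logic
// (e.g. the += 1.5 or += 1.1 cases mentioned above).
fn weight_for(d: f32) -> f32 {
    if d > 5.0 { 1.5 } else { 1.1 }
}

fn calculate_weighted_bins(ts: &[f32], scale: f32) -> Vec<f32> {
    let mut bins = vec![0f32; 11]; // one extra bin for overflows
    for &d in ts {
        let bin = ((d * scale) as usize).min(10); // bin is at most 10
        bins[bin] += weight_for(d);
    }
    bins
}

fn main() {
    let bins = calculate_weighted_bins(&[0.5, 7.0, 9.9], 1.0);
    println!("{:?}", bins);
}
```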

Ah. f32 should be fine too.

Other things to look out for:

  • Read files with BufReader or mmap
  • Preallocate your Vecs and HashMaps if you know (roughly) their final size, using the with_capacity method. Growing them is surprisingly expensive.
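A minimal sketch of the preallocation idea, using the sizes mentioned in the question (the `load_preallocated` function name is illustrative):

```rust
use std::collections::HashMap;

// Preallocate collections to their (rough) final size so they never
// reallocate while being filled.
fn load_preallocated(n_files: usize, ts_len: usize) -> HashMap<String, Vec<f32>> {
    let mut all_ts = HashMap::with_capacity(n_files);
    for i in 0..n_files {
        let ts = vec![0.0f32; ts_len]; // a single allocation of the right size
        all_ts.insert(format!("file_{i}"), ts);
    }
    all_ts
}

fn main() {
    // The sizes from the question: 5000 files, 30000 samples each.
    let all_ts = load_preallocated(5000, 30_000);
    assert_eq!(all_ts.len(), 5000);
}
```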

Do you know if reading the files or processing them is slower? How are you reading them?

I found that processing is slower. Reading also takes time, but I am not interested in optimising it. Just now, I noticed that if I comment out the for loop in my calculate_bins, the processing time is the same. I think for_each is the bottleneck. I am changing my code to see if converting the HashMap to `Vec<(String, Ts)>` will help.

You removed the body of the loop, and it still took the same time???

Both. I removed the for_each and also the body of the inner for loop. I think something is happening because of rayon's multi-threading.
I wanted to keep the example minimal, so I didn't give the entire context. The main mentioned above is actually a function called folder(), and I do a rayon par_iter() over the folders.

I would recommend not using rayon until your code works well. Then you can add it back in to gain extra performance.
Not all problems benefit from it.
And profiling is key! Figure out what part is actually slow.

I tried profiling by adding some print statements. The issue is not with calculate_bins(); it is with the folders' par_iter(). My code is something like below:

folders.par_iter().for_each(|folder| { // folder is of type Arc<Mutex<T>>
    let a = std::time::SystemTime::now();
    calculate_bins(&folder.all_ts);
    match a.elapsed() {
        // ThreadId only implements Debug, hence {:?}
        Ok(elapsed) => println!("{:?} - Time took for calculate_bins {} secs", std::thread::current().id(), elapsed.as_secs()),
        Err(e) => println!("Error: {:?}", e)
    }
})

The output is

ThreadId(2) - Time took for calculate_bins 0 secs
ThreadId(3) - Time took for calculate_bins 0 secs
ThreadId(2) - Time took for calculate_bins 0 secs
ThreadId(3) - Time took for calculate_bins 0 secs
....... (for 300 times)

However, after printing each line, it takes some time (a couple of seconds) to print the next line. So I think rayon's par_iter is doing something time-consuming. Also, I noticed that it spawns only 2 threads.

@selvavm I suspect reading the data from disk is slow.

But I am not reading at this point. The data has already been read and stored in the HashMap by load_ts_from_files().

Here I am only processing the data

What happens without rayon?

You should use std::time::Instant instead. SystemTime can go backwards when you change the clock on your system. Instant is monotonic.
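A minimal sketch of timing with `Instant` (the `timed_sum` function and its placeholder workload are illustrative):

```rust
use std::time::{Duration, Instant};

// Time a piece of work with Instant. Unlike SystemTime, Instant is
// monotonic: it cannot go backwards if the wall clock changes, so
// elapsed() returns a Duration directly rather than a Result.
fn timed_sum(n: u64) -> (u64, Duration) {
    let start = Instant::now();
    let sum: u64 = (0..n).sum(); // placeholder for the real work
    (sum, start.elapsed())
}

fn main() {
    let (sum, took) = timed_sum(1_000_000);
    println!("sum = {}, took {:?}", sum, took);
}
```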


I figured out what causes the bottleneck. After calculate_bins(), I don't need the AllTs any more and thought of releasing it. Since it is part of a struct, I did,

folder.all_ts = AllTs::new(); // HashMap::new()

This is taking time, which is interesting 🙂

You could try mem::replace to swap that new value in, then rayon::spawn to drop the old value separately, out of sight, out of mind. Of course, if you queue up enough of those spawns it may still block other useful work on the threadpool. You could also create a separate pool for this purpose, perhaps with just a few threads.
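A minimal sketch of that idea using only the standard library; here `std::thread::spawn` stands in for `rayon::spawn` (with rayon, the drop would run on its threadpool instead), and `drop_in_background` is an illustrative name:

```rust
use std::collections::HashMap;
use std::mem;

fn drop_in_background(map: &mut HashMap<String, Vec<f32>>) {
    // Swap an empty map in, taking ownership of the old one.
    // mem::take is equivalent to mem::replace(map, HashMap::new()).
    let old = mem::take(map);
    // Drop the old map on another thread so the caller is not blocked
    // by the (potentially expensive) deallocation of all the entries.
    std::thread::spawn(move || drop(old));
}

fn main() {
    let mut all_ts: HashMap<String, Vec<f32>> = HashMap::new();
    all_ts.insert("file_0".to_string(), vec![0.0; 30_000]);
    drop_in_background(&mut all_ts);
    assert!(all_ts.is_empty()); // the field is immediately reusable
}
```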


FWIW, parallel iterators on the standard HashMap are a bit of a kludge, because rayon can't access the internals of the type to split in parallel. We actually end up collecting it to a Vec and then parallel iterating that. But at least for par_iter() that only means collecting references, so only the number of items has any effect, not the data size.

If you use the hashbrown crate directly (which std is based on), then it has a rayon feature for native parallel iterators. The indexmap crate is another alternative with a rayon feature.


a.old_map = HashMap::new() is a rather ineffective way of clearing a collection, because it drops the old map, thus deallocating the old container (often quite expensive).

This also has an unfortunate side effect: new insertions will have to reallocate the memory, which is also very expensive.

The clear method is often a better way.
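A minimal sketch of the difference (the `clear_vs_reassign` function name is illustrative; the sizes match the question):

```rust
use std::collections::HashMap;

// Returns (capacity after clear, capacity after reassignment).
fn clear_vs_reassign() -> (usize, usize) {
    let mut map: HashMap<String, Vec<f32>> = HashMap::with_capacity(5000);
    for i in 0..5000 {
        map.insert(format!("file_{i}"), vec![0.0; 8]);
    }

    // clear() drops the entries but keeps the table's allocation,
    // so refilling the map will not need to reallocate.
    map.clear();
    assert!(map.is_empty());
    let kept = map.capacity();

    // Reassigning instead frees the allocation entirely, so the next
    // insertions will have to allocate again.
    map = HashMap::new();
    let freed = map.capacity();
    (kept, freed)
}

fn main() {
    let (kept, freed) = clear_vs_reassign();
    println!("after clear: {kept}, after reassign: {freed}");
}
```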


I tried profiling by adding some print statements.

This isn't profiling at all, and is oftentimes a spectacularly bad replacement for it. I recommend using a real profiler; perf is a good way to start if you're on Linux.

https://rust-lang.github.io/packed_simd/perf-guide/prof/linux.html
https://athemathmo.github.io/2016/09/14/tools-for-profiling-rust.html


Maybe they do want to release that memory, if it really won't be used in the latter phase of the program.

I wouldn't expect the top-level deallocation to be a big deal anyway. Dropping a bunch of indirect allocations for all of the items could take a while, but that will happen with clear too.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.