How to improve performance to 2 seconds?

ian · December 13, 2019, 2:23pm

~/hello_cargo$ time cargo build --release
~/hello_cargo$ time ./target/release/hello_cargo 
100

real	0m3.384s
user	0m3.296s
sys	0m0.088s

main.rs code:

use std::collections::HashMap;

fn main() {
    let mut i = 0;
    let mut m = HashMap::new();
    while i < 10000000 {
        let index = i % 1000000;

        let mut key = String::from("a");
        key.push_str(&index.to_string());

        if let Some(value) = m.get(&key) {
            m.insert(key, value + 10);
        } else {
            m.insert(key, 10);
        }

        i += 1;
    }

    match m.get(&String::from("a999999")) {
        Some(&number) => println!("{}", number),
        _ => println!("Don't have number."),
    }
}

main.go code:

package main

import (
	"fmt"
	"strconv"
)

func main() {
	m := make(map[string]int)

	for i := 0; i < 10000000; i++ {
		index := i % 1000000
		m["a"+strconv.Itoa(index)] += 10
	}

	fmt.Println(m["a999999"])
}

~$ time ./main
100

real	0m2.529s
user	0m2.799s
sys	0m0.064s

cliff · December 13, 2019, 2:30pm

Please put backtick fences around your code so it formats properly.

```
fn main() {
    // Code goes here
}
```

ian · December 13, 2019, 2:36pm

Thanks for reminding

cliff · December 13, 2019, 2:38pm

Some opportunities for speeding up your code:

push_str can re-allocate if it exceeds the capacity of the String, so allocate the key up front with the space you need.
index.to_string() allocates, but then you discard the string.
Consider using format! to ensure that only one allocation happens for each key. (This might resolve both of this above issues).
Look into the .entry() interface on HashMap to avoid indexing into the hash map twice.

Hyeonu · December 13, 2019, 2:45pm

Here's the direct translation of your Go code to Rust. The printing code becomes somewhat verbose to match with the Go's map indexing semantics.

You can improve the performance of this code by replacing ::new() with ::with_capacity(1000000), though it may not that significant.

ian · December 13, 2019, 2:53pm

Significant performance improvements, thanks. Is there room for improvement in performance?

~/hello_cargo$ time ./target/release/hello_cargo 
100

real	0m2.609s
user	0m2.565s
sys	0m0.045s

main.rs code:

use std::collections::HashMap;

fn main() {
    let mut m = HashMap::<String, isize>::new();
    
    for i in 0..10000000 {
        let index = i % 1000000;(
        *m.entry(format!("a{}", index)).or_default()) += 10;
    }
    
    println!("{}", m.get("a999999").copied().unwrap_or_default())
}

~/hello_cargo$ time ./target/release/hello_cargo 
100

real	0m2.475s
user	0m2.435s
sys	0m0.040s

use std::collections::HashMap;

fn main() {
    let mut m = HashMap::<String, isize>::with_capacity(1000000);
    
    for i in 0..10000000 {
        let index = i % 1000000;(
        *m.entry(format!("a{}", index)).or_default()) += 10;
    }
    
    println!("{}", m.get("a999999").copied().unwrap_or_default())
}

~$ time ./main
100

real	0m2.177s
user	0m2.442s
sys	0m0.064s

package main

import (
	"fmt"
	"strconv"
)

func main() {
	m := make(map[string]int, 1000000)

	for i := 0; i < 10000000; i++ {
		index := i % 1000000
		m["a"+strconv.Itoa(index)] += 10
	}

	fmt.Println(m["a999999"])
}

cliff · December 13, 2019, 3:06pm

Another observation: Your current code creates each key 10 times. Create each one once, outside the loop, and use those.

cuviper · December 13, 2019, 4:14pm

You might want to try different hashers, as the default secure hasher is not the fastest. The std implementation is built on the hashbrown crate, and if you use that directly you'll get a faster ahash instead. There are more hasher options on crates.io too: fnv, fxhash, xxhash, etc.

dodomorandi · December 13, 2019, 5:12pm

It looks like that Go default hasher uses FNV algorithm. As @cuviper said, different hashing algorithm have different pros and cons.

Probably for the best comparison you should use one of the available implementations.

kornel · December 13, 2019, 5:17pm

use std::fmt::Write;

fn main() {
    // Faster hasher
    // Use Box<str> to avoid storing capacity
    // use a size small enough for the values
    let mut m = fxhash::FxHashMap::<Box<str>, u32>::with_capacity_and_hasher(1000000, Default::default());
    // Reuse one string allocation across loop iterations
    let mut k = String::with_capacity(30);

    for i in 0..10000000 {
        let index = i % 1000000;
        // clear + write! is like an in-place format!
        k.clear();
        let _ = write!(k, "a{}", index);
        // avoids allocating the string 9 out of 10 times
        if let Some(v) = m.get_mut(k.as_str()) {
            *v += 10;
        } else {
            m.insert(k.clone().into_boxed_str(), 10);
        }
    }

    println!("{}", m.get("a999999").copied().unwrap_or_default())
}

drmason13 · December 13, 2019, 10:56pm

I haven't used this (yet) but could be interesting to see how these compare when run through flamegraph: flamegraph/README.md at main · flamegraph-rs/flamegraph · GitHub

cbiffle · December 13, 2019, 11:42pm

FWIW, testing with hyperfine on a Skylake machine, the "boxed str" part produces no statistically significant performance improvement, so I would suggest this simplification:

use std::fmt::Write;

fn main() {
    let mut m = fxhash::FxHashMap::with_capacity_and_hasher(
        1_000_000,
        Default::default(),
    );
    let mut k = String::with_capacity(30);

    for i in 0..10_000_000 {
        k.clear();
        let _ = write!(k, "a{}", i % 1_000_000);

        if let Some(value) = m.get_mut(&k) {
            *value += 10;
        } else {
            m.insert(k.clone(), 10);
        }
    }

    println!("{}", m.get("a999999").unwrap_or(&0));
}

On my machine this is 1.55x faster than the Rust original. (Obviously I can't compare my times with the OPs in an apples-to-apples comparison.)

(Edit: and 1.36x faster than the Go program.)

ian · December 13, 2019, 11:51pm

cool!

~/hello_cargo$ time ./target/release/hello_cargo
100

real 0m1.794s
user 0m1.762s
sys 0m0.032s

ian · December 13, 2019, 11:55pm

Thank you for your reply.

~/hello_cargo$ time ./target/release/hello_cargo
100

real 0m1.828s
user 0m1.788s
sys 0m0.041s

kornel · December 14, 2019, 1:53am

I've tried it on Kaby Lake and saw a significant difference. The working set with Box should be smaller, so whether there's a visible difference it depends on whether the hashmap fits in your cache.

krdln · December 14, 2019, 3:00am

What a nice rabbit hole to fall into!

My first thought was – the data is accessed in ordered way, so let's try (an ordered) BTreeMap instead of hashmap. Turned out it wasn't enough to beat @kornel solution. So I thought – the strings are always small, let's use some kind of small string optimization – and grabbed an arraystring crate. Faster, but still not there.

So for a fun experiment decided to increase the dataset size ten times – BtreeMap solution finished in 46 seconds and hashmap needed 10 minutes. I guess the main conclusion from that is that CPU caches are insanely big these days. But if your Value gets more complex the cache effect would be more visible.

BTreeMap:

use std::collections::BTreeMap;
use std::fmt::Write;

fn main() {
    // Since we're accessing elements in ordered way, let's use an ordered collection.
    let mut m: BTreeMap<Box<str>, u32> = BTreeMap::new();
    // Reuse one string allocation across loop iterations
    let mut k = String::with_capacity(30);

    for i in 0..10000000 {
        let index = i % 1000000;
        // clear + write! is like an in-place format!
        k.clear();
        let _ = write!(k, "a{}", index);
        // avoids allocating the string 9 out of 10 times
        if let Some(v) = m.get_mut(k.as_str()) {
            *v += 10;
        } else {
            m.insert(k.clone().into_boxed_str(), 10);
        }
    }

    println!("{}", m.get("a999999").copied().unwrap_or_default())
}

BTreeMap + arraystring

use std::collections::BTreeMap;
use std::fmt::Write;

use arraystring::{ArrayString, typenum::U15};

fn main() {
    // Since we're accessing elements in ordered way, let's use an ordered collection.
    let mut m: BTreeMap<ArrayString<U15>, u32> = BTreeMap::new();

    for i in 0..10000000 {
        let index = i % 1000000;
        let mut k = ArrayString::new();
        k.clear();
        write!(k, "a{}", index).unwrap();
        *m.entry(k).or_default() += 10;
    }

    println!("{}", m.get("a999999").copied().unwrap_or_default())
}

Then I decided to try rolling my own specialized arraystring. Seems to beat kornel's solution slightly. But hashmap also gets faster with custom arraystring, so that leaves a tie, at least on my CPU. Gist

system · March 13, 2020, 3:00am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
How to improve rust performance to 1.4 seconds? help	32	1182	September 14, 2023
Guessing game (with a twist): how can I make it faster? tutorials	5	831	January 12, 2023
How can I improve this code help	21	1078	November 25, 2019
How to increase speed for my code? help	2	419	March 17, 2022
How to test speed? help	6	1926	April 20, 2022

How to improve performance to 2 seconds?

Related Topics