A Question on Performance


#1

While working on the nucleotide count exercise in Exercism, I noticed that there were two types of solutions.

In Option 1, the counting logic lives in the count function, and the nucleotide_counts function calls count 4 times, once for each nucleotide.

use std::collections::HashMap;

pub fn count(to_find: char, strand: &str) -> usize {
  strand.chars().filter(|&c| c == to_find).count()
}

pub fn nucleotide_counts(strand: &str) -> HashMap<char, usize> {
  let mut h = HashMap::new();
  for acid in "ATCG".chars() {
    h.insert(acid, count(acid, strand));
  }
  h
}

In Option 2, the counting logic lives in nucleotide_counts, and count calls nucleotide_counts to look up the count for a single nucleotide. The argument for this approach is that nucleotide_counts only has to traverse the entire strand once, so it should be faster.

use std::collections::HashMap;
use std::collections::hash_map::Entry;

pub fn count(to_find: char, strand: &str) -> usize {
  if let Some(x) = nucleotide_counts(strand).get(&to_find) {
    *x
  } else {
    0
  }
}

pub fn nucleotide_counts(strand: &str) -> HashMap<char, usize> {
  let mut h: HashMap<char, usize> = "ACGT".chars().map(|c| (c, 0)).collect();
  for acid in strand.chars() {
    match h.entry(acid) {
      Entry::Occupied(entry) => {
        *entry.into_mut() += 1;
      }
      Entry::Vacant(_) => {}
    }
  }
  h
}

But none of the tests really exercise performance. The longest strand provided is 70 characters. So I changed my test suite to test nucleotide_counts using a 10-million-character strand.

I then ran the test against both implementations with time cargo test a few times.

Option 1, which should traverse the string 4 times, takes about 3.1-3.3 seconds. Option 2, which should only traverse the string once, takes 7.4-8.0 seconds.

Any ideas why Option 2, which seems like it should be more performant, is actually slower?


#2

cargo test runs a debug build, which means the code is compiled without optimizations.

You can either reformulate the test as a program and build it with cargo build --release, or add this to your Cargo.toml so that tests are compiled with optimizations:

[profile.test]
opt-level = 3

#3

HashMap lookups are not particularly cheap. For a map that only ever has four keys, all of them known beforehand, you shouldn’t use a HashMap in any case.


#4

Thanks for the suggestions, but that’s not the question I’m asking. I know there are faster ways to do this. What I’m wondering is why Option 1 is faster than Option 2. They are both running as tests, so the debug build should affect their performance equally. They are both using HashMaps, so that affects them equally.


#5

That’s not true: in Option 1, you’re doing exactly 4 HashMap operations (one insert per nucleotide). In Option 2, you’re doing 10 million (one entry lookup per character in the strand).


#6

Ah, I see your point now.

Interestingly, we’ve also experimented with replacing the HashMap with a BTreeMap. After doing so, the Option 2 approach drops to about 4.7 seconds, so that clearly helps. But as you say, 4 lookups are going to beat 10 million.

So filtering over 10 million characters 4 times (as I do in Option 1) is going to be faster than traversing once but doing 10 million map lookups. When I think of it that way, it makes more sense.


#7

Of course, the fastest approach should be to traverse the string once but perform only 4 map operations. I’d be interested in how my implementation compares: http://exercism.io/submissions/3a68dd0d5a5c423f857cc7bde792c550
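
Roughly, a sketch of that idea (not necessarily what the linked submission does): keep four plain counters during the single traversal and only touch the HashMap at the end.

use std::collections::HashMap;

pub fn nucleotide_counts(strand: &str) -> HashMap<char, usize> {
  // One pass over the strand; no map operations in the hot loop.
  let (mut a, mut c, mut g, mut t) = (0usize, 0usize, 0usize, 0usize);
  for ch in strand.chars() {
    match ch {
      'A' => a += 1,
      'C' => c += 1,
      'G' => g += 1,
      'T' => t += 1,
      _ => {}
    }
  }
  // Exactly four HashMap insertions, regardless of strand length.
  let mut h = HashMap::new();
  h.insert('A', a);
  h.insert('C', c);
  h.insert('G', g);
  h.insert('T', t);
  h
}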


#8

Is it ok to notice a rather silly thing? If the character c is one of those letters, then ((c as u8) / 2) & 3 maps A, T, C, G to 0, 2, 1, 3 respectively. So the map can be replaced by a 4-element array, indexed by (((c as u8) / 2) & 3) as usize.

Also, if the strand is guaranteed to contain only the characters A, T, C, and G, then UTF-8 decoding should not be needed, and a few milliseconds might be gained by calling strand.bytes() rather than strand.chars().

Example code:
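
(A minimal sketch along those lines; it assumes the strand contains only the bytes A, C, G, and T, since any other byte would be tallied into one of the four buckets.)

use std::collections::HashMap;

pub fn nucleotide_counts(strand: &str) -> HashMap<char, usize> {
  // Tally into a fixed array: (b / 2) & 3 maps A, C, T, G to 0, 1, 2, 3.
  let mut counts = [0usize; 4];
  for b in strand.bytes() {
    counts[((b / 2) & 3) as usize] += 1;
  }
  // Build the HashMap at the end with only four insertions.
  "ACGT".chars()
    .map(|c| (c, counts[(((c as u8) / 2) & 3) as usize]))
    .collect()
}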


#9

To be fastest, you’d first convert any ATCG strings to a Vec with the acids mapped to 0…3. Then instead of a HashMap, use [usize; 4] as the “map”.
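
A sketch of that two-step idea (the helper names here are only illustrative): first map the strand to indices 0 through 3, then tally into a fixed-size array.

// Assumes the strand contains only A, C, G, T; (b / 2) & 3 maps A, C, T, G to 0, 1, 2, 3.
fn to_indices(strand: &str) -> Vec<u8> {
  strand.bytes().map(|b| (b / 2) & 3).collect()
}

// Count occurrences of each index into a plain array instead of a HashMap.
fn tally(indices: &[u8]) -> [usize; 4] {
  let mut counts = [0usize; 4];
  for &i in indices {
    counts[i as usize] += 1;
  }
  counts
}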


#10

1.05 seconds.