A Question on Performance

While working on the nucleotide count exercise on Exercism, I noticed that the solutions fall into two general approaches.

In Option 1, the counting logic lives in the count function, and nucleotide_counts calls count 4 times, once for each nucleotide.

use std::collections::HashMap;

pub fn count(to_find: char, strand: &str) -> usize {
  strand.chars().filter(|&c| c == to_find).count()
}

pub fn nucleotide_counts(strand: &str) -> HashMap<char, usize> {
  let mut h = HashMap::new();
  for acid in "ATCG".chars() {
    h.insert(acid, count(acid, strand));
  }
  h
}

In Option 2, the counting logic lives in nucleotide_counts, and count uses nucleotide_counts to find the count for a single nucleotide. The argument for this approach is that nucleotide_counts only has to traverse the entire strand once, so it should be faster.

use std::collections::HashMap;
use std::collections::hash_map::Entry;

pub fn count(to_find: char, strand: &str) -> usize {
  // Build the full map, then pull out the count for the one requested nucleotide.
  if let Some(x) = nucleotide_counts(strand).get(&to_find) {
    *x
  } else {
    0
  }
}

pub fn nucleotide_counts(strand: &str) -> HashMap<char, usize> {
  // Seed the map so every valid nucleotide has an entry, even if its count stays 0.
  let mut h: HashMap<char, usize> = "ACGT".chars().map(|c| (c, 0)).collect();
  for acid in strand.chars() {
    match h.entry(acid) {
      Entry::Occupied(entry) => {
        *entry.into_mut() += 1;
      }
      // Ignore anything that is not A, C, G, or T.
      Entry::Vacant(_) => {}
    }
  }
  h
}

But none of the tests really exercise performance. The longest strand provided is 70 characters. So I changed my test suite to test nucleotide_counts using a 10-million character strand.
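
Something along these lines (a sketch; the test name and the repeating "ACGT" strand here are just for illustration):

#[test]
fn counts_a_ten_million_character_strand() {
  // A 10-million character strand built from a repeating "ACGT" pattern,
  // so each nucleotide should appear exactly 2,500,000 times.
  let strand: String = "ACGT".chars().cycle().take(10_000_000).collect();
  let counts = nucleotide_counts(&strand);
  for acid in "ACGT".chars() {
    assert_eq!(counts[&acid], 2_500_000);
  }
}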

I then ran the test against both implementations with time cargo test a few times.

Option 1, which should traverse the string 4 times, takes about 3.1-3.3 seconds. Option 2, which should only traverse the string once, takes 7.4-8.0 seconds.

Any ideas why Option 2, which looks like it should be the more performant approach, is actually slower?

cargo test runs a debug build, which means the code is compiled without optimizations.

You can either reformulate the test as a program and use cargo build --release or add this to your Cargo.toml to cause tests to run with optimizations on:

[profile.test]
opt-level = 3

HashMap lookups are not particularly cheap. For a map that only ever has four keys, all of them known beforehand, you shouldn't use a HashMap in any case.

Thanks for the suggestions, but that's not the question I'm asking. I know there are faster ways to do this. What I'm wondering is why Option 1 is faster than Option 2. They are both running as tests, so the debug build should affect their performance equally. They are both using HashMaps, so that affects them equally.

That's not true: in Option 1, you're doing exactly 4 HashMap operations. In Option 2, you're doing 10 million.

Ah, I see your point now.

Interestingly, we also experimented with replacing HashMap with BTreeMap. Doing so brings the Option 2 approach down to about 4.7 seconds, so it clearly helps. But as you say, 4 lookups are going to beat 10 million.

So filtering over 10 million characters 4 times (as Option 1 does) is still faster than a single traversal that performs 10 million map lookups (as Option 2 does). When I think of it that way, it makes more sense.

Of course, the fastest approach should be to traverse the string once but perform only 4 map operations. I'd be interested to see how my implementation compares: http://exercism.io/submissions/3a68dd0d5a5c423f857cc7bde792c550
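
In outline, that shape might look something like this (just a sketch; the linked submission may differ): count into four plain locals in a single pass, then perform exactly four map insertions at the end.

use std::collections::HashMap;

pub fn nucleotide_counts(strand: &str) -> HashMap<char, usize> {
  // One pass over the strand, counting into plain local variables.
  let (mut a, mut c, mut g, mut t) = (0, 0, 0, 0);
  for ch in strand.chars() {
    match ch {
      'A' => a += 1,
      'C' => c += 1,
      'G' => g += 1,
      'T' => t += 1,
      _ => {}
    }
  }
  // Exactly four map operations to build the result.
  let mut h = HashMap::new();
  h.insert('A', a);
  h.insert('C', c);
  h.insert('G', g);
  h.insert('T', t);
  h
}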

Is it OK to point out a rather silly thing? If c is one of those letters, then ((c as u8) / 2) & 3 maps A, T, C, G to 0, 2, 1, 3 respectively. So the map can be replaced by a 4-element array, indexed by (((c as u8) / 2) & 3) as usize.

Also, if the strand is guaranteed to contain only the characters A, T, C, and G, then UTF-8 decoding is not needed at all, and a few milliseconds might be gained by calling strand.bytes() rather than strand.chars().

Example code:
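
Here is a minimal sketch of that approach, assuming the strand really does contain only A, C, G, and T:

use std::collections::HashMap;

pub fn nucleotide_counts(strand: &str) -> HashMap<char, usize> {
  // Four counters indexed by (byte / 2) & 3, which maps A, C, T, G to 0, 1, 2, 3.
  // A byte outside A/C/G/T would be silently counted against the wrong slot,
  // hence the assumption above.
  let mut counts = [0usize; 4];
  for b in strand.bytes() {
    counts[((b / 2) & 3) as usize] += 1;
  }
  // Build the HashMap with exactly four insertions.
  let mut h = HashMap::new();
  for acid in "ACGT".chars() {
    h.insert(acid, counts[((acid as u8 / 2) & 3) as usize]);
  }
  h
}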

To be fastest, you'd first convert any ATCG strings to a Vec with the acids mapped to 0..3. Then instead of a HashMap, use [usize; 4] as the "map".
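
Roughly like this (a sketch only, and not necessarily the exact code behind the timing below; the helper name acid_counts and the particular 0..3 mapping are just one choice):

// Hypothetical helper: returns counts indexed as A = 0, C = 1, G = 2, T = 3.
pub fn acid_counts(strand: &str) -> [usize; 4] {
  // First convert the strand to a Vec of acid indices in 0..3.
  let acids: Vec<u8> = strand
    .bytes()
    .map(|b| match b {
      b'A' => 0,
      b'C' => 1,
      b'G' => 2,
      b'T' => 3,
      _ => panic!("unexpected character in strand"),
    })
    .collect();
  // Then tally into a fixed-size array used as the "map".
  let mut counts = [0usize; 4];
  for &i in &acids {
    counts[i as usize] += 1;
  }
  counts
}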

1.05 seconds.