Which code is better in a clean and performant rust codebase

In a codebase where we're trying to keep everything performant (even the smallest part), we also want the code to be readable by a newcomer to our team.

But some days ago, the following code modification was refused. I don't understand why, and I haven't gotten any answers. Maybe the Rust community can help me understand? =-) I'm curious to hear your reactions out of context!

If you are curious to know what the following code is used for, it's very simple: we have to limit the number of objects in a map (we use a map rather than a queue for fast access). The function describes inserting bar for each id in bar_ids, and when we exceed max_known_bar we pop the entries least likely to be used again (basically the oldest).
I changed the names of the original variables, so the link between bar_ids and bar can look strange out of context. But the idea remains the same :wink:

Before:

    pub fn insert_and_tail1(
        &mut self,
        bar_ids: &[BarId],
        val: bool,
        bar: Bar,
        max_known_bar: usize,
    ) {
        for id in bar_ids {
            self.known_bars.insert(*id, (val, bar));
        }
        while self.known_bars.len() > max_known_bar {
            // remove the oldest
            let (&h, _) = self
                .known_bars
                .iter()
                .min_by_key(|(h, (_, t))| (*t, *h))
                .unwrap();
            self.known_bars.remove(&h);
        }
    }

After refactoring:

    pub fn insert_and_tail2(
        &mut self,
        bar_ids: &[BarId],
        val: bool,
        bar: Bar,
        max_known_bar: usize,
    ) {
        for id in bar_ids {
            self.known_bars.insert(*id, (val, bar));
        }
        let len = self.known_bars.len();
        if len <= max_known_bar {
            return;
        }
        if len == max_known_bar + 1 {
            /* Fast case where there is just one to pop */
            self.pop();
            return;
        }
        let overflow = len - max_known_bar;
        if overflow >= len.ilog2() as usize {
            self.known_bars
                .iter()
                .map(|(h, (_, t))| (*t, *h))
                .sorted() // `sorted` comes from itertools
                .take(overflow)
                .for_each(|(_, id)| {
                    self.known_bars.remove(&id);
                });
        } else {
            /* In this case the time complexity is more likely O(N) */
            while self.known_bars.len() > max_known_bar {
                self.pop()
            }
        }
    }

The second one may be more cryptic and harder to understand, but it also seems to be faster with various inputs according to some Criterion measurements (most of the time bar_ids has length 1). Now you know everything.
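
A minimal Criterion harness along these lines could drive such a comparison (a sketch only: the `KnownBars` type, the `Default` impls, and the cap of 1024 are hypothetical stand-ins, not from the original code):

    use criterion::{criterion_group, criterion_main, BatchSize, Criterion};

    fn bench_insert(c: &mut Criterion) {
        // Typical case from the post: a single id per call.
        let ids = [BarId::default()];
        c.bench_function("insert_and_tail2", |b| {
            // `iter_batched` keeps the map setup out of the measured time.
            b.iter_batched(
                KnownBars::default,
                |mut map| map.insert_and_tail2(&ids, true, Bar::default(), 1024),
                BatchSize::SmallInput,
            );
        });
    }

    criterion_group!(benches, bench_insert);
    criterion_main!(benches);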

Which one do you prefer in your codebase? Don't hesitate to say if both are horribly designed or if you find a better way to do that!

It looks like you forgot to tweak your post when you copied it from StackOverflow after they locked it. That's fine: unstructured code review is welcome on the Rust Users Forum, even though it isn't usually welcome over there.

But anyway, your first version definitely has O(n^2) overhead, where n = known_bars.len(), because the outer loop runs O(n) times, while the inner scan (implied by min_by_key) also runs in O(n).

Your second version has a fast-path special case, and otherwise uses a sorting algorithm. That sorting algorithm should be O(n*log(n)), as comparison sorts usually are.

I would be wary of anyone telling you to just use the first version, unless you have some plan to ensure that the number of items in your map doesn’t get big. Quadratic algorithmic complexity is no joke.

4 Likes

IMO the first one is easier to understand by far. I too would be wary of changing it into the 2nd implementation.

If the performance difference is significant and important, I would try to at least break it up into two parts and give things generally better names.
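
For instance, a sketch of that split (the helper names `insert_bars` and `evict_oldest` are invented here, and `self.pop()` is assumed to remove the oldest entry, as in the original):

    fn insert_bars(&mut self, bar_ids: &[BarId], val: bool, bar: Bar) {
        for id in bar_ids {
            self.known_bars.insert(*id, (val, bar));
        }
    }

    fn evict_oldest(&mut self, max_known_bar: usize) {
        // The eviction strategy (simple pop loop or bulk sort) lives behind one name.
        while self.known_bars.len() > max_known_bar {
            self.pop();
        }
    }

    pub fn insert_and_tail(&mut self, bar_ids: &[BarId], val: bool, bar: Bar, max_known_bar: usize) {
        self.insert_bars(bar_ids, val, bar);
        self.evict_oldest(max_known_bar);
    }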

Well, this is complex.
If you look at algorithmic complexity, then the first one is O(N^2 lg N). The second one is a bit harder to tell, because we don't know what pop does. If pop is O(1), then it is O(N lg N). So, from this alone it seems the second is an improvement.
If you look at code complexity, then I'd say the second one suffers from premature optimization, which according to Knuth is the "root of all evil". The point is that it tries lots of micro-optimizations and obfuscates the purpose of the code.
You're using a map to store the data, presumably for fast indexing. So, the easiest way to find the ones to remove would be to take the `if overflow >= len.ilog2() as usize { ... }` branch always. It is algorithmically fast and easy to understand. I also think O(N lg N) is theoretically the fastest you can go, unless you have special assumptions about how the values are compared.
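
Concretely, that always-sort version might look like this (a sketch: the name `insert_and_trim` is invented, and it assumes `BarId: Copy + Ord + Hash` and `Bar: Copy`):

    pub fn insert_and_trim(
        &mut self,
        bar_ids: &[BarId],
        val: bool,
        bar: Bar,
        max_known_bar: usize,
    ) {
        for id in bar_ids {
            self.known_bars.insert(*id, (val, bar));
        }
        let overflow = self.known_bars.len().saturating_sub(max_known_bar);
        if overflow == 0 {
            return;
        }
        // Collect (time, id) pairs and sort oldest-first.
        let mut entries: Vec<_> = self
            .known_bars
            .iter()
            .map(|(h, (_, t))| (*t, *h))
            .collect();
        entries.sort_unstable();
        for (_, id) in entries.into_iter().take(overflow) {
            self.known_bars.remove(&id);
        }
    }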

1 Like

While I agree that the second option is rather complex, I very much don't like the quadratic complexity of the first option. I might go for something like this:

    use std::collections::BinaryHeap;

    pub fn insert_and_tail1(
        &mut self,
        bar_ids: &[BarId],
        val: bool,
        bar: Bar,
        max_known_bar: usize,
    ) {
        for id in bar_ids {
            self.known_bars.insert(*id, (val, bar));
        }

        if self.known_bars.len() > max_known_bar {
            let num_to_remove = self.known_bars.len() - max_known_bar;

            // Find the `num_to_remove` smallest (time, id) pairs.
            let mut to_remove = BinaryHeap::with_capacity(1 + num_to_remove);
            for (h, (_, t)) in self.known_bars.iter() {
                to_remove.push((*t, *h));

                if to_remove.len() > num_to_remove {
                    // Remove the largest item, so only the smallest remain.
                    to_remove.pop();
                }
            }

            for (_t, h) in to_remove {
                self.known_bars.remove(&h);
            }
        }
    }

The above is O(N log(overflow)) rather than O(N log(N)): the heap never holds more than num_to_remove + 1 items, so each push and pop costs O(log(overflow)).

4 Likes

Your data structure should maintain a queue sorted by insertion time. Something like:

    for id in bar_ids {
        self.known_bars.insert(*id, (val, time));
        // assuming `bar_ids_by_time` is a VecDeque: newest entries at the back
        self.bar_ids_by_time.push_back((time, *id));
        if self.known_bars.len() > max_len {
            // the oldest entry sits at the front
            let (_, id) = self.bar_ids_by_time.pop_front().unwrap();
            self.known_bars.remove(&id);
        }
    }
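
For completeness, the fields this fragment assumes could be declared like so (a sketch; the `KnownBars` name, `VecDeque` choice, and `Time` alias are assumptions, and note that re-inserting an existing id would leave a stale queue entry that a real implementation must handle):

    use std::collections::{HashMap, VecDeque};

    type BarId = u64; // stand-in
    type Time = u64; // stand-in

    struct KnownBars {
        known_bars: HashMap<BarId, (bool, Time)>,
        // Oldest insertion at the front, newest at the back.
        bar_ids_by_time: VecDeque<(Time, BarId)>,
    }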
3 Likes

I think there are a few things here:

  1. "everything performant (even the smallest part)" is a strange goal. Most code has at least some parts where it's not worth bothering -- the normal assumption is that 80% is pretty unimportant, but at least 20% is going to be unimportant. But I'll assume this part is actually relevant.

  2. If this is something that needs to be done commonly, maybe it should hold a different data structure from bar_ids: &[BarId]. For example, if you were allowed to reorder that slice -- which it seems would likely be fine, since you apparently can't assume that it's sorted -- then you could use select_nth_unstable to get the top or bottom k in O(n) time without any allocations.

  3. If you really want to do this "allocate to find the top k", then use an implementation that someone else wrote that's better than the .sorted().take(overflow), since you don't need the whole thing sorted. Since it seems like you have itertools for .sorted() anyway, why not .k_smallest(overflow)? (Both ideas are sketched below.)
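
A hedged sketch of both ideas (the free functions and the `Time` alias are stand-ins invented here; `k_smallest` is from itertools and `select_nth_unstable` is a std slice method):

    use itertools::Itertools;
    use std::collections::HashMap;

    type BarId = u64; // stand-in
    type Time = u64; // stand-in

    // Idea 3: find the `overflow` oldest entries without sorting the whole map.
    fn trim_with_k_smallest(known_bars: &mut HashMap<BarId, (bool, Time)>, max_known_bar: usize) {
        let overflow = known_bars.len().saturating_sub(max_known_bar);
        let to_remove: Vec<BarId> = known_bars
            .iter()
            .map(|(h, (_, t))| (*t, *h))
            .k_smallest(overflow)
            .map(|(_, id)| id)
            .collect();
        for id in to_remove {
            known_bars.remove(&id);
        }
    }

    // Idea 2: if the (time, id) pairs live in a reorderable slice,
    // `select_nth_unstable` partitions the `overflow` smallest to the front in O(n).
    fn oldest_in_place(entries: &mut [(Time, BarId)], overflow: usize) -> &[(Time, BarId)] {
        if overflow == 0 || overflow >= entries.len() {
            return entries;
        }
        entries.select_nth_unstable(overflow);
        &entries[..overflow] // the oldest `overflow` pairs, in no particular order
    }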

2 Likes

You caught me! :blush: Thank you for your answer.

That's neat!!

I always assumed max_known_bar was a constant, and realized that I never confirmed that.

@adrien-zinger how is that number calculated?

@notriddle you were right, it's a constant!

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.