(idle musing) micro micro micro performance question

Hi! Inside a really tight loop I have a check that either always needs to be done or never needs to be done. That Checker is essentially a callback.

My question is, what's the most performant way to handle this, considering it's always provided or not, and that doesn't change at runtime.

I can think of three ways, but I don't have any factual knowledge about this stuff (only faulty assumptions):

  • just put if let Some(..) inside the loop and trust the compiler
  • capture is_some() outside the loop and then check that in the loop before unwrapping the checker (is a boolean faster than if let Some(..)?)
  • take inspiration from the Null Object pattern of yesteryear and have a ConstantChecker which is used if one isn't provided

I've put these three into code (Rust Playground). What other options are there, and which would you choose?

Thanks!

/// The callback
trait ICheck {
    fn do_check(&self) -> bool;
}

struct RealWorldChecker {}

impl ICheck for RealWorldChecker {
    fn do_check(&self) -> bool {
        // real logic
        true
    }
}

/// The production code that uses the checker
struct Worker<IC: ICheck> {
    checker: Option<IC>,
}

impl<IC: ICheck> Worker<IC> {
    /// idiomatic, but maybe naive?
    fn do_work_one(&self) {
        for _ in 0..10 {
            if let Some(c) = &self.checker {
                let _ = c.do_check();
            }
        }
    }

    // I assume a boolean check is more performant than `if let Some(..)`
    fn do_work_two(&self) {
        let is_has_check = self.checker.is_some();
        for _ in 0..10 {
            if is_has_check {
                // `as_ref()` borrows the checker instead of moving it out of `self`
                let _ = self.checker.as_ref().unwrap().do_check();
            }
        }
    }

    // is calling an inlineable fn the most efficient? No idea.
    fn do_work_three(&self) {
        // The fallback has a different type than `IC`, so both cases are
        // unified behind a trait object here.
        let fallback = ConstantChecker { v: true };
        let checker: &dyn ICheck = match &self.checker {
            Some(c) => c,
            None => &fallback,
        };
        for _ in 0..10 {
            let _ = checker.do_check();
        }

        struct ConstantChecker {
            v: bool,
        }

        impl ICheck for ConstantChecker {
            fn do_check(&self) -> bool {
                self.v
            }
        }
    }
}

obligatory "is the actual work in your tight loop really so short that this might matter" question... :slight_smile:

No no no no, the assumption that a boolean check is faster than if let Some(..) is horribly wrong. How do you think if let Some(...) is implemented? The compiler simply has to emit code that checks some value (the enum tag) and compares it to the desired value, in exactly the same way as it would deal with a boolean. Checking is_some() and then unwrapping only does one thing: it removes type safety. (It can actually make performance very slightly worse in case the compiler misses the optimization and duplicates the check in the unwrap().)

Yeah, just put if let Some(..) inside the loop and trust the compiler, and if you are really worried, check the generated assembly. No amount of speculation will make up for actually reading the emitted code.
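
To make the comparison concrete, here is a minimal sketch of the two variants side by side. The Checker type and the counting functions are hypothetical stand-ins, not the code from the question. It only shows that both forms behave the same; whether they compile to the same machine code is something to confirm in the generated assembly.

```rust
/// Hypothetical stand-in for the checker callback.
struct Checker;

impl Checker {
    fn do_check(&self) -> bool {
        true
    }
}

/// Variant 1: pattern-match inside the loop.
/// The checker is only reachable when it is actually present.
fn count_if_let(checker: &Option<Checker>, n: u32) -> u32 {
    let mut passed = 0;
    for _ in 0..n {
        if let Some(c) = checker {
            if c.do_check() {
                passed += 1;
            }
        }
    }
    passed
}

/// Variant 2: cache `is_some()` in a bool, then unwrap inside the loop.
/// Same behavior, but `unwrap()` carries its own presence check.
fn count_bool_unwrap(checker: &Option<Checker>, n: u32) -> u32 {
    let has_checker = checker.is_some();
    let mut passed = 0;
    for _ in 0..n {
        if has_checker {
            if checker.as_ref().unwrap().do_check() {
                passed += 1;
            }
        }
    }
    passed
}

fn main() {
    let with_checker = Some(Checker);
    let without_checker: Option<Checker> = None;
    assert_eq!(count_if_let(&with_checker, 10), count_bool_unwrap(&with_checker, 10));
    assert_eq!(count_if_let(&without_checker, 10), count_bool_unwrap(&without_checker, 10));
}
```

The second variant buys nothing in behavior and reintroduces a possible panic path, which is the type-safety loss mentioned above.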

thanks @RustyJoeM and @H2CO3!

Glad i could troll help :angel:

Joking aside, your topic label was clear and made my "question" redundant, but imho it's worth stating explicitly for newcomers addressing a similar point later...

(and imho you asked a very good question, a good example for "perf oriented" people coming from lower-level languages)

If you're curious to read more about this, the optimization you're looking for is called loop-invariant code motion (there's a Wikipedia article on it).

If it's truly invariant, LLVM is generally good at this, so you're probably fine just not thinking about it.

(If it's not actually invariant for some reason, like it changes in the first or last iteration, then the compiler is less good at fixing it. But that's not the situation you said you're in here.)
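
For illustration, this is roughly what moving the invariant test out of the loop looks like when done by hand. It is a sketch with made-up function names and a made-up summing workload, not the code from the question. Splitting the loop into one version per branch like this is usually called loop unswitching, a close relative of loop-invariant code motion.

```rust
/// The straightforward version: the `Option` is tested on every iteration.
fn sum_checked<F: Fn(u32) -> bool>(items: &[u32], checker: &Option<F>) -> u32 {
    let mut total = 0;
    for &x in items {
        if let Some(check) = checker {
            if !check(x) {
                continue;
            }
        }
        total += x;
    }
    total
}

/// The hand-hoisted version: test the `Option` once, then run a loop
/// specialised for that case.
fn sum_checked_hoisted<F: Fn(u32) -> bool>(items: &[u32], checker: &Option<F>) -> u32 {
    match checker {
        Some(check) => items.iter().copied().filter(|&x| check(x)).sum::<u32>(),
        None => items.iter().copied().sum::<u32>(),
    }
}

fn main() {
    let data = [1, 2, 3, 4];
    let is_even = |x: u32| x % 2 == 0;
    assert_eq!(sum_checked(&data, &Some(is_even)), sum_checked_hoisted(&data, &Some(is_even)));
}
```

Since the optimizer can often do this transformation itself, the first version is normally the one to write; the second is mostly useful for understanding what to look for in the assembly.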

If you're using a type parameter (not dyn), I think the Null Object approach (your third option) is the best one though, especially considering that whether or not a checker is provided doesn't change at runtime. That way, if you provide an implementation that's vacuous, it just compiles into nothing, not even a boolean check.

I further think this is a good option because it avoids a certain awkwardness: if your type is Option<T>, even if you only ever pass it None, you still need to have a specific type for T, and have it satisfy whatever type bounds there are on T. Furthermore, the fact that you only ever pass it None in a certain case causes type-inference of what T is to fail (because you never pass it an instance of T so it has no ability to infer what T should be). This would mean you'd have to explicitly inform the compiler what T is, and perhaps create some sort of vacuous place-holder type to implement your trait even though it will never be instantiated or used, which is awkward.

For example, if you try to compile this code, it won't compile because it doesn't know what F is when main calls foo:

fn foo<F: FnOnce()>(o: Option<F>) {
    if let Some(f) = o {
        f();
    }
}

fn main() {
    foo(None);
}
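
For completeness, a sketch of the awkward workaround described above: naming F explicitly makes the None call compile. Picking fn() as the placeholder type is an arbitrary choice made for this example; any type satisfying the FnOnce() bound would do.

```rust
fn foo<F: FnOnce()>(o: Option<F>) {
    if let Some(f) = o {
        f();
    }
}

fn main() {
    // `fn()` is an arbitrary placeholder for `F`; no value of it is ever created.
    foo::<fn()>(None);

    // Equivalent: annotate the argument instead of the call.
    foo(None::<fn()>);

    // When a closure is actually passed, inference works without help.
    foo(Some(|| println!("called")));
}
```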

A pattern somewhat like this should work well:

fn foo<C: Fn() -> bool>(checker: C) { ... }

And if you want to make it easier to call it without a check, add a helper function:

fn foo_no_checker() {
    foo(|| true);
}

Or if you really want to make your own Checker trait rather than use a closure:

trait Checker {
    fn check(&self) -> bool;
}

fn foo<C: Checker>(checker: C) { ... }

struct NoCheck;

impl Checker for NoCheck {
    fn check(&self) -> bool { true }
}

fn foo_no_checker() {
    foo(NoCheck);
}

Adapting this to be stored in a struct should be a straightforward change.
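
A minimal sketch of what that adaptation might look like. The Worker struct, its constructors, do_work and the AlwaysFail checker are assumptions made for this example, loosely modelled on the question's code.

```rust
trait Checker {
    fn check(&self) -> bool;
}

/// The "no checker" case: always passes, so the call can be optimized away.
struct NoCheck;

impl Checker for NoCheck {
    fn check(&self) -> bool {
        true
    }
}

/// Hypothetical checker that rejects everything, to show a non-trivial case.
struct AlwaysFail;

impl Checker for AlwaysFail {
    fn check(&self) -> bool {
        false
    }
}

/// The checker is stored by value; its concrete type is fixed at compile time.
struct Worker<C: Checker> {
    checker: C,
}

impl Worker<NoCheck> {
    /// Convenience constructor for callers that don't want a check.
    fn without_checker() -> Self {
        Worker { checker: NoCheck }
    }
}

impl<C: Checker> Worker<C> {
    fn new(checker: C) -> Self {
        Worker { checker }
    }

    /// Runs the loop and returns how many iterations passed the check.
    fn do_work(&self, iterations: u32) -> u32 {
        let mut passed = 0;
        for _ in 0..iterations {
            if self.checker.check() {
                passed += 1;
            }
        }
        passed
    }
}

fn main() {
    println!("{}", Worker::without_checker().do_work(10));
    println!("{}", Worker::new(AlwaysFail).do_work(10));
}
```

Because do_work is monomorphized per checker type, the NoCheck version has no Option to test at all, which is the point made earlier about a vacuous implementation compiling into nothing.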
