Ideas for destructuring an Option in perf-critical code

Hello,

I have some code that is extremely perf-critical, and I'm looking for ideas about how to stay on the fast path.

A: This runs twice as fast as B:

struct Node<T> {
    stuff: Vec<Option<T>>
}

impl<T> Node<T> {
    fn get_stuff(&self) -> &Option<T> {
        return self.stuff.last().unwrap()
    }
}

//Caller; get_stuff method is dispatched through some indirection
    ...
    let opt = node.get_stuff();
    match opt {
        ...
    }

B: This runs half as fast as A:

//Common parts shared with above

impl<T> Node<T> {
    fn get_stuff(&self) -> Option<&T> {
        return self.stuff.last().unwrap().as_ref()
    }
}

//Common parts shared with above

When I say there is a 2x difference, I don't just mean in this call. I mean in the program runtime overall. This is a massive difference.

I'm pretty sure that in the slow case, get_stuff hits a pipeline stall: it has to fetch the contents of the Option to decide whether it's None before the return can proceed.

In the fast case, I'm guessing the CPU is able to start the fetch of the Option while the return happens in parallel, so by the time the calling code is ready to match on the Option, it's already in cache.

My problem is that I can't just return a reference to the Option, because Node is an abstract interface, and for some Node implementations the contents aren't stored as Option<T>.
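For context, a minimal sketch of that constraint might look like this (the trait and type names are hypothetical, not from the actual codebase): one implementation stores Option<T> and could hand back &Option<T>, but another stores plain values and has nothing of that type to reference.

```rust
// Hypothetical sketch: `NodeLike` stands in for the abstract Node
// interface. Only VecNode could return &Option<T>; DenseNode stores
// values directly, so the interface is forced to return Option<&T>.
trait NodeLike<T> {
    fn get_stuff(&self) -> Option<&T>;
}

struct VecNode<T> {
    stuff: Vec<Option<T>>,
}

impl<T> NodeLike<T> for VecNode<T> {
    fn get_stuff(&self) -> Option<&T> {
        // Must inspect the Option here to build Option<&T>.
        self.stuff.last().unwrap().as_ref()
    }
}

struct DenseNode<T> {
    stuff: Vec<T>,
}

impl<T> NodeLike<T> for DenseNode<T> {
    fn get_stuff(&self) -> Option<&T> {
        // No Option<T> in storage at all; &Option<T> is impossible here.
        self.stuff.last()
    }
}
```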

I understand that it sounds like some of my stated goals are in conflict. So I'm just asking how other folks would tackle this particular bear.

Thank you for any thoughts.

Follow-up note: I considered replacing Option with my own (&valid_bool, &T_union) pair, returning the references to each component separately. That does stay on the fast path. But I wanted to see if there is a way to keep the niche (null-pointer) optimization from Option.
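For reference, the niche optimization mentioned above is what makes Option<&T> pointer-sized: &T can never be null, so None is encoded as the null bit pattern. A hand-rolled (bool, value) pair gives that compaction up. A quick size check illustrates the difference:

```rust
use std::mem::size_of;

// Sketch of the niche (null-pointer) optimization: Option<&T> reuses
// the impossible null bit pattern of &T to encode None, so it costs
// no extra space. A (bool, reference) pair pays for an explicit
// discriminant plus padding.
fn niche_sizes() -> (usize, usize, usize) {
    (
        size_of::<&u64>(),         // a bare reference
        size_of::<Option<&u64>>(), // same size: None lives in the null niche
        size_of::<(bool, &u64)>(), // strictly larger: explicit flag + padding
    )
}
```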

How did you measure this? How can we reproduce your results?

Did you try #[inline(always)] on get_stuff?

EDIT:
Oh I just read the //get_stuff method is dispatched through some indirection comment. Is this dynamic dispatch?


Inlining does help. However, get_stuff is called through a jump table. It was originally &dyn Trait, but that was costing too much perf (about a 60% loss); matching on an enum and calling the concrete impl was acceptable (only about a 5% loss).

So in the case where I have only one or two implementations of Node, inlining helps a lot. But it becomes a double-edged sword as I add more implementations.
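For anyone following along, the enum-dispatch approach described above can be sketched like this (type names are made up for illustration): a closed set of node types matched by hand, so each arm is a direct, inlinable call instead of an opaque vtable call through &dyn.

```rust
// Hypothetical sketch of enum dispatch replacing `&dyn Trait`: the
// match compiles to a jump table, and each arm's concrete call can
// be inlined by the compiler.
struct SparseNode { stuff: Vec<Option<u64>> }
struct PackedNode { stuff: Vec<u64> }

enum AnyNode {
    Sparse(SparseNode),
    Packed(PackedNode),
}

impl AnyNode {
    #[inline(always)]
    fn get_stuff(&self) -> Option<&u64> {
        match self {
            AnyNode::Sparse(n) => n.stuff.last().unwrap().as_ref(),
            AnyNode::Packed(n) => n.stuff.last(),
        }
    }
}
```

The trade-off the post describes: with one or two variants the match is cheap and inlining dominates, but every added variant grows both the match and the inlined code size.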

My hope is that there is a way to destructure the Option at the other end of the reference without fetching it (and without committing horrible sins against Rust).

If you want to return an Option<&T>, you need to access the memory before returning, as you have already identified. You could change the return type to a custom enum:

enum MyOpt<'a, T> {
    ByRef(&'a Option<T>),
    Some(&'a T),
    None,
}

which should give you good performance if the branches are branch-predictor friendly.
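To illustrate how a caller would consume this (a sketch, repeating the enum for completeness; the helper function name is made up): the ByRef arm defers the load of the Option's contents until the caller actually matches, while implementations that don't store Option<T> can use the Some/None arms directly.

```rust
// Sketch of consuming the proposed MyOpt. The key point is that the
// ByRef arm carries only a reference, so the load of the Option's
// payload happens at the match site in the caller, not inside
// get_stuff before the return.
enum MyOpt<'a, T> {
    ByRef(&'a Option<T>),
    Some(&'a T),
    None,
}

fn flatten<T>(opt: MyOpt<'_, T>) -> Option<&T> {
    match opt {
        MyOpt::ByRef(o) => o.as_ref(), // the fetch happens here
        MyOpt::Some(v) => Some(v),
        MyOpt::None => None,
    }
}
```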

Can you encapsulate the dispatch, i.e make the caller generic? This will blow up the binary size but is probably worth it if this part is limiting. Do you switch the node kind during execution?


Unfortunately it is dynamically switching over node types. It's a heterogeneous tree. Otherwise monomorphization would be a clear win.

I like your idea to return a double-nested enum (Option being the inner enum). I'll see if that helps.

Thanks!

UPDATE: Rats. I implemented your suggestion, and it seems like it should work, but it falls off the fast path. Not quite sure why yet; it's the same speed as using as_ref. Sorry, I don't have a version of the code simplified down to a single page yet.

The reason I was falling off the fast path when I implemented your suggestion turned out to be the issue described in another thread: "Ptr::byte_offset alternative that doesn't generate load immediate instruction".

So I'm going to mark your answer as the solution.

Thanks again!

