Performance question related to slices

I have about a million ArrayVec<u8; 1024> stored in the leaves of b-tree. I also have a cursor for traversing this tree. When the cursor traverses the tree it maintains a stack for ancestors of current leaf and a summary of attributes related to tree content.

When performance testing it takes about 16ms to visit all leaves using the Cursor.

To my surprise if I add one more feature... (calling ArrayVec.as_slice() for each visited leaf the performance balloons to 40ms. Is that expected?

It seems to be a one time cost... if I call as_slice three times for each leave it still takes about 40ms. Also if I replace ArrayVec with normal Vec I see similar behavior.

My question: Why does as_slice have this cost? It seems out of proportion to the other work that I'm doing to traverse the tree.

ArrayVec::as_slice does nothing except call ArrayVec::deref, which in turn does some trivial access of the two ArrayVec fields and calls slice::from_raw_parts on the result. This should indeed be about the cheapest possible thing you can do with an ArrayVec.

My guess is that actually accessing the ArrayVec's fields causes a memory access which is always going to be a cache miss, because each of your ArrayVec is larger than a cache line. So you end up generating a lot more memory traffic than you would if you simply traverse the tree but don't access its contents.

6 Likes

Thanks yet again.

It's good to know that as_slice is really as simple and fast as I thought it was. For this test I wasn't actually keeping the slice around, so I know the problem isn't that I'm accessing those elements... but that suggesting is leading me to the problem. I'll report back tomorrow once I'm more awake and have figured issue out a bit more.

Thanks,
Jesse