Performance question related to slices

jessegrosjean · January 7, 2021, 12:39am

I have about a million ArrayVec<[u8; 1024]> stored in the leaves of a B-tree. I also have a cursor for traversing this tree. As the cursor traverses the tree, it maintains a stack of the current leaf's ancestors and a summary of attributes related to the tree content.

In performance testing, it takes about 16ms to visit all leaves using the cursor.

To my surprise, if I add one more feature (calling ArrayVec.as_slice() for each visited leaf), the time balloons to 40ms. Is that expected?

It seems to be a one-time cost: if I call as_slice three times for each leaf, it still takes about 40ms. Also, if I replace ArrayVec with a normal Vec, I see similar behavior.

My question: Why does as_slice have this cost? It seems out of proportion to the other work that I'm doing to traverse the tree.

mbrubeck · January 7, 2021, 1:03am

ArrayVec::as_slice does nothing except call ArrayVec::deref, which in turn does some trivial access of the two ArrayVec fields and calls slice::from_raw_parts on the result. This should indeed be about the cheapest possible thing you can do with an ArrayVec.
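To make that concrete, here is a minimal sketch of the shape of work described above. `MiniArrayVec` is a hypothetical stand-in written for illustration, not the arrayvec crate's actual source; it assumes a two-field layout (a length plus inline storage) and uses zero-initialized storage to keep the example simple.

```rust
use std::slice;

/// Hypothetical stand-in for an ArrayVec of bytes: a length plus inline storage.
/// This is an illustration only, not the arrayvec crate's implementation.
pub struct MiniArrayVec<const CAP: usize> {
    len: usize,
    data: [u8; CAP],
}

impl<const CAP: usize> MiniArrayVec<CAP> {
    /// Create an empty vector with zeroed backing storage.
    pub fn new() -> Self {
        Self { len: 0, data: [0; CAP] }
    }

    /// Append one byte, panicking if the fixed capacity is exceeded.
    pub fn push(&mut self, byte: u8) {
        assert!(self.len < CAP, "capacity exceeded");
        self.data[self.len] = byte;
        self.len += 1;
    }

    /// Read the two fields and build a slice from pointer + length.
    /// No copying and no allocation happen here.
    pub fn as_slice(&self) -> &[u8] {
        // SAFETY: `len <= CAP` is maintained by `push`, and `data` is
        // fully initialized, so the first `len` bytes are valid to read.
        unsafe { slice::from_raw_parts(self.data.as_ptr(), self.len) }
    }
}
```

The only memory the method itself reads is the `len` field; the pointer is computed from the address of `self`. Any cost beyond that has to come from where the struct lives in memory, not from the method body.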

My guess is that actually accessing the ArrayVec's fields causes a memory access which is always going to be a cache miss, because each of your ArrayVecs is larger than a cache line. So you end up generating a lot more memory traffic than you would if you simply traversed the tree without accessing its contents.
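One rough way to test this guess is to time a walk that only looks at the pointers against a walk that reads a field from each leaf. The sketch below uses only the standard library; `Leaf`, `build_leaves`, `visit_only`, `visit_and_touch`, and `time` are hypothetical names made up for this example, and a flat `Vec<Box<Leaf>>` stands in for the B-tree. Whether a gap shows up, and how large it is, will depend on the hardware, the allocator, and the optimization level.

```rust
use std::time::{Duration, Instant};

/// Hypothetical leaf payload: a length plus 1 KiB of inline storage,
/// roughly the shape of an ArrayVec holding up to 1024 bytes.
pub struct Leaf {
    len: usize,
    data: [u8; 1024],
}

impl Leaf {
    /// Reads `len` from the leaf's memory and returns a view of the data.
    pub fn as_slice(&self) -> &[u8] {
        &self.data[..self.len]
    }
}

/// Allocate `n` leaves on the heap, each behind its own pointer.
pub fn build_leaves(n: usize) -> Vec<Box<Leaf>> {
    (0..n)
        .map(|i| Box::new(Leaf { len: i % 1024, data: [0; 1024] }))
        .collect()
}

/// Walk the pointers without dereferencing them: only the Vec's own
/// pointer array is read, never the leaf memory.
pub fn visit_only(leaves: &[Box<Leaf>]) -> usize {
    leaves
        .iter()
        .fold(0, |acc, leaf| acc ^ (&**leaf as *const Leaf as usize))
}

/// Walk the pointers and call `as_slice` on each leaf, which forces a
/// read of `len` from every 1 KiB block.
pub fn visit_and_touch(leaves: &[Box<Leaf>]) -> usize {
    leaves.iter().map(|leaf| leaf.as_slice().len()).sum()
}

/// Run `f` once and return its result together with the elapsed time.
pub fn time<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}
```

To try it, build something like `build_leaves(1_000_000)`, then compare the durations returned by `time(|| visit_only(&leaves))` and `time(|| visit_and_touch(&leaves))` in a release build. Calling `visit_and_touch` a second or third time in the same pass should add little, which would be consistent with the one-time cost reported above.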

jessegrosjean · January 7, 2021, 2:15am

Thanks yet again.

It's good to know that as_slice is really as simple and fast as I thought it was. For this test I wasn't actually keeping the slice around, so I know the problem isn't that I'm accessing those elements... but that suggestion is leading me to the problem. I'll report back tomorrow once I'm more awake and have figured the issue out a bit more.

Thanks,
Jesse

system · April 7, 2021, 2:15am

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.
