How to persist scratch buffers using Rayon

I’m running a fairly complicated bit of code over a few million rows of CSV data (parsing voting preferences for Australian Senate elections).

For each row I need to convert the preference the voter has expressed into two different potential forms and then, following some precedence rules, expand one of those forms into a final resolution of how the voter cast their ballot. This ends up being represented by three Vec<u8>, each with a length on the order of 200 elements.

Because I’m running this code over millions of rows (one row per vote), I have worked to avoid constantly reallocating those three vectors. This shaves ~ 20% off my runtime.

I can’t figure out how to approach this in Rayon. I’m trying to use .par_iter(), but I can’t see how to persist the three buffers as a resource that is available per worker thread and can be accessed from the closure.

If anyone can suggest an approach, I’d be really grateful for the help - thanks!


You could use thread local values if you wrap the buffers in RefCell.
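A minimal sketch of the thread-local idea, using only the standard library. The `Scratch` struct, its field names, `resolve_row`, and the filtering logic are all hypothetical stand-ins for the real parsing and precedence rules:

```rust
use std::cell::RefCell;

// Hypothetical scratch state: three reusable buffers, one set per thread.
struct Scratch {
    form_a: Vec<u8>,
    form_b: Vec<u8>,
    resolved: Vec<u8>,
}

thread_local! {
    // Created lazily the first time each thread touches it, then reused.
    static SCRATCH: RefCell<Scratch> = RefCell::new(Scratch {
        form_a: Vec::with_capacity(256),
        form_b: Vec::with_capacity(256),
        resolved: Vec::with_capacity(256),
    });
}

// Placeholder per-row work (an assumption, not the real ballot rules):
// split the row into non-zero and zero entries, then "resolve" to the
// non-zero form if it has any entries, otherwise to the zero form.
fn resolve_row(row: &[u8]) -> usize {
    SCRATCH.with(|cell| {
        let mut guard = cell.borrow_mut();
        let Scratch { form_a, form_b, resolved } = &mut *guard;

        // clear() keeps the allocated capacity, so nothing is reallocated
        // once the buffers have grown to their working size.
        form_a.clear();
        form_b.clear();
        resolved.clear();

        form_a.extend(row.iter().copied().filter(|&p| p != 0));
        form_b.extend(row.iter().copied().filter(|&p| p == 0));

        if form_a.is_empty() {
            resolved.extend_from_slice(form_b.as_slice());
        } else {
            resolved.extend_from_slice(form_a.as_slice());
        }
        resolved.len()
    })
}
```

A function like this can be called from inside a Rayon closure, for example `rows.par_iter().map(|row| resolve_row(row))`. Rayon reuses its worker threads, so each worker should keep its own buffers for the whole run. One thing to watch: it is probably safest not to call back into Rayon while the `RefCell` is borrowed, since a worker can pick up another job while it waits, and a second `borrow_mut` on the same thread would panic.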

If your input is in a slice or vector, you could use par_chunks and zip that with a mutable slice of scratch buffers.

For a little less “persistence”, you could use map_init to dynamically create local buffers. Your par_iter will split into a number of smaller jobs, then map_init will initialize once for each of these groups.


Thanks so much, map_init looks perfect!