Bumpalo and NUMA

Hi,

I have a highly parallel problem running on a two-socket system that exposes 4 NUMA nodes to the OS (Windows). It appears to be latency-bound, so I'm looking to make it NUMA-aware.

The memory used falls roughly into two parts: a working set for each thread and a large global result Vec.

The plan is to pin each rayon thread to a NUMA node at process start. Each thread will then allocate its own working memory, which should make Windows place that memory on the thread's local node.
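A minimal sketch of that ordering (pin first, then allocate and write the working set from the pinned thread), using `std::thread` instead of rayon so it is self-contained. `pin_current_thread_to_node`, `node_for_worker` and `run_workers` are hypothetical names; the pinning function is a no-op placeholder. On Windows the real body would presumably fetch the node's `GROUP_AFFINITY` with `GetNumaNodeProcessorMaskEx` and apply it with `SetThreadGroupAffinity`, and with rayon the call could go in `ThreadPoolBuilder::start_handler`. The sketch assumes Windows backs a page with memory from the node of the thread that first touches it, which is the default behaviour as far as I know.

```rust
use std::thread;

/// Hypothetical placeholder: the real version would set this thread's
/// affinity to the processors of `node`. Deliberately a no-op here.
fn pin_current_thread_to_node(_node: usize) {}

/// Assign workers to nodes in contiguous blocks, e.g. 8 workers on 4 nodes
/// gives nodes 0,0,1,1,2,2,3,3.
fn node_for_worker(index: usize, workers: usize, nodes: usize) -> usize {
    index * nodes / workers
}

/// Spawn one thread per worker; each pins itself, then allocates and writes
/// its own working set. Returns (node, working-set length) per worker.
fn run_workers(workers: usize, nodes: usize, working_set_len: usize) -> Vec<(usize, usize)> {
    thread::scope(|s| {
        // Spawn every worker before joining any of them.
        let handles: Vec<_> = (0..workers)
            .map(|i| {
                s.spawn(move || {
                    let node = node_for_worker(i, workers, nodes);
                    pin_current_thread_to_node(node);
                    // Allocate only after pinning, from the worker thread itself.
                    let mut working_set = vec![0u64; working_set_len];
                    // Write every element so the pages are touched by this thread.
                    for (j, x) in working_set.iter_mut().enumerate() {
                        *x = j as u64;
                    }
                    (node, working_set.len())
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

This is best-effort placement: the global heap may hand a thread memory that was previously touched (and therefore placed) by a thread on another node, which is one reason to want the working set to come from memory the thread obtains directly from the OS.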

The problem I have is that each worker thread needs to allocate lots of relatively small objects and occasionally reset them.

I found bumpalo, which looks perfect for this use case.

Now to my question: I would like to use huge pages (I can't find any crate for that on Windows, so I'll be calling VirtualAlloc2 directly), but I can't find any way in bumpalo to provide the backing array myself. Is it possible to stop bumpalo from allocating the backing array from the standard allocator (which I assume uses normal pages)? Or is there some other smart solution I'm not thinking of, apart from forking bumpalo and modifying the allocation code myself?
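For comparison, the core of a bump allocator over caller-supplied memory is small. The sketch below is a hypothetical `SliceBump` type, not part of bumpalo: it bumps an offset through any `&mut [u8]`, so the slice could in principle be built from a huge-page region returned by VirtualAlloc2 (that part is assumed and not shown). Allocations are `&mut T` tied to a shared borrow of the arena, and `reset` takes `&mut self`, so the borrow checker rejects a reset while any allocation is still in use. Like bumpalo, it never runs the destructors of allocated values.

```rust
use std::alloc::Layout;
use std::cell::Cell;
use std::marker::PhantomData;

/// Hypothetical bump arena over a caller-supplied buffer (not bumpalo's API).
pub struct SliceBump<'buf> {
    start: *mut u8,                    // base of the caller's buffer
    len: usize,                        // buffer length in bytes
    next: Cell<usize>,                 // offset of the first free byte
    _buf: PhantomData<&'buf mut [u8]>, // keeps the buffer mutably borrowed
}

impl<'buf> SliceBump<'buf> {
    pub fn new(buf: &'buf mut [u8]) -> Self {
        SliceBump {
            start: buf.as_mut_ptr(),
            len: buf.len(),
            next: Cell::new(0),
            _buf: PhantomData,
        }
    }

    /// Reserve `layout.size()` bytes aligned to `layout.align()`, or `None` if full.
    fn alloc_layout(&self, layout: Layout) -> Option<*mut u8> {
        let offset = self.next.get();
        let addr = self.start as usize + offset;
        // Padding needed to round `addr` up to the (power-of-two) alignment.
        let pad = addr.wrapping_neg() & (layout.align() - 1);
        let begin = offset.checked_add(pad)?;
        let end = begin.checked_add(layout.size())?;
        if end > self.len {
            return None;
        }
        self.next.set(end);
        // SAFETY: begin <= end <= len, so the pointer stays within the buffer.
        Some(unsafe { self.start.add(begin) })
    }

    /// Move `value` into the arena; its destructor will never run.
    pub fn alloc<T>(&self, value: T) -> Option<&mut T> {
        let p = self.alloc_layout(Layout::new::<T>())? as *mut T;
        // SAFETY: `p` is aligned, in bounds and not handed out to anyone else.
        unsafe {
            p.write(value);
            Some(&mut *p)
        }
    }

    /// Rewind to the start; `&mut self` guarantees no allocation is still borrowed.
    pub fn reset(&mut self) {
        self.next.set(0);
    }

    /// Bytes consumed so far, including alignment padding.
    pub fn used(&self) -> usize {
        self.next.get()
    }
}
```

Unlike bumpalo, this arena cannot grow: once the buffer is exhausted, `alloc` returns `None`. That may be acceptable if each thread's working set has a known upper bound and the huge-page region is sized for it.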

Any pointers appreciated.

FYI, I've opened "Pull request? - Allocating underlying store from non-global allocator" (Issue #259 on fitzgen/bumpalo), since it seems clear to me that this isn't possible today.

I'm still a bit unsure whether this is the best approach, but it makes sense to me.
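The issue asks for bumpalo to obtain its underlying chunks from a non-global allocator. One possible shape for that, shown here as a hypothetical standalone `ChunkedBump` type rather than bumpalo's actual code or API, is an arena generic over any `A: GlobalAlloc`. A huge-page allocator wrapping VirtualAlloc2 could implement `GlobalAlloc` and be passed in as the backing allocator (assumed, not shown); the sketch uses `std::alloc::System`. Chunks are kept across `reset` and reused, so after warm-up a worker should stop calling the backing allocator at all.

```rust
use std::alloc::{GlobalAlloc, Layout};
use std::cell::{Cell, RefCell};
use std::ptr::NonNull;

/// Alignment of every chunk; requests needing more than this are rejected.
const CHUNK_ALIGN: usize = 64;

/// Hypothetical bump arena that takes its chunks from a caller-chosen allocator `A`.
pub struct ChunkedBump<A: GlobalAlloc> {
    backing: A,
    chunk_layout: Layout,
    chunks: RefCell<Vec<NonNull<u8>>>, // every chunk obtained from `backing`
    current: Cell<usize>,              // index of the chunk being bumped
    offset: Cell<usize>,               // first free byte in that chunk
}

impl<A: GlobalAlloc> ChunkedBump<A> {
    pub fn new(backing: A, chunk_size: usize) -> Self {
        assert!(chunk_size > 0, "GlobalAlloc::alloc must not be called with size 0");
        let chunk_layout =
            Layout::from_size_align(chunk_size, CHUNK_ALIGN).expect("chunk size too large");
        ChunkedBump {
            backing,
            chunk_layout,
            chunks: RefCell::new(Vec::new()),
            current: Cell::new(0),
            offset: Cell::new(0),
        }
    }

    /// Move `value` into the arena; its destructor will never run.
    pub fn alloc<T>(&self, value: T) -> Option<&mut T> {
        let layout = Layout::new::<T>();
        if layout.size() > self.chunk_layout.size() || layout.align() > CHUNK_ALIGN {
            return None; // oversized or over-aligned requests are out of scope here
        }
        let mut chunks = self.chunks.borrow_mut();
        // Chunk bases are CHUNK_ALIGN-aligned, so aligning the offset aligns the address.
        let mut begin = (self.offset.get() + layout.align() - 1) & !(layout.align() - 1);
        if chunks.is_empty() || begin + layout.size() > self.chunk_layout.size() {
            // Advance to the next chunk, reusing one retained across `reset` if possible.
            let next = if chunks.is_empty() { 0 } else { self.current.get() + 1 };
            if next == chunks.len() {
                // SAFETY: chunk_layout has a non-zero size (checked in `new`).
                let p = unsafe { self.backing.alloc(self.chunk_layout) };
                chunks.push(NonNull::new(p)?); // null means the backing allocator failed
            }
            self.current.set(next);
            begin = 0;
        }
        self.offset.set(begin + layout.size());
        // SAFETY: begin + size fits in the chunk and that range is not currently handed out.
        unsafe {
            let p = chunks[self.current.get()].as_ptr().add(begin) as *mut T;
            p.write(value);
            Some(&mut *p)
        }
    }

    /// Rewind to the first chunk, keeping all chunks for reuse.
    pub fn reset(&mut self) {
        self.current.set(0);
        self.offset.set(0);
    }

    /// Number of chunks obtained from the backing allocator so far.
    pub fn chunk_count(&self) -> usize {
        self.chunks.borrow().len()
    }
}

impl<A: GlobalAlloc> Drop for ChunkedBump<A> {
    fn drop(&mut self) {
        // Return every chunk to the allocator it came from.
        for p in self.chunks.get_mut().drain(..) {
            unsafe { self.backing.dealloc(p.as_ptr(), self.chunk_layout) };
        }
    }
}
```

Keeping the allocator as a type parameter means the arena has no per-allocation indirection, at the cost of one monomorphised arena type per backing allocator.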