I am writing a perfect connect 4 AI using a minimax game tree search with alpha-beta pruning and a transposition table. I would like to make it faster by parallelising the search, but I am not finding much success.
Originally a single-threaded search of one specific position took ~10s. Changing the transposition table to a lock-free atomic-based
Arc<Vec> of entries rather than a
Rc<RefCell<Vec>>> increased this to ~11s. So far my attempts to parallelise this have made it 6-7x slower (60-70s) and I'm not sure why.
My parallel version takes a given position and spawns a thread for each possible move to make a search one level down (max. 6 threads), and the best score found out of those is returned. Given that the transposition table is lock-free I wouldn't suspect contention.
I have also tried using
async-std to try and put each search operation into a thread pool, but wasn't able to make it work due to the recursive nature of the minimax tree search.
Does anyone have any ideas why the parallel version would be so much slower?