Why my Conway's game of life becomes slower after using multi-threads

I am a completely beginner in Rust and I am currently writing this parallel Conway's game of life. The code itself works fine but the problem is that after using multi-threads the program becomes slower(I measure the speed of the program by counting the time the glider moves from the top-left corner to the bottom-right corner). I did some experiments and it became slower and slower as the number of threads increases. I also have a Java version using almost the same algorithm, it works just fine. All I expect is that the Rust version can become at least slightly faster with threads more than one. Can anyone please point out where I did wrong? I am sorry if the code seems unreasonable, as I said I am a completely beginner :-).

main.rs read the command line arguments and does the board update.

extern crate clap;
extern crate termion;
extern crate chrono;

use std::thread;
use std::sync::Arc;
use chrono::prelude::*;


mod board;
mod config;

use board::Board;
use config::Config;

fn main() {
    let dt1 = Local::now();

    let matches = clap::App::new("conway")
        .arg(clap::Arg::with_name("length")
            .long("length")
            .value_name("LENGTH")
            .help("Set length of the board")
            .takes_value(true))
        .arg(clap::Arg::with_name("threads")
            .long("threads")
            .value_name("THREADS")
            .help("How many threads to update the board")
            .takes_value(true))
        .arg(clap::Arg::with_name("display")
            .long("display")
            .value_name("DISPLAY")
            .help("Display the board or not")
            .takes_value(true))
        .arg(clap::Arg::with_name("delay")
            .long("delay")
            .value_name("MILLISECONDS")
            .help("Delay between the frames in milliseconds")
            .takes_value(true))
        .get_matches();

    let config = Config::from_matches(matches);
    let mut board = Board::new(config.length);
    let mut start: bool = false;
    let mut end: bool = false;
    let mut start_time: DateTime<Local> = Local::now();
    let mut end_time: DateTime<Local>;

    board.initialize_glider();
    loop {
        if config.display == 1 {
            print!("{}{}", termion::clear::All, termion::cursor::Goto(3, 3));
            board_render(&board);
        }

        if board.board[0][1] == 1 && !start {
            start_time = Local::now();
            start = true;
        }
        if board.board[config.length - 1][config.length - 1] == 1 && !end {
            end_time = Local::now();
            println!("{}", end_time - start_time);
            end = true;
        }

        board = board::Board::update(Arc::new(board), config.threads);
        thread::sleep(config.delay);
    }
}

fn board_render(board: &Board) {
    let mut output = String::with_capacity(board.n * (board.n + 1));
    for i in 0..board.n {
        for j in 0..board.n {
            let ch;
            if board.board[i][j] == 0 {
                ch = '░';
            } else {
                ch = '█';
            }
            output.push(ch);
        }
        output.push_str("\n  ");
    }
    print!("{}", output);
}

board.rs is where the algorithm of updating the board with multiple threads exits

use std::sync::{Mutex, Arc};
use std::thread;

pub struct Board {
    pub n: usize,
    pub board: Vec<Vec<i32>>,
}

impl Board {
    pub fn new(n: usize) -> Board {
        let board = vec![vec![0; n]; n];
        Board {
            n,
            board,
        }
    }

    pub fn update(Board: Arc<Self>, t_num: usize) -> Board {
        let new_board = Arc::new(Mutex::new(Board::new(Board.n)));
        let mut workers = Vec::with_capacity(t_num);

        let block_size = Board.n / t_num;
        let mut start = 0;

        for t in 0..t_num {
            let old_board = Board.clone();

            let new_board = Arc::clone(&new_board);
            let mut end = start + block_size;

            if t == t_num - 1 { end = old_board.n; }

            let worker = thread::spawn(move || {
                let mut board = new_board.lock().unwrap();
                for i in start..end {
                    for j in 0..old_board.n {
                        let im = (i + old_board.n - 1) % old_board.n;
                        let ip = (i + 1) % old_board.n;
                        let jm = (j + old_board.n - 1) % old_board.n;
                        let jp = (j + 1) % old_board.n;
                        let sum = old_board.board[im][jm] + old_board.board[im][j]
                            + old_board.board[im][jp] + old_board.board[i][jm] + old_board.board[i][jp]
                            + old_board.board[ip][jm] + old_board.board[ip][j] + old_board.board[ip][jp];

                        if sum == 2 {
                            board.board[i][j] = old_board.board[i][j];
                        } else if sum == 3 {
                            board.board[i][j] = 1;
                        } else {
                            board.board[i][j] = 0;
                        }
                    }
                }
            });

            workers.push(worker);
            start = start + block_size;
        }

        for worker in workers {
            worker.join().unwrap();
        }


        let result = new_board.lock().unwrap();
        let mut board = Board::new(Board.n);
        board.board = result.board.to_vec();

        board
    }

    pub fn initialize_glider(&mut self) -> &mut Board {
        self.board[0][1] = 1;
        self.board[1][2] = 1;
        self.board[2][0] = 1;
        self.board[2][1] = 1;
        self.board[2][2] = 1;

        self
    }
}

It is almost certainly that Mutex. It forces all of the threads to be serialized, so this isn't acutally parallel, but it has all the costs of creating new threads.

A better way to do this is with scoped threads. Scoped threads allow you to isolate your parallelism and not use unnecessary locks (like the Mutex you had). Unfortunately, std does not provide this yet. But there is an amazing crate for parallel work, rayon, that does have scoped threads.

playground (I may have made some mistakes in the game of life logic when translating, but this should be correct)

2 Likes

Fantastic answer, helps a lot!
The only problem is that when the slice length is not evenly divided by the chunk size, the last slice of the iteration will be the remainder. For example if the board length is 20 and the number of threads is 3, the length of the slices would be 6,6,6,2. So it would be out of boundary. We need to make the length of the slices 6,6,8. Any idea how to make that happen?

You'd probably have to cook up something yourself with some math + repeated applications of split_at_mut.

2 Likes

That for sure makes life hard.

Why not insist that a board is a power of two on each side. 16, 32, 64 128, 256 etc.

Do you need to torture yourself?

6, 6, 8 is hard, but 7, 7, 6 is easy!

let block_size = (self.n + t_num - 1) / t_num;

Brilliant idea, problem solved. Thanks a lot!

It depends. If you're willing to use Rust Nightly and aren't uncomfortable using unsafe, there's std::thread::Builder::spawn_unchecked. It gives you the ability to create threads with relaxed lifetime bounds, i.e. it's possible to pass references to it, which aren't 'static.

Yes, but it is wildly unsafe to use directly. In order to be any use, you must build out your own scoped threads api on top of it. But this is notoriously hard to do, for example crossbeam has seen multiple bugs in the past related to scoped threads. So it is better to just rely on these well tested crates.

Can you elaborate on what makes you say, that this is unsafer than other unsafe functions/methods?

It has an unbound lifetime parameter, which in combination with lifetime inference, makes it almost impossible to directly use it safely. It isn't the same as say, get_unchecked which has a clear way to check it's safety concerns, or raw pointers, where you can (relatively) easily control the lifetime of values created from the raw pointer. spawn_unchecked is on a similar level of unsafety as static mut. It makes it very easy to write incorrect, UB filled code.

2 Likes