Using PyO3 to run multi-threaded Rust code through Rayon

Hi everyone, I am new to Rust and have been stuck on this bug for quite a while. Thanks in advance for any help.

The full code is linked here

#[pyclass]
pub struct GameManager {
    game: PostFlopGame,
    thread_pool: Option<Arc<rayon::ThreadPool>>,
}

#[pymethods]
impl GameManager {
    #[pyo3(text_signature = "(self, max_iterations, target_exploitability) -> float")]
    pub fn solve_game(&mut self, max_iterations: u32, target_exploitability: f32) -> SolverResult {
        println!("Starting solve with max_iterations: {}, target_exploitability: {}", max_iterations, target_exploitability);
        let exploitability = if let Some(pool) = &self.thread_pool {
            println!("Using custom thread pool");
            pool.install(|| solve(&mut self.game, max_iterations, target_exploitability, true))
        } else {
            println!("Using default threading");
            solve(&mut self.game, max_iterations, target_exploitability, true)
        };
        println!("Solve completed. Final exploitability: {}", exploitability);
        ...
}

^ In the above code, the solve function internally uses Rayon to speed up the calculation. However, when I actually run this code from Python, it is extremely slow, even though the correct number of threads is spawned and the CPUs are busy.

I tried running the solve method in pure Rust, and it runs fast as expected. I tried wrapping the call inside a with_gil block, but it was equally slow. I also tried protecting the game object with a Mutex; that didn't help either.

Am I missing anything here?
Thanks.

How are you measuring that the code "runs super slow" when you use it from Python?

I ran the same code in pure Rust; it takes 38s to run 100 iterations of the CFR algorithm. When run from Python, it's almost 100 times slower. I'm measuring it with something like this:

    import time

    start_time = time.time()
    result = game_manager.solve_game(100, 0.3)
    end_time = time.time()
    print(f"solve_game took {end_time - start_time:.2f}s")

I don’t recall off the top of my head how Rust+Python projects are built and configured, but is it possible you forgot to configure a release build (i.e. with optimizations turned on) somewhere in the process?


Edit: Looking this up, I’m learning it might depend on how you’re integrating the Rust code into Python. For example, with maturin it seems you’d need to explicitly pass a -r or --release argument to the maturin build or maturin develop commands.
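For reference, that would look something like this (assuming a standard maturin-based setup):

    # debug build (the default): the extension is compiled without optimizations
    maturin develop

    # release build: pass -r / --release so the extension is compiled with optimizations
    maturin develop --release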

I can't believe it!!!!! Thank you so much!!!!!

I was just about to reply that I had simply followed the PyO3 tutorial and run maturin develop... It never mentioned the -r flag.

It runs super fast now!

The whole time I was looking in the wrong direction (profiling threads), and ChatGPT wasn't helpful lol

How would building in release mode make such a big difference?

I’m not sure, but if I had to guess… perhaps the solver relies heavily on inlining, loop unrolling, and automatic use of SIMD by the optimizer in its core loop; in such a case a 100x improvement is not out of the question.
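To give a rough idea of what changes between the two builds, these are (approximately) Cargo’s default profiles:

    # Default dev profile (what a plain `maturin develop` compiles with):
    [profile.dev]
    opt-level = 0    # no inlining, loop unrolling, or auto-vectorization
    debug = true

    # Default release profile (what `maturin develop --release` compiles with):
    [profile.release]
    opt-level = 3    # full optimizations, including auto-vectorization (SIMD)
    debug = false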

That's it!!! It does use all of what you said heavily. Thank you again!

By the way, the GIL is irrelevant to how fast the Rust code runs; however, releasing the GIL might allow more Python code to run in parallel with the solver… I suppose this may or may not be relevant for keeping GUIs up to date, or if you otherwise make use of any of Python’s multi-threading features.

Anyway, if you wanted to try that, you can add a py: Python<'_> argument to your function (it doesn’t change the Python signature) and then do the solve inside of a py.allow_threads(|| …) call. See also this page.
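A rough, untested sketch of what that could look like for your solve_game (it assumes PostFlopGame is Send, which the captures of the closure passed to allow_threads need to be):

    #[pymethods]
    impl GameManager {
        pub fn solve_game(&mut self, py: Python<'_>, max_iterations: u32, target_exploitability: f32) -> SolverResult {
            // Release the GIL while the pure-Rust solve runs, so other Python
            // threads can keep executing in the meantime.
            py.allow_threads(|| {
                let exploitability = if let Some(pool) = &self.thread_pool {
                    pool.install(|| solve(&mut self.game, max_iterations, target_exploitability, true))
                } else {
                    solve(&mut self.game, max_iterations, target_exploitability, true)
                };
                println!("Solve completed. Final exploitability: {}", exploitability);
                // ... build and return the SolverResult as before ...
            })
        }
    }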

I haven't used any of Python's multi-threading yet, but I will keep this in mind.
Thank you so much!
