Process a list of objects owned by Python using PyO3 and Rayon

Hi everyone,

I'm writing a small Python extension in Rust using PyO3. The extension simulates a collection of independent, finite Markov processes as state machines that are stepped from state-to-state by drawing random numbers to determine the next state and when it occurs. I hit various bottlenecks implementing this in both single-threaded pure Python and using Python's multiprocessing, so I decided next to try to step each state machine in parallel in Rust. I need to simulate about 10 million state machines, with each one going through between 0 and ~100 transitions per simulation step.

Here's a simplified version of something close to what I would like to do (scroll down to see the whole code):

#[pyclass]
struct StateMachine {
    state: i32
}

impl StateMachine {
    #[new]
    fn new() -> Self {
        StateMachine { state: 0 }
    }

    fn run(&mut self) -> Transition {
        // Normally random number generation occurs here, but do this
        // instead for the sake of simplicity
        self.state += 1;
        Transition { data: 1.0 }
    }
}

#[pyclass]
struct Transition {
    data: f64
}

#[pyfunction]
fn par_run(machines: Vec<&PyCell<StateMachine>>) -> Vec<Transition> {
    Python::with_gil(|_| {
        machines
            .into_par_iter()
            .map(|machine| machine.borrow_mut().run())
            .collect()
    })
}

StateMachine instances should be created in Python and are owned by the Python interpreter. par_run is a Python function that takes a list of StateMachines as an argument. Then, the run() method for each object is farmed out to a Rayon parallel iterator, where each StateMachine's Rust run method is called.

The problem is that &PyCell<StateMachine> is not Sized, which is required by Rayon's IntoParallelIterator trait by way of its ParallelIterator trait. And if I understand correctly, I must use PyCell to get references to the state machines coming from Python because Rust doesn't own them.

The compiler error is:

error[E0599]: the method `into_par_iter` exists for struct `Vec<&pyo3::PyCell<StateMachine>>`, but its trait bounds were not satisfied
  --> src/main.rs:25:14
   |
25 |             .into_par_iter()
   |              ^^^^^^^^^^^^^ method cannot be called on `Vec<&pyo3::PyCell<StateMachine>>` due to unsatisfied trait bounds
   |
   = note: the following trait bounds were not satisfied:
           `[&pyo3::PyCell<StateMachine>]: Sized`
           which is required by `[&pyo3::PyCell<StateMachine>]: rayon::iter::IntoParallelIterator`
           `[&pyo3::PyCell<StateMachine>]: rayon::iter::ParallelIterator`
           which is required by `[&pyo3::PyCell<StateMachine>]: rayon::iter::IntoParallelIterator`

The crux of my question is: how can I run a method on a collection of independent Python objects in parallel?

Thanks!

It is; pointers always are. You misread the error message – it's complaining about a slice not being sized.

I can't reproduce this in the Playground as it doesn't have PyO3 available, but given that you plan to give away ownership of the vector, you could try if it works with a move |_| { ... } closure instead.

2 Likes

Thanks a lot for your help @H2CO3

After a lot of trial-and-error and help from the PyO3 devs, I managed to get the following to work:

#[pyfunction]
pub fn par_run(machines: Vec<&PyCell<StateMachine>>) -> PyResult<Vec<Transition>> {
    let mut machines = machines
        .into_iter()
        .map(|cell| cell.try_borrow_mut())
        .collect::<Result<Vec<PyRefMut<
>>, _>>()?;

    let mut machines = machines
        .iter_mut()
        .map(|refr| refr.deref_mut())
        .collect::<Vec<&mut StateMachine>>();

    Ok(machines
        .par_iter_mut()
        .map(|machine| machine.run())
        .collect()
    )  
}

The pyfunction macro will claim the GIL for you. After that, I struggled a bit to get the mutable references out of the Python-owned cells, but in the end the above works without copying the data.

That seems like a lot of collecting. Wouldn't the following work instead?

#[pyfunction]
pub fn par_run(machines: Vec<&PyCell<StateMachine>>) -> PyResult<Vec<Transition>> {
    machines
        .into_par_iter()
        .map(|cell| Ok(cell.try_borrow_mut()?.run()))
        .collect()
}

I agree. I tried various permutations around iterating and collecting but almost always ran into issues on the trait bounds that were required by Rayon's parallel iterators. IIRC your suggestion gives a compiler error similar to the one in my original post. (I can verify this tomorrow.)

I'd be happy to hear more suggestions on simplifying what I have working.

I played around with this a bit and it looks like it can't be simplified, after all. The reason is that PyCell is not Send, which means that arrays of PyCell in any form won't implement any of the parallel iterator traits.