Type conversion costs of PyO3 - really this costly?

I have written a fairly simple recursive binary search function to process a list of floats in Python:

def index_left(list_input: list[Any], list_length: int, value: Any, left_count: Optional[int] = 0):
    if list_length == 1:
        raise ValueError("`index_left` designed for intervals. Cannot index list of length 1.")

    if list_length == 2:
        return left_count

    split = floor((list_length - 1) / 2)
    if list_length == 3 and value == list_input[split]:
        return left_count

    if value <= list_input[split]:
        return index_left(list_input[: split + 1], split + 1, value, left_count)
    else:
        return index_left(list_input[split:], list_length - split, value, left_count + split)

Some basic profiling suggests the following:

  • 100 x list of 1e6 floats: 900ms
  • 1e6 x list of 100 floats: 1999ms

I have written a similar function rust:

pub fn index_left<T>(list_input: &[T], value: &T, left_count: Option<usize>) -> usize
where for<'a> &'a T: PartialOrd + PartialEq
{
    let lc = left_count.unwrap_or(0_usize);
    let n = list_input.len();
    match n {
        1 => panic!("`index_left` designed for intervals. Cannot index sequence of length 1."),
        2 => lc,
        _ => {
            let split = (n - 1_usize) / 2_usize;  // this will take the floor of result
            if n == 3 && value == &list_input[split] {
                lc
            } else if value <= &list_input[split] {
                index_left(&list_input[..=split], value, Some(lc))
            } else {
                index_left(&list_input[split..], value, Some(lc + split))
            }
        }
    }
}

And basic profiling suggests the following:

  • 100 x vector of 1e6 f64s: 62us
  • 1e6 x vector of 100 f64s: 221ms

Now I have used PyO3 to port the rust function to Python as follows (using maturin develop --release):

#[pyfunction]
pub fn index_left_from_rust (list_input: Vec<f64>, value: f64, left_count: Option<usize>) -> usize {
    index_left(&list_input[..], &value, left_count)
}

Profiling this from Python I get:

  • 1e6 x list of 100 floats: 1537ms
  • 100 x list of 1e6 floats: 1307ms

The PyO3 website warns of type conversion costs. Is this what I have encountered here or am I doing something fundamentally incorrect?

Are you compiling the Rust code with optimizations enabled?

1 Like

Yes, this is the conversion cost your seeing here

Note that you're not just paying the cost of converting the items you're accessing, but all the items in the lists! This means your O(logn) algorithm just became O(n) due to having to perform the conversion.

2 Likes

Understood. To pass on the full rust improvement speeds would there be any options? Not using pyo3, using pyo3 differently, etc.?

You could try taking a PyList/PySequence and implement the binary search over that.

Another option is to wrap Vec in a pyo3 struct and use that instead of a list in python. Depending on what tradeoffs you are willing to make.

That's why numpy have its own array type instead of rely on python native list type.

Thanks all, this was very helpful, also for putting a number of concepts into perspective for architecture design.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.