How do I return a Rust iterator from a Python module function using pyo3?

It's not that I can't return any Rust iterator from a Python module function using pyo3. The problem arises when a lifetime doesn't live long enough!

Allow me to explain.

First attempt:

#[pyclass]
struct ItemIterator {
    iter: Box<dyn Iterator<Item = u64> + Send>,
}

#[pymethods]
impl ItemIterator {
    fn __iter__(slf: PyRef<'_, Self>) -> PyRef<'_, Self> {
        slf
    }
    fn __next__(mut slf: PyRefMut<'_, Self>) -> Option<u64> {
        slf.iter.next()
    }
}

#[pyfunction]
fn get_numbers() -> ItemIterator {
    let i = vec![1u64, 2, 3, 4, 5].into_iter();
    ItemIterator { iter: Box::new(i) }
}

In the contrived example above, I have written a Python iterator wrapper for our Rust iterator as per the pyo3 guide, and it works seamlessly.

Second attempt:
The problem is when lifetimes are involved.

Say I now have a Warehouse struct that I want to make available as a Python class, along with its associated functions.

struct Warehouse {
    items: Vec<u64>,
}

impl Warehouse {
    fn new() -> Warehouse {
        Warehouse {
            items: vec![1u64, 2, 3, 4, 5],
        }
    }

    fn get_items(&self) -> Box<dyn Iterator<Item = u64> + '_> {
        Box::new(self.items.iter().map(|f| *f))
    }
}

Implementing them as a Python class and methods:

#[pyclass]
struct ItemIterator {
    iter: Box<dyn Iterator<Item = u64> + Send>,
}

#[pymethods]
impl ItemIterator {
    fn __iter__(slf: PyRef<'_, Self>) -> PyRef<'_, Self> {
        slf
    }
    fn __next__(mut slf: PyRefMut<'_, Self>) -> Option<u64> {
        slf.iter.next()
    }
}

#[pyclass]
struct Warehouse {
    items: Vec<u64>,
}

#[pymethods]
impl Warehouse {
    #[new]
    fn new() -> Warehouse {
        Warehouse {
            items: vec![1u64, 2, 3, 4, 5],
        }
    }

    fn get_items(&self) -> ItemIterator {
        ItemIterator {
            iter: Box::new(self.items.iter().map(|f| *f)),
        }
    }
}

This throws a compiler error in the get_items function:

error: lifetime may not live long enough
  --> src/lib.rs:54:19
   |
52 |     fn get_items(&self) -> ItemIterator {
   |                  - let's call the lifetime of this reference `'1`
53 |         ItemIterator {
54 |             iter: Box::new(self.items.iter().map(|f| *f)),
   |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cast requires that `'1` must outlive `'static`

error: could not compile `pyo3-example` due to previous error

I am not really sure how to fix this. Can someone explain what's really going on here? How does this compare to my first attempt at implementing iterators, and how do I fix it?

Well, the error isn't related to pyo3, but pyo3 may add constraints on a possible solution. One possible solution would be to use explicit lifetimes (though I don't know how well this will work with pyo3; maybe pyo3 needs owned data). This would mean, though, that your iterator wrapper type ItemIterator can't outlive Warehouse. Here is an example without pyo3:

struct ItemIterator<'a> {
    iter: Box<dyn Iterator<Item = u64> + Send + 'a>,
}

struct Warehouse {
    items: Vec<u64>,
}

impl Warehouse {
    fn new() -> Warehouse {
        Warehouse {
            items: vec![1u64, 2, 3, 4, 5],
        }
    }

    fn get_items<'a>(&'a self) -> ItemIterator<'a> {
        ItemIterator {
            iter: Box::new(self.items.iter().map(|f| *f)),
        }
    }
}

Playground.
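To see why the first attempt compiled and the second did not, it may help to isolate the trait object from everything else. As far as I understand it, `Box<dyn Iterator<Item = u64> + Send>` with no lifetime written is shorthand for `... + Send + 'static`, so the boxed iterator may not borrow from anything shorter-lived. The sketch below uses two hypothetical free functions, `owned` and `borrowed` (names made up for illustration), to contrast the two cases:

```rust
// Owning case (like the first attempt): `into_iter()` moves the Vec into
// the iterator, so nothing is borrowed and the implicit `'static` bound
// on the boxed trait object is satisfied.
fn owned(items: Vec<u64>) -> Box<dyn Iterator<Item = u64> + Send> {
    Box::new(items.into_iter())
}

// Borrowing case (like the second attempt): `iter()` borrows the slice,
// so the box must carry that borrow's lifetime explicitly.
fn borrowed<'a>(items: &'a [u64]) -> Box<dyn Iterator<Item = u64> + Send + 'a> {
    Box::new(items.iter().copied())
}

// Dropping the `+ 'a` reproduces the error from the question:
//
// fn broken(items: &[u64]) -> Box<dyn Iterator<Item = u64> + Send> {
//     Box::new(items.iter().copied()) // lifetime may not live long enough
// }
```

Note that the pyo3 guide lists "no lifetime parameters" among the restrictions on `#[pyclass]` types, so the `'a` variant probably cannot be exposed to Python as it stands.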

If for some reason such a design is impossible to use with pyo3 because lifetimes can't be handled in a pyclass struct, you could consume Warehouse (or a clone of it) when you create ItemIterator. This would look more like your first solution:

struct ItemIterator {
    iter: Box<dyn Iterator<Item = u64> + Send>,
}

struct Warehouse {
    items: Vec<u64>,
}

impl Warehouse {
    fn new() -> Warehouse {
        Warehouse {
            items: vec![1u64, 2, 3, 4, 5],
        }
    }

    fn get_items(self) -> ItemIterator {
        ItemIterator {
            iter: Box::new(self.items.into_iter()),
        }
    }
}

Playground.
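A minimal usage sketch of the consuming variant, assuming you add `#[derive(Clone)]` to Warehouse (that derive and the helper `iterate_twice` are my additions, not part of the code above). It shows the trade-off: cloning first keeps the original usable but copies the whole vector, while calling `get_items` directly moves the warehouse away.

```rust
#[derive(Clone)]
struct Warehouse {
    items: Vec<u64>,
}

struct ItemIterator {
    iter: Box<dyn Iterator<Item = u64> + Send>,
}

impl Warehouse {
    // Takes `self` by value, so the returned iterator owns the items.
    fn get_items(self) -> ItemIterator {
        ItemIterator {
            iter: Box::new(self.items.into_iter()),
        }
    }
}

// Hypothetical helper: iterate once over a clone, once over the original.
fn iterate_twice() -> (Vec<u64>, Vec<u64>) {
    let warehouse = Warehouse {
        items: vec![1, 2, 3, 4, 5],
    };
    // The clone copies every item; `warehouse` stays usable afterwards.
    let first: Vec<u64> = warehouse.clone().get_items().iter.collect();
    // No copy here, but `warehouse` is moved and cannot be used again.
    let second: Vec<u64> = warehouse.get_items().iter.collect();
    (first, second)
}
```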

There's also a StackOverflow thread for this question here: How do I return rust iterator from a python module function using pyo3 - Stack Overflow

It's generally a good idea to include a link if you post in both places, to avoid people duplicating effort when they're answering.


You mean collecting all items into a vector and then returning an iterator that owns all of the items? Won't that be very inefficient, especially if it is a huge vector?

According to pyo3's user guide you can use smart, sendable pointers to circumvent the restrictions on lifetimes. You could use this to build your own iterator instead of using Box<dyn Iterator ...> like this:

use pyo3::{pyclass, pymethods, PyRef, PyRefMut};

use std::sync::Arc;

#[pyclass]
struct ItemIterator {
    iter: Arc<Vec<u64>>,
    index: usize,
}

#[pymethods]
impl ItemIterator {
    fn __iter__(slf: PyRef<'_, Self>) -> PyRef<'_, Self> {
        slf
    }
    fn __next__(mut slf: PyRefMut<'_, Self>) -> Option<u64> {
        let res = slf.iter.get(slf.index).copied();
        slf.index += 1;
        res
    }
}

#[pyclass]
struct Warehouse {
    items: Arc<Vec<u64>>,
}

#[pymethods]
impl Warehouse {
    #[new]
    fn new() -> Warehouse {
        Warehouse {
            items: Arc::new(vec![1u64, 2, 3, 4, 5]),
        }
    }

    fn get_items(&self) -> ItemIterator {
        ItemIterator {
            iter: self.items.clone(),
            index: 0,
        }
    }
}

Cloning an Arc is cheap (it only increments a reference count), so ItemIterator shares access to the data in Warehouse without copying it.
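If you want to convince yourself that nothing is copied, here is a small pyo3-free sketch of the same two types. It implements the standard `Iterator` trait in place of the `#[pymethods]` block, and `shares_storage` is a hypothetical helper added only for the check:

```rust
use std::sync::Arc;

struct ItemIterator {
    iter: Arc<Vec<u64>>,
    index: usize,
}

impl Iterator for ItemIterator {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        let res = self.iter.get(self.index).copied();
        self.index += 1;
        res
    }
}

struct Warehouse {
    items: Arc<Vec<u64>>,
}

impl Warehouse {
    fn get_items(&self) -> ItemIterator {
        ItemIterator {
            // Clones the pointer, not the vector behind it.
            iter: Arc::clone(&self.items),
            index: 0,
        }
    }
}

// Returns (same allocation?, number of owners, items seen by the iterator).
fn shares_storage() -> (bool, usize, Vec<u64>) {
    let warehouse = Warehouse {
        items: Arc::new(vec![1, 2, 3, 4, 5]),
    };
    let it = warehouse.get_items();
    let same = Arc::ptr_eq(&warehouse.items, &it.iter);
    let owners = Arc::strong_count(&warehouse.items);
    // The iterator keeps the data alive even after the warehouse is gone.
    drop(warehouse);
    (same, owners, it.collect())
}
```

Because the iterator holds its own Arc, it has no lifetime tied to Warehouse, which is what makes this shape usable in a pyclass.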


@jofas Thanks for this detailed answer! It makes perfect sense.

Unlike the rather contrived example I created to understand this, the real problem is that I am using one of our internal libraries, which maps, filters, etc. over this (potentially huge) vector and eventually returns a Box<dyn Iterator ...>. That iterator reaches the Python module I am trying to write after a series of API calls. Does this mean that, to achieve what you have just demonstrated, I would still need to collect all the elements into a vector?

You can see this implementation here: docbrown/graphdb.rs at features/loaders · Raphtory/docbrown · GitHub

Grateful for all help!

I think it does. Or better put, I don't know how to construct an iterator that has ownership over the iterated data from an iterator that doesn't, without allocating it into a vector (or other collection type) first. If you look at the method you pointed out, this is exactly what happens. You collect the vertices into a vector and provide VertexIterator with an owned iterator over the vector that was created by the collect method.

I don't know a better way to be honest. Maybe someone more experienced with how to write highly optimized Rust code can help you.
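One direction that might be worth trying, sketched here without pyo3 and only under the assumption that the underlying data can live behind an Arc: let the iterator own the Arc plus an index, and apply the per-item work lazily in `next` instead of collecting up front. `LazyItems`, its `step` field, and `even_doubled` are hypothetical names invented for this sketch; it does not help if the library only hands you an already-borrowed Box<dyn Iterator ...>.

```rust
use std::sync::Arc;

struct LazyItems {
    items: Arc<Vec<u64>>,
    index: usize,
    // Combined filter + map: return None to skip an item.
    step: Box<dyn Fn(u64) -> Option<u64> + Send>,
}

impl Iterator for LazyItems {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        // Walk forward until the step accepts an item or the data runs out.
        while let Some(&raw) = self.items.get(self.index) {
            self.index += 1;
            if let Some(out) = (self.step)(raw) {
                return Some(out);
            }
        }
        None
    }
}

// Example step: keep even numbers and double them, with no intermediate Vec.
fn even_doubled(items: Arc<Vec<u64>>) -> LazyItems {
    LazyItems {
        items,
        index: 0,
        step: Box::new(|x| if x % 2 == 0 { Some(x * 2) } else { None }),
    }
}
```

Every field is owned and Send, so the struct has no lifetime parameter; whether that is enough for your pyclass depends on the rest of your types.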

