Wrapping a C API with a Rust Iterator?

rosenhouse · September 21, 2021, 5:30pm

Hi there, very new Rust user here!

I'm building a Rust wrapper around an existing C library. The C library exposes an iterator-like API:

// true iff there's an item available
bool iter_valid(iterator *iter);

// move next
void iter_advance(iterator *iter);

// Sets *item to the current value
// - Callers must not modify that memory
// - The pointer is only valid until the next call to iter_advance()
// - Behavior only defined if iter_valid() == true
void iter_get_current(
    iterator   *    iter,     // IN
    const char **   item,     // OUT
    size_t *        item_len  // OUT
);

What would be an idiomatic representation in Rust?

(Note that the C library does some fancy user-space paging under-the-hood, so the OUT pointer really must not be used after the next call to iter_advance().)

I would like to expose a std::iter::Iterator so that users of my Rust library could take advantage of all the goodies provided by that trait. But to make things safe, the Item (which would contain the C pointer) must not last beyond the next next() call. I can't see how to bound the lifetime of Item that way. So maybe Iterator is the wrong abstraction?

Instead, maybe I should expose a next() function that takes ownership of the previous result?

trait WeirdIterator {
    type Item;

    // panics if called a second time
    fn first(&mut self) -> Option<Self::Item>;

    // consumes the previous item, so it cannot outlive this call
    fn next(&mut self, prev: Self::Item) -> Option<Self::Item>;
}

How could I make this more "ergonomic"?

Thank you!

steffahn · September 21, 2021, 5:36pm

Maybe streaming_iterator::StreamingIterator can work for you.

This is problematic if it's possible to obtain two WeirdIterators of the same type at the same time, because then a user could pass the item from one to the other.

Another option would be to have your wrapper somehow obtain ownership of a copy of the item. Then you could use the standard Iterator trait. Whether that's reasonable depends on what exactly these char pointer typed items look like. If it's only possible to copy the items by allocating new Vecs for each item, this might be undesirable overhead, depending on your use case.
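The copying approach can be sketched as follows. This is only an illustration under stated assumptions: `Cursor` and `OwnedItems` are hypothetical names, and the cursor is backed by an in-memory `Vec<Vec<u8>>` standing in for the C iterator. Each item is copied into a fresh `Vec<u8>` before the cursor advances, which is exactly the per-item allocation overhead mentioned above.

```rust
// Hypothetical safe cursor standing in for the wrapped C iterator.
pub struct Cursor {
    pages: Vec<Vec<u8>>,
    pos: usize,
}

impl Cursor {
    pub fn new(pages: Vec<Vec<u8>>) -> Self {
        Cursor { pages, pos: 0 }
    }

    // Mirrors iter_valid(): true iff there is a current item.
    fn valid(&self) -> bool {
        self.pos < self.pages.len()
    }

    // Mirrors iter_advance().
    fn advance(&mut self) {
        self.pos += 1;
    }

    // Mirrors iter_get_current(); only call while valid() is true.
    fn current(&self) -> &[u8] {
        &self.pages[self.pos]
    }
}

// Adapter that owns the cursor and yields owned copies of each item.
pub struct OwnedItems {
    pub cursor: Cursor,
}

impl Iterator for OwnedItems {
    type Item = Vec<u8>;

    fn next(&mut self) -> Option<Vec<u8>> {
        if !self.cursor.valid() {
            return None;
        }
        // Copy first, then advance: the borrowed slice never escapes.
        let item = self.cursor.current().to_vec();
        self.cursor.advance();
        Some(item)
    }
}
```

Because `OwnedItems` implements the standard `Iterator` trait, adapters such as `map`, `filter`, and `collect` are all available on it.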

Michael-F-Bryan · September 21, 2021, 9:46pm

Streaming iterators are the right abstraction here, but you'll run into two issues: generic associated types aren't stable, and streaming iterators lack a lot of the nice adapters that normal iterators have.

Assuming you don't want to use nightly features, you'll probably be stuck with a next() method which returns an item whose lifetime is attached to the &mut self to make sure you can't call next() again without dropping the previous value.

struct MyIterator {
    iter: *mut Iter,
}

impl MyIterator {
    fn next(&mut self) -> Option<&[u8]> {
       ...
    }
}

fn main() {
    let mut iter = new_my_iterator();

    while let Some(item) = iter.next() {
        ...
    }
}

(playground)
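A fuller sketch of what the elided body of `next()` might look like, with loud assumptions: `RawIter` and the three `iter_*` functions below are Rust mocks shaped like the C functions from the question (with `u8` in place of `char`), the `started` flag and the `new` constructor are invented for illustration, and a real wrapper would declare the functions in an `extern "C"` block rather than define them.

```rust
use std::slice;

// Hypothetical stand-in for the C library's opaque iterator state.
pub struct RawIter {
    items: Vec<Vec<u8>>,
    pos: usize,
}

// Mocks shaped like the C API from the question.
unsafe fn iter_valid(iter: *mut RawIter) -> bool {
    let it = unsafe { &*iter };
    it.pos < it.items.len()
}

unsafe fn iter_advance(iter: *mut RawIter) {
    let it = unsafe { &mut *iter };
    it.pos += 1;
}

unsafe fn iter_get_current(iter: *mut RawIter, item: *mut *const u8, item_len: *mut usize) {
    let it = unsafe { &*iter };
    let cur = &it.items[it.pos];
    unsafe {
        *item = cur.as_ptr();
        *item_len = cur.len();
    }
}

pub struct MyIterator {
    iter: *mut RawIter,
    started: bool, // false until the first item has been handed out
}

impl MyIterator {
    pub fn new(items: Vec<Vec<u8>>) -> Self {
        let raw = Box::into_raw(Box::new(RawIter { items, pos: 0 }));
        MyIterator { iter: raw, started: false }
    }

    // The elided lifetime ties the returned slice to `&mut self`, so the
    // slice cannot be used once `next()` is called again.
    pub fn next(&mut self) -> Option<&[u8]> {
        unsafe {
            if self.started {
                // Never advance an iterator that is already exhausted.
                if !iter_valid(self.iter) {
                    return None;
                }
                iter_advance(self.iter);
            } else {
                self.started = true;
            }
            if !iter_valid(self.iter) {
                return None;
            }
            let mut ptr: *const u8 = std::ptr::null();
            let mut len: usize = 0;
            iter_get_current(self.iter, &mut ptr, &mut len);
            Some(slice::from_raw_parts(ptr, len))
        }
    }
}

impl Drop for MyIterator {
    fn drop(&mut self) {
        // Reclaim the mock state allocated in `new`.
        unsafe { drop(Box::from_raw(self.iter)) };
    }
}
```

The advance is deferred to the start of the following `next()` call, so the pointer returned by `iter_get_current` stays valid for as long as the borrow checker lets the caller hold the slice.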

HeroicKatora · September 21, 2021, 10:19pm

Yes, I think this is the wrong abstraction. (Personally, I think it's awesome that lifetimes give us the right language to correctly distinguish an iterator from this concept, and at least in Rust I prefer calling this a cursor instead.) However, consider that there is a simple way to adapt this to an iterator if the user provides a function that takes ownership of the relevant portions of data.

struct Item<'lt>(…);

impl Cursor {
    // Combines valid and get_current: 'parse, don't validate'.
    // The approach also works if this takes `&mut self`.
    fn item(&self) -> Option<Item<'_>> { … }
    fn next(&mut self) { .. }

    // The magic sauce.
    fn into_iterator<T>(mut self, mut owner: impl FnMut(Item) -> T)
        -> impl Iterator<Item=T>
    {
        core::iter::from_fn(move || {
            let result = self.item().map(&mut owner);
            self.next();
            result
        })
    }
}

Now consumers can use the iterator combinators if they bring some form of taking ownership. (For example, cloning relevant portions to their own allocation, extracting data and then dropping the borrow, etc.). Or they can use an API that offers the power of the C interface and lets them re-acquire the data before advancing.
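To make the cursor idea concrete, here is a minimal compilable variant of the snippet above. The backing `Vec<Vec<u8>>`, the `new` constructor, and the `&[u8]` payload of `Item` are assumptions made for illustration; the original leaves those details elided.

```rust
// Borrowed view of the current element; it cannot outlive the borrow of the cursor.
pub struct Item<'lt>(pub &'lt [u8]);

// Hypothetical cursor over in-memory pages, standing in for the C iterator.
pub struct Cursor {
    pages: Vec<Vec<u8>>,
    pos: usize,
}

impl Cursor {
    pub fn new(pages: Vec<Vec<u8>>) -> Self {
        Cursor { pages, pos: 0 }
    }

    // Combines the validity check and the read: None once exhausted.
    pub fn item(&self) -> Option<Item<'_>> {
        self.pages.get(self.pos).map(|p| Item(p.as_slice()))
    }

    pub fn next(&mut self) {
        self.pos += 1;
    }

    // `owner` turns each borrowed Item into an owned T before the cursor moves on.
    pub fn into_iterator<T>(
        mut self,
        mut owner: impl FnMut(Item<'_>) -> T,
    ) -> impl Iterator<Item = T> {
        std::iter::from_fn(move || {
            let result = self.item().map(&mut owner);
            self.next();
            result
        })
    }
}

// Usage: extract only the lengths, so no item data needs to be copied.
pub fn item_lengths(cursor: Cursor) -> Vec<usize> {
    cursor.into_iterator(|item| item.0.len()).collect()
}
```

The closure decides how much to keep: `|item| item.0.to_vec()` would copy everything, while the length-only closure above allocates nothing per item.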

steffahn · September 21, 2021, 10:49pm

Note that this kind of API already exists if you’re using StreamingIterator. Both generally/generically with the map_deref method taking a closure like your into_iterator method, and less general but easier to use with .owned() which will e.g. turn a StreamingIterator of &[u8]s into an (ordinary) Iterator of Vec<u8>.

Of course, StreamingIterator only supports &T items, while you’re addressing – more generally – any type Item<'lt>.
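For readers who have not seen the streaming-iterator shape, the sketch below is a hand-rolled stand-in, not the crate's actual code. The names `MiniStreaming`, `Pages`, and `Owned` are hypothetical; the trait only imitates the general split into an advance step and a borrowing getter, and `Owned` imitates the idea of converting to an ordinary `Iterator` by cloning each item.

```rust
// Hypothetical miniature of a streaming-iterator trait.
// The iterator starts just before the first element.
pub trait MiniStreaming {
    type Item: ?Sized;

    fn advance(&mut self);
    fn get(&self) -> Option<&Self::Item>;

    // The returned reference borrows `self`, so it must be released
    // before `next` can be called again.
    fn next(&mut self) -> Option<&Self::Item> {
        self.advance();
        self.get()
    }
}

// In-memory stand-in for the C iterator.
pub struct Pages {
    pages: Vec<Vec<u8>>,
    pos: Option<usize>, // None means "before the first element"
}

impl Pages {
    pub fn new(pages: Vec<Vec<u8>>) -> Self {
        Pages { pages, pos: None }
    }
}

impl MiniStreaming for Pages {
    type Item = [u8];

    fn advance(&mut self) {
        self.pos = Some(match self.pos {
            None => 0,
            Some(i) => i.saturating_add(1),
        });
    }

    fn get(&self) -> Option<&[u8]> {
        self.pages.get(self.pos?).map(|p| p.as_slice())
    }
}

// Adapter producing an ordinary Iterator of owned values.
pub struct Owned<S>(pub S);

impl<S> Iterator for Owned<S>
where
    S: MiniStreaming,
    S::Item: ToOwned,
{
    type Item = <S::Item as ToOwned>::Owned;

    fn next(&mut self) -> Option<Self::Item> {
        self.0.next().map(|item| <S::Item as ToOwned>::to_owned(item))
    }
}
```

With `Item = [u8]`, the adapter yields `Vec<u8>` values, which matches the `&[u8]` to `Vec<u8>` conversion described above.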

rosenhouse · September 21, 2021, 11:58pm

Thanks!

A follow-up question from a newbie. This does what I want: it prevents me from holding a reference to the item beyond the loop body. But the compiler message doesn't suggest that's the problem. I really don't understand why this is preventing me from retaining the reference:

struct MyIterator {}

impl MyIterator {
    fn next(&mut self) -> Option<&[u8]> {
        None
    }
}

fn main() {
    let mut accumulator: Vec<&[u8]> = Vec::new();
    let mut iter = MyIterator {};
    while let Some(item) = iter.next() {
        println!("{:?}", item); // this line is ok
        accumulator.push(item); // this line causes an error
    }
}

(playground)

The compiler error:

error[E0499]: cannot borrow `iter` as mutable more than once at a time
  --> src/main.rs:16:28
   |
16 |     while let Some(item) = iter.next() {
   |                            ^^^^ `iter` was mutably borrowed here in the previous iteration of the loop

Why does the accumulator.push(item) call cause a compiler error 2 lines above?

rosenhouse · September 22, 2021, 12:17am

Follow-up: I can see the same compiler error when I unroll the loop by hand.

fn builds() {
    let mut iter = MyIterator {};
    let _item1 = iter.next().unwrap();
    let _item2 = iter.next().unwrap();
    println!("this works: {:?}", _item2);
}

fn does_not_build() {
    let mut iter = MyIterator {};
    let _item1 = iter.next().unwrap();
    let _item2 = iter.next().unwrap();
    println!("this doesn't compile: {:?}", _item1);
}

gives

error[E0499]: cannot borrow `iter` as mutable more than once at a time
  --> src/lib.rs:35:18
   |
34 |     let _item1 = iter.next().unwrap();
   |                  ---- first mutable borrow occurs here
35 |     let _item2 = iter.next().unwrap();
   |                  ^^^^ second mutable borrow occurs here
36 |     println!("this doesn't compile: {:?}", _item1);
   |                                            ------ first borrow later used here

playground link

To be clear: I am happy to see the compiler prevent me from doing the bad thing. I just don't understand how that compiler error message relates to the bad thing!

steffahn · September 22, 2021, 12:32am

A fundamental principle of Rust’s ownership and borrowing model is that “mutable references”, i.e. &mut T, also sometimes called “unique references”, are exclusive. There’s never two distinct mutable references to the same thing at the same time.

The method signature

fn next(&mut self) -> Option<&[u8]>

or more explicitly

fn next<'a>(self: &'a mut MyIterator) -> Option<&'a [u8]>

is telling the compiler that the returned &[u8] reference has the same lifetime as the mutable &mut MyIterator reference. Thus holding on to more than one of the &[u8] items at a time is impossible because:

the items have the same lifetime as the mutable references to MyIterator that were used to create them, hence
two items existing / “being alive” at the same time means two mutable references to the same thing, the iter: MyIterator, being alive at the same time

The error message is talking about those mutable borrows of iter rather than the items. The compiler has figured out how long the borrow of iter would need to stay alive in order to use the item(s) the way you use them and then concludes that there are two mutable borrows of iter overlapping, hence the error.

By the way, leaving an item unused in a local variable is not problematic. A reference can still technically “exist” without being considered “alive” anymore, but only if it’s no longer accessed at all. So “retaining the reference” is not strictly prohibited, just doing anything with it after the next call to next is prohibited (but that’s enough to fulfill the safety conditions of the C function).

That’s why the error message points out the place where _item1 is accessed/used.
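As a small illustration of the rule above, the accumulator example compiles once each item is copied before the next call to `next()`. The field layout of `MyIterator` and the helper `collect_owned` are assumptions for this sketch, since the original struct was empty.

```rust
// Hypothetical in-memory iterator with the same `next` signature as above.
pub struct MyIterator {
    data: Vec<Vec<u8>>,
    pos: usize,
}

impl MyIterator {
    pub fn new(data: Vec<Vec<u8>>) -> Self {
        MyIterator { data, pos: 0 }
    }

    // The returned slice shares its lifetime with the `&mut self` borrow.
    pub fn next(&mut self) -> Option<&[u8]> {
        let item = self.data.get(self.pos)?;
        self.pos += 1;
        Some(item.as_slice())
    }
}

pub fn collect_owned(iter: &mut MyIterator) -> Vec<Vec<u8>> {
    // Storing owned Vec<u8> values instead of &[u8] means no borrow of
    // `iter` has to stay alive across loop iterations.
    let mut accumulator: Vec<Vec<u8>> = Vec::new();
    while let Some(item) = iter.next() {
        // `item` is last used here, so the mutable borrow ends before
        // the next call to `iter.next()`.
        accumulator.push(item.to_vec());
    }
    accumulator
}
```

Changing the accumulator back to `Vec<&[u8]>` and pushing `item` directly brings back error E0499, because every stored slice would keep its own mutable borrow of `iter` alive.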

rosenhouse · September 22, 2021, 12:56am

Thank you. I'd read, and re-read Chapter 10 of the Book. But this hadn't quite "clicked" until I read your comment. The sort of "backpressure" onto the lifetime of the function argument and the interaction of that with the exclusivity of the mutable references makes sense now!

My first reaction was "This is fragile! Why should the safety of the item access rely on exclusivity of the iterator's mutable-reference?"

But then I thought about it more, and realized that the item lifetime requirement exists precisely because of mutability within the iterator. So it totally is "for the right reason"!

(Still, I wish the compiler errors were as thorough as your explanation).

Thanks again.

steffahn · September 22, 2021, 1:12am

For completeness, note that an API

impl MyIterator {
    fn advance(&mut self) { … }
    fn get_current(&self) -> Option<&[u8]> { … }
}

would also be able to enforce that the item is no longer used after advancing, because shared references and mutable references can’t exist at the same time, so when calling advance (which takes &mut self), the item obtained through get_current (which takes &self) must no longer be alive.
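A minimal sketch of that two-method API, assuming an in-memory `Vec<Vec<u8>>` in place of the C iterator; the `new` constructor and the helper `total_len` are invented for illustration.

```rust
// Hypothetical in-memory stand-in for the wrapped C iterator.
pub struct MyIterator {
    data: Vec<Vec<u8>>,
    pos: usize,
}

impl MyIterator {
    pub fn new(data: Vec<Vec<u8>>) -> Self {
        MyIterator { data, pos: 0 }
    }

    // Needs exclusive access, so no item borrowed through
    // `get_current` may still be in use when this is called.
    pub fn advance(&mut self) {
        if self.pos < self.data.len() {
            self.pos += 1;
        }
    }

    // Shared borrow: several reads of the current item may coexist.
    pub fn get_current(&self) -> Option<&[u8]> {
        self.data.get(self.pos).map(|v| v.as_slice())
    }
}

pub fn total_len(iter: &mut MyIterator) -> usize {
    let mut total = 0;
    while let Some(item) = iter.get_current() {
        total += item.len();
        // `item` is not used past this point, so the shared borrow has
        // ended and the mutable borrow for `advance` is accepted.
        iter.advance();
    }
    total
}
```

Moving `total += item.len()` below `iter.advance()` would be rejected, since the shared borrow held by `item` would then overlap the mutable borrow taken by `advance`.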

system · December 21, 2021, 1:12am

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.
