I'm building a Rust wrapper around an existing C library. The C library exposes an iterator-like API:
// true iff there's an item available
bool iter_valid(iterator *iter);
// move next
void iter_advance(iterator *iter);
// Sets *item to the current value
// - Callers must not modify that memory
// - The pointer is only valid until the next call to iter_advance()
// - Behavior only defined if iter_valid() == true
void iter_get_current(
    iterator *iter,     // IN
    const char **item,  // OUT
    size_t *item_len    // OUT
);
What would be an idiomatic representation in Rust?
(Note that the C library does some fancy user-space paging under-the-hood, so the OUT pointer really must not be used after the next call to iter_advance().)
I would like to expose a std::iter::Iterator so that users of my Rust library could take advantage of all the goodies provided by that trait. But to make things safe, the Item (which would contain the C pointer) must not last beyond the next next() call. I can't see how to bound the lifetime of Item that way. So maybe Iterator is the wrong abstraction?
Instead, maybe I should expose a next() function that takes ownership of the previous result?
trait WeirdIterator {
    type Item;
    // panics if called a second time
    fn first(&mut self) -> Option<Self::Item>;
    fn next(&mut self, prev: Self::Item) -> Option<Self::Item>;
}
This is problematic if it's possible to obtain two WeirdIterators of the same type at the same time, because then a user could pass the item from one to the other.
Another option would be to have your wrapper somehow obtain ownership of a copy of the item. Then you could use the standard Iterator trait. Whether that's reasonable depends on what exactly these char pointer typed items look like. If it's only possible to copy the items by allocating new Vecs for each item, this might be undesirable overhead, depending on your use case.
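To make the copying option concrete, here is a minimal sketch. FakeCIter is a hypothetical stand-in for the C iterator (it reuses one scratch buffer, much like the paging described above), and OwningIter is an illustrative name, not part of any real library. Each item is copied into a fresh Vec<u8> before advancing, which is exactly the per-item allocation cost mentioned above.

```rust
/// Hypothetical stand-in for the C iterator: it reuses one scratch buffer,
/// so the bytes handed out for one item are overwritten by the next advance.
struct FakeCIter {
    source: Vec<Vec<u8>>,
    pos: usize,
    scratch: Vec<u8>,
}

impl FakeCIter {
    fn new(source: Vec<Vec<u8>>) -> Self {
        let mut it = FakeCIter { source, pos: 0, scratch: Vec::new() };
        it.load();
        it
    }

    /// Copy the item at `pos` (if any) into the shared scratch buffer.
    fn load(&mut self) {
        self.scratch.clear();
        if let Some(item) = self.source.get(self.pos) {
            self.scratch.extend_from_slice(item);
        }
    }

    fn valid(&self) -> bool {
        self.pos < self.source.len()
    }

    fn advance(&mut self) {
        self.pos += 1;
        self.load();
    }

    fn current(&self) -> &[u8] {
        &self.scratch
    }
}

/// Wrapper that implements the standard `Iterator` by copying every item.
struct OwningIter {
    inner: FakeCIter,
}

impl Iterator for OwningIter {
    type Item = Vec<u8>;

    fn next(&mut self) -> Option<Vec<u8>> {
        if !self.inner.valid() {
            return None;
        }
        // Copy first, then advance: the owned Vec no longer depends on the
        // scratch buffer, so it can outlive any number of later calls.
        let owned = self.inner.current().to_vec();
        self.inner.advance();
        Some(owned)
    }
}
```

With this shape, all the usual adapters work, for example OwningIter { inner: FakeCIter::new(data) }.collect::<Vec<_>>().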
Streaming iterators are the right abstraction here, but you'll run into two issues: generic associated types (GATs) are not stable, and streaming iterators lack a lot of the nice adapters that normal iterators have.
Assuming you don't want to use nightly features, you'll probably be stuck with a next() method which returns an item whose lifetime is attached to the &mut self, to make sure you can't call next() again without dropping the previous value.
struct MyIterator {
    iter: *mut Iter,
}
impl MyIterator {
    fn next(&mut self) -> Option<&[u8]> {
        ...
    }
}
fn main() {
    let mut iter = new_my_iterator();
    while let Some(item) = iter.next() {
        ...
    }
}
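For anyone who wants to compile the idea, here is a self-contained sketch of that signature. The Vec-backed fields and the new and total_len helpers are assumptions made purely for illustration; a real wrapper would hold the raw iterator pointer and build the slice from the pointer and length the C library reports.

```rust
/// Hypothetical stand-in for the C iterator so the sketch compiles on its own.
/// A real wrapper would hold the raw `*mut` iterator pointer instead.
struct MyIterator {
    items: Vec<Vec<u8>>, // pretend these live in library-managed pages
    pos: usize,          // index of the item the next call will return
}

impl MyIterator {
    fn new(items: Vec<Vec<u8>>) -> Self {
        MyIterator { items, pos: 0 }
    }

    /// The returned slice borrows from `&mut self`, so the borrow checker
    /// rejects any use of it after a later call to `next`.
    fn next(&mut self) -> Option<&[u8]> {
        let item = self.items.get(self.pos)?;
        self.pos += 1;
        Some(item.as_slice())
    }
}

/// Each `item` is dead by the end of the loop body, so the next call to
/// `next` is allowed.
fn total_len(mut iter: MyIterator) -> usize {
    let mut total = 0;
    while let Some(item) = iter.next() {
        total += item.len();
    }
    total
}
```

The while let loop is the replacement for a for loop here, since this type deliberately does not implement std::iter::Iterator.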
Yes, I think this is the wrong abstraction. (Personally, I think it's awesome that lifetimes give us the right language to correctly distinguish the concepts; at least in Rust, I prefer calling this a cursor rather than an iterator.) However, consider that there is a simple way to adapt this to an iterator, if the user provides a function that takes ownership of the relevant portions of data.
struct Item<'lt>(…);
impl Cursor {
    // Combines valid and get_current: 'parse, don't validate'.
    // The approach also works if this takes `&mut self`.
    fn item(&self) -> Option<Item<'_>> { … }
    fn next(&mut self) { .. }
    // The magic sauce.
    fn into_iterator<T>(mut self, mut owner: impl FnMut(Item) -> T)
        -> impl Iterator<Item=T>
    {
        core::iter::from_fn(move || {
            let result = self.item().map(&mut owner);
            self.next();
            result
        })
    }
}
Now consumers can use the iterator combinators if they bring some form of taking ownership. (For example, cloning relevant portions to their own allocation, extracting data and then dropping the borrow, etc.). Or they can use an API that offers the power of the C interface and lets them re-acquire the data before advancing.
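A compilable variant of the cursor idea, in case it helps. The Vec-backed Cursor fields, the byte-slice payload of Item, and the lengths helper are all assumptions filled in for illustration; the elided bodies in the post may well look different in a real wrapper.

```rust
/// Borrowed view of the current element (assumed here to be a byte slice).
struct Item<'lt>(&'lt [u8]);

/// Hypothetical stand-in for a cursor over the C iterator.
struct Cursor {
    data: Vec<Vec<u8>>,
    pos: usize,
}

impl Cursor {
    /// Combines "is valid" and "get current" into a single call.
    fn item(&self) -> Option<Item<'_>> {
        self.data.get(self.pos).map(|v| Item(v.as_slice()))
    }

    fn next(&mut self) {
        self.pos += 1;
    }

    /// `owner` turns each borrowed `Item` into an owned `T`, which lets the
    /// cursor be exposed as an ordinary `Iterator`.
    fn into_iterator<T>(
        mut self,
        mut owner: impl FnMut(Item<'_>) -> T,
    ) -> impl Iterator<Item = T> {
        std::iter::from_fn(move || {
            // The borrow of `self` ends once `owner` has produced its `T`.
            let result = self.item().map(&mut owner);
            self.next();
            result
        })
    }
}

/// Usage: extract the length of each item, then use normal combinators.
fn lengths(cursor: Cursor) -> Vec<usize> {
    cursor
        .into_iterator(|item| item.0.len())
        .filter(|&n| n > 1)
        .collect()
}
```

The closure passed to into_iterator decides how much to copy: item.0.len() copies nothing but a usize, while item.0.to_vec() would clone the whole slice.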
Note that this kind of API already exists if you’re using StreamingIterator. Both generally/generically with the map_deref method taking a closure like your into_iterator method, and less general but easier to use with .owned() which will e.g. turn a StreamingIterator of &[u8]s into an (ordinary) Iterator of Vec<u8>.
Of course, StreamingIterator only supports &T items, while you’re addressing – more generally – any type Item<'lt>.
A follow-up question from a newbie. This does what I want: it prevents me from holding a reference to the item beyond the loop body. But the compiler message doesn't suggest that's the problem. I really don't understand why this is preventing me from retaining the reference:
struct MyIterator {}
impl MyIterator {
    fn next(&mut self) -> Option<&[u8]> {
        None
    }
}
fn main() {
    let mut accumulator: Vec<&[u8]> = Vec::new();
    let mut iter = MyIterator {};
    while let Some(item) = iter.next() {
        println!("{:?}", item); // this line is ok
        accumulator.push(item); // this line causes an error
    }
}
error[E0499]: cannot borrow `iter` as mutable more than once at a time
--> src/main.rs:16:28
|
16 | while let Some(item) = iter.next() {
| ^^^^ `iter` was mutably borrowed here in the previous iteration of the loop
Why does the accumulator.push(item) call cause a compiler error 2 lines above?
Follow-up: I can see the same compiler error when I unroll the loop by hand.
fn builds() {
    let mut iter = MyIterator {};
    let _item1 = iter.next().unwrap();
    let _item2 = iter.next().unwrap();
    println!("this works: {:?}", _item2);
}
fn does_not_build() {
    let mut iter = MyIterator {};
    let _item1 = iter.next().unwrap();
    let _item2 = iter.next().unwrap();
    println!("this doesn't compile: {:?}", _item1);
}
gives
error[E0499]: cannot borrow `iter` as mutable more than once at a time
--> src/lib.rs:35:18
|
34 | let _item1 = iter.next().unwrap();
| ---- first mutable borrow occurs here
35 | let _item2 = iter.next().unwrap();
| ^^^^ second mutable borrow occurs here
36 | println!("this doesn't compile: {:?}", _item1);
| ------ first borrow later used here
To be clear: I am happy to see the compiler prevent me from doing the bad thing. I just don't understand how that compiler error message relates to the bad thing!
A fundamental principle of Rust’s ownership and borrowing model is that “mutable references”, i.e. &mut T, also sometimes called “unique references”, are exclusive. There are never two distinct mutable references to the same thing at the same time.
The signature fn next(&mut self) -> Option<&[u8]> is telling the compiler that the returned &[u8] reference has the same lifetime as the mutable &mut MyIterator reference. Thus holding on to more than one of the &[u8] items at a time is impossible because:
the items have the same lifetime as the mutable references to MyIterator that were used to create them, hence
two items existing / “being alive” at the same time means two mutable references to the same thing, iter: MyIterator, being alive at the same time.
The error message is talking about those mutable borrows of iter rather than the items. The compiler has figured out how long each borrow of iter would need to stay alive in order to use the item(s) the way you use them, and then concludes that there are two mutable borrows of iter overlapping, hence the error.
By the way, leaving an item unused in a local variable is not problematic. A reference can still technically “exist” without being considered “alive” anymore, but only if it’s no longer accessed at all. So “retaining the reference” is not strictly prohibited, just doing anything with it after the next call to next is prohibited (but that’s enough to fulfill the safety conditions of the C function).
That’s why the error message points out the place where _item1 is accessed/used.
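It can help to see the elided lifetime written out. In the sketch below, the 'a on &'a mut self and on the returned slice is exactly what the compiler infers for fn next(&mut self) -> Option<&[u8]>. The remaining and buf fields and the collect_owned helper are hypothetical, added only so the example compiles and actually reuses its buffer; the accumulator stores owned copies, which is one way to make the loop from the question build.

```rust
struct MyIterator {
    remaining: u8,
    buf: [u8; 1], // reused for every item, like the C library's pages
}

impl MyIterator {
    // Equivalent to `fn next(&mut self) -> Option<&[u8]>`: lifetime elision
    // ties the returned slice to the `&mut self` borrow.
    fn next<'a>(&'a mut self) -> Option<&'a [u8]> {
        if self.remaining == 0 {
            return None;
        }
        self.buf[0] = self.remaining;
        self.remaining -= 1;
        Some(&self.buf[..])
    }
}

fn collect_owned(mut iter: MyIterator) -> Vec<Vec<u8>> {
    let mut accumulator: Vec<Vec<u8>> = Vec::new();
    while let Some(item) = iter.next() {
        // `to_vec` copies the bytes, so nothing in `accumulator` borrows
        // from `iter` and the mutable borrow ends with this loop body.
        accumulator.push(item.to_vec());
    }
    accumulator
}
```

Changing the accumulator back to Vec<&[u8]> and pushing item directly would extend each mutable borrow of iter past the next call to next, which is the overlap the E0499 error reports.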
Thank you. I'd read, and re-read Chapter 10 of the Book. But this hadn't quite "clicked" until I read your comment. The sort of "backpressure" onto the lifetime of the function argument and the interaction of that with the exclusivity of the mutable references makes sense now!
My first reaction was "This is fragile! Why should the safety of the item access rely on exclusivity of the iterator's mutable-reference?"
But then I thought about it more, and realized that the item lifetime requirement exists precisely because of mutability within the iterator. So it totally is "for the right reason"!
(Still, I wish the compiler errors were as thorough as your explanation).
An API with separate methods, get_current(&self) and advance(&mut self), would also be able to enforce that the item is no longer used after advancing, because shared references and mutable references can’t exist at the same time, so when calling advance (which takes &mut self), the item obtained through get_current (which takes &self) must no longer be alive.
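A minimal sketch of that split API, mirroring the three C functions. The Vec-backed Cursor and the first_bytes helper are hypothetical stand-ins so the example compiles without the C library.

```rust
/// Hypothetical stand-in for the wrapped C iterator.
struct Cursor {
    data: Vec<Vec<u8>>,
    pos: usize,
}

impl Cursor {
    fn valid(&self) -> bool {
        self.pos < self.data.len()
    }

    /// Shared borrow: may be called repeatedly without advancing.
    fn get_current(&self) -> Option<&[u8]> {
        self.data.get(self.pos).map(Vec::as_slice)
    }

    /// Exclusive borrow: every slice from `get_current` must be dead here.
    fn advance(&mut self) {
        if self.valid() {
            self.pos += 1;
        }
    }
}

/// Collects the first byte of every non-empty item.
fn first_bytes(mut cursor: Cursor) -> Vec<u8> {
    let mut out = Vec::new();
    while let Some(item) = cursor.get_current() {
        if let Some(&b) = item.first() {
            out.push(b);
        }
        // `item` is not used past this point, so the shared borrow has
        // ended and the mutable borrow for `advance` is accepted.
        cursor.advance();
    }
    out
}
```

Unlike the next(&mut self) design, this one allows several shared borrows of the current item to coexist, as long as none of them is used after advance.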