[Solved] Rust API Design: Wrapping "Iterators" Which Mutate State

My question is not "how do I do this," but "how do I design this in a Rust-like way."

I am writing a Rust wrapper around some C code. A simplified view of the C code looks like this:

struct ComplexStructure {
    // .. some fields here ..
    FILE *backing_store; // where most of the data resides
    Keystone *ksiter_item; // initialized to NULL
    int ksiter_index;
}

/* functions to access keystones */
Keystone *keystone_iter_current(ComplexStructure *structure) {
    return structure->ksiter_item; // NULL means "None", ptr means "Some"
}
int keystone_iter_start(ComplexStructure *structure) {
    structure->ksiter_index = -1;
    return keystone_iter_next(structure);
}
int keystone_iter_next(ComplexStructure *structure) {
    structure->ksiter_index += 1;  
    load_keystone_from_disk(structure); // mmap() the keystone, update item ptr to point to it
    if structure->ksiter_item == NULL {
         return 0; // iterator returned "None"
    } else {
         return 1; // iterator returned "Some"
    }
}
int keystone_get_index(ComplexStructure *structure, int index) {
    assert(index >= 0);
    structure->keystone_index = index - 1;
    return keystone_iter_next(structure);
}

Basically, the C library considers a keystone to be so large and heavy, they are kept on disk and mapped in, to ensure that that only the section of its data you actually access resides in memory. (While I would agree that very large keystones are possible -- larger than the RAM in most PCs -- I think this design is over-optimizing for the worst 1-2% of cases.)

Compared to idiomatic Rust iterators -- with all of their wonderful multi-threading properties you get for free -- this C "iterator API" is very clunky. And I'm trying to decide what API should be on my Rust wrapper object, and how much clunkiness it should reflect.

The options I'm currently considering are (feel free to suggest others):

  1. Write an iter_keystone method the way Rust usually does iterators, and simply have it hold a mutable reference to the object instead of a shared one. This seems counterintuitive compared to expectations on iterators an their function names, and highly discouraged.
  2. Write an iter_keystone_mut function the way Rust usually does iterators, and simply not have an iter_keystone function. There would be one comment buried in the rustdocs explaining why, but it just feels... weird to do that.
  3. Write a standard mutable iterator API like option 2, but name the method something different, like iterate_keystones_from_disk. This would be a hint it has complex internals... but is that hint really necessary, and worth typing that long ugly string?
  4. Expose the C API very directly, with methods keystone_iter_init and keystone_iter_next on my ComplexStructure wrapper type, adding some affordances like making next return an Option<KeystoneWrapper> instead of having a separate accessor function. But this just seems to be pushing the clunkiness off on the library's users, which seems impolite.
  5. Completely abandon the iterator idea, and have a read_keystone(index) method be the only way to read them at all. But Rust iterators just so useful, this would really be another way of pushing the problem off onto the library's users -- the problem this time being making an iterator out of an accessor function.
  6. Create Arc-and-Mutex-based parallelism over the C API, where every access to a keystone (both iterators and object methods) locks a member of the wrapper object, moves the C iterator to the offset it wants, deep copies the data, and then releases it. This is the most effort for me, the most bug prone, and ignores the "heaviness" of keystones, but would give Rust users the most intuitive API.

Any suggestions?

3 Likes

My guess is that Rust API would look like this:

impl ComplexStructure {
 fn iter(&self) -> KeystoneIter {} // Iterator<Item=&Keystone>
 fn iter_mut(&mut self) -> KeystoneIterMut {} // Iterator<Item=&mut Keystone>
}

but it's a bit dodgy that structure->keystone_index exists and is mutable. Can you duplicate it for each iterator? If you can't then having only iter_mut() is still viable and safe.

You can't however implement Iterator directly on the ComplexStructure, since borrow checker won't allow a structure to iterate and mutate itself at the same time (let a = s.next(); let b = s.next(); (a,b) must be valid).

3 Likes

Actually, I was planning for my Rust type to look something like this:

use c_api::ComplexStructure as CStructure; // just an extern type

struct ComplexStructure {
    ptr: *mut CStructure;
}

impl Drop for ComplexStructure { /* free() the ptr */ }

impl ComplexStructure {
    pub fn new(args: ComplexArgument) -> ComplexStructure {
        let mut complex = ComplexStructure {
            ptr: c_api::complex_structure_new()
        };
        c_api::complex_structure_init(ptr, args.as_ptr());
        complex
    }
    // ... all methods call C functions on self.ptr ...
}

Thus, my question: what parts of the C API do I use in my wrapper, and how Rust-like do I make it?

Or are you implying I should impl on the C type directly?

EDIT: for clarity.

ANOTHER EDIT: I now realize you partially answered my question. Thanks for that hint.

1 Like

It's fine if you implement the usual C API methods on the ComplexStructure wrapper. You can also implement many traits such as Clone, Debug, or AsRef. The exception is the Iterator trait, which in practice needs to be on a separate object, in order to have its iteration state mutated separately from the object it iterates.

You can't implement Rust methods directly on the CStructure type, as it'd imply it's a Rust native struct that can be allocated, copied and moved in Rust, but it can't, since C would cause crash if it called free() on Rust's pointer.

1 Like

Yeah, I figured I might a separate object for the iterators themselves.

But if I read your answer correctly, it's not "bad form" to:

  1. Have an iter_keystones_mut method on ComplexStructure that returns an iterator over keystones from a mutable reference, but
  2. Not have an iter_keystones method that returns an iterator from a shared reference.

That's really what I was asking. Because if it was too weird, I'd have to do something like this to create an immutable iterator that was thread- and concurrency-safe:

struct ComplexStructure {
    ptr: Mutex<*mut CStructure>
}

// skipping impl on ComplexStructure

struct KeystoneIter<'a> {
    over: &'a ComplexStructure,
    tracked_index: usize,
}

impl<'a> Iterator for KeyStoneIter<'a> {
    type Item = KeyStoneRef; // wrapping *mut CKeystone
    fn next(&mut self) -> Option<KeyStoneRef> {
        let ptr_guard = self.over.ptr.lock().unwrap(); // should panic on poison
        let cks = c_api::complex_keystone_get(*ptr_guard, self.tracking_index as c_int);
        if cks.is_null() {
            None
        } else { 
            self.tracking_index += 1;
            Some(KeyStoneRef { ptr: c_api::keystone_deep_copy(cks) })
        }
    }
}

Note the deep copy in there, because the region in C the pointer points to could be reused out-from-under Rust.

That's the deep copy C tried to avoid, relying on mmap instead. And I suspect putting that into the Rust wrapper would be a little unexpected -- in addition to having to make every other access by every other method lock a mutex.

So I'd call this solved then. Thanks!

1 Like

I've usually seen just iter_mut() rather than iter_foo_mut(). I guess the latter is OK if you can iterate in multiple ways.

It's fine not to have non-mut iter.

See also IntoIterator trait -- it allows working with the for loop. That would require into_iter() method, so maybe you could have yet another temp object for complex_structure.keystones().into_iter().

2 Likes

complex.keystones().iter() is what just occurred to me while I was fixing up an edge case or two. I'm glad to see I'm thinking the right way about Rust API design! :slight_smile:

Thanks again.