Iterators with mixed fixed/mutable state

Hi,

I'm new to Rust, working on my first project, a crate for Bayesian linear regression. I have two main structs:

  • SuffStats holds sufficient statistics. For given data, this is fixed throughout the modeling process.
  • Fit holds the inferred weights and some other information. For efficiency, this needs to mutate in place.

Originally I had these bundled together, but it seemed like it might be better to split them up like this. In some cases, it could be convenient to have "many readers" for a given SuffStats, but "one writer" for a Fit.

Fitting the model is done iteratively, currently by a method

impl Fit {
    fn update(&mut self, stats: &SuffStats) -> Option<&Self> {
        ...
    }
}

This works well, but it would be more convenient if there's a way to use Rust's Iterator trait. I've tried a few ways of setting up a single struct to hold both a SuffStats and a Fit, but I haven't yet gotten things to work out. It's fine if I clone, but that has all kinds of overhead I'd like to avoid. Without that, I keep getting into all kinds of lifetime problems, and haven't been able to make it work out yet.

Looking around for the right way to do this, I came across streaming iterators, which says

The iterator APIs in the Rust standard library do not allow elements to be yielded which borrow from the iterator itself. That means, for example, that the std::io::Lines iterator must allocate a new String for each line rather than reusing an internal buffer. The StreamingIterator trait instead provides access to elements being iterated over only by reference rather than by value.

Does this seem like a better approach for my application? As a newbie I don't yet have a good sense of idiomatic Rust, so I'd appreciate any design advice.

As you've noted, you can't implement Iterator — unless you decide to clone each item as you produce it, which you've also already rejected.

Does this seem like a better approach for my application? As a newbie I don't yet have a good sense of idiomatic Rust, so I'd appreciate any design advice.

Advice: Don't over-abstract. Don't try to implement any iteration trait — just write the code you actually need. Then, later, when you know more about your program and more about Rust, you will be in a better position to think about what abstractions you should use to make your code clean and flexible. The answer may still be to leave it alone.

2 Likes

Thanks for your response. Since this is a learning exercise, the code I actually need is something that will help me ramp up on Rust's abstractions. I don't want to end up with something silly and pointless, but at this point some extra work to gain some minor convenience is probably worth it.

For scientific applications, it's pretty common for iterative methods to mutate in place, so I'd like to get a handle on an idiomatic way to express that in Rust. Is streaming_iterator a good way to go, or is there something else I should be looking into?

Just remember to keep track of whether you're actually gaining convenience.

I can't answer that question as I have not yet have a need for streaming iterators, so I haven't used any libraries for that. (Though now that I think about it, maybe I do have a use case; there's a thing I wrote where the Iterator::Items are uncomfortably large and yet also don't have as much information as they should…)

1 Like

A better term these days for the pattern in question is "lending iterators", to distinguish from async iterators.[1] Let's just ignore all the adapters and make a higher-level comparison first:

trait Iterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
}

// This is more general than the `streaming-iterator` versions 
// which return shared or `mut` references specifically
trait LendingIterator {
    type Item<'a> where Self: 'a;
    fn next<'a>(&'a mut self) -> Option<Self::Item<'a>>;
    // Same as:
    // fn next(&mut self) -> Option<Self::Item<'_>>;
}

In the first case, there is the returned Item type must be the same no matter how long (or short) the lifetime on &mut self is. That means it can't contain a borrow of self, and that these patterns must be able to compile:

// Calls to `next` don't invalidate items
let one = iter.next();
let two = iter.next();
make_use_of((one, two));

// All items can outlast the iterator (see also Iterator::collect)
let item = iter.next();
drop(iter);
make_use_of(item);

Whereas the second case supports cases such as returning a reference into the guts of the iterator:

impl LendingIterator for Something {
    type Item<'a> = &'a str where Self: 'a;
    // The `&str` can be borrowed from `self`
    fn next(&mut self) -> Option<&str> { todo!() }
}

But this means the patterns above won't work, because each returned item becomes invalid upon the next exclusive use of self, which includes both calling next again and destructing or moving it. Therefore you can only examine one item at a time and none of them can outlast the iterator.


The lending iterator pattern may be an approach for your use case...[2]

struct Fiterator<'f> {
    fit: &'f mut Fit,
    stats: &'f SuffStats,
}

impl<'f> LendingIterator for Fiterator<'f> {
    type Item<'a> = &'a Fit where Self: 'a;
    // &'a Fit<'f> -> Option<&'a Fit>
    fn next(&mut self) -> Option<&Fit> {
        self.fit.update(self.stats)
    }
}

// ...

    while let Some(fit) = fiter.next() {
        fit.do_stuff();
    }

...but since there is no language integration with e.g. for, it may not be a huge difference or even less obvious on the surface...

    while let Some(fit) = fit.update(stats) {
        fit.do_stuff()
    }

...and IMO it might also be a bit odd if you're modifying the iterator.

    let mut fiter = Fiterator { fit, stats: s1 };
    while let Some(fit) = fiter.next() {
        if fit as *const Fit as usize % 100 > 50 {
            fiter.stats = s2;
        }
    }

The main way I'm thinking of where you could gain convenience is via the adaptors from provided methods like take or find. If next and looping are the only things you need, I don't think I'd pull in a dependency and probably wouldn't bother with a home-spun trait either.


  1. The "streaming iterator" terminology existed before async landed. ↩︎

  2. the picture around SuffStats isn't entirely clear to me ↩︎

2 Likes

This is really helpful, thank you!

Ok, I think I need to get a better understanding of the lifetime syntax. It's not obvious to me how these are different. As I understand, there are always lifetimes for the compiler to reason about, same as with types. But like types, there are cases where the compiler lets you leave them out.

I really wish something in VS Code could help by desugaring the cases with lifetime elision like it can with types, to make it easier to see what's really going on. In this case

What's the equivalent way to write this with explicit lifetimes?

The point is that self is a reference, therefore it has an (implicit) lifetime parameter, whereas Item doesn't. This in turn means that self and Item are independent. The fully desugared declaration of next would be:

fn next<'a>(&'a mut self) -> Option<Self::Item>;

and the parameter 'a doesn't appear anywhere in Self::Item.

I haven't seen the desugaring algorithm spelled out, but from this and some other things I've seen it seems something like

  1. Give a different lifetime to every reference or "lifetime slot" like Item<'a>
  2. If there's exactly one input and exactly two lifetimes (one in the input and one in the output), make these the same.

Is that right? I think that would explain @quinedot 's "Same as:" comment here:

https://doc.rust-lang.org/reference/lifetime-elision.html

  1. Give a different lifetime to every reference or "lifetime slot" like Item<'a>

Almost: only parameters get this treatment. The return type (output) never does.

  1. If there's exactly one input and exactly two lifetimes (one in the input and one in the output), make these the same.

The rule is “If there is exactly one lifetime used in the parameters (elided or not), that lifetime is assigned to all elided output lifetimes.” So, there can be any number of elided lifetimes in the output and they either all get the single input lifetime, or it's an error if there is more than one input lifetime.

1 Like

It's also in the book.

1 Like

No; the self is treated specially so that return types get its lifetime parameter unless explicitly changed. This means that the following signature:

fn takes_self(&self, another_ref: &str) -> &str;

is desugared to this:

fn takes_self<'a, 'b>(&'a self, another_ref: &'b str) -> &'a str;
1 Like

Thanks all, this is really helpful. I've only been using Rust for a few weeks so far, and there's a lot to try to spin up on. I think I had seen that in The Book, but at the time it didn't relate to anything I was doing, so I didn't really internalize it.