How to write an iterator of mutable reference?

I have seen some topics similar to this, but I think they're not much help.
I want to implement a feature that consumes an iterator and returns the corresponding value of the item of the iterator according to the map. But the following code doesn't compile. For the sake of brevity, I omitted some code and used concrete types, and assumed String cannot be cloned.

use std::{collections::HashMap, slice::Iter};

struct MyIter<'a> {
    keys: Iter<'a, u32>,
    map: &'a mut HashMap<u32, String>,
}

impl<'a> Iterator for MyIter<'a> {
    type Item = &'a String;
    fn next(&mut self) -> Option<Self::Item> {
        let key = self.keys.next()?;
        let string = match self.map.get(&key) {
            Some(s) => s,
            None => {
                todo!("Insert an empty string and return it");
            }
        };
        Some(string)
    }
}

Code on the playground

error: lifetime may not live long enough
  --> src/lib.rs:18:9
   |
8  | impl<'a> Iterator for MyIter<'a> {
   |      -- lifetime `'a` defined here
9  |     type Item = &'a String;
10 |     fn next(&mut self) -> Option<Self::Item> {
   |             - let's call the lifetime of this reference `'1`
...
18 |         Some(string)
   |         ^^^^^^^^^^^^ associated function was supposed to return data with lifetime `'a` but it is returning data with lifetime `'1`

I don't know why I am returning data with lifetime '1 but not 'a, and how to fix this? Is it safe if I return string forcibly via unsafe code like follow? Thanks.

// Change `Some(string)` to this
unsafe { Some(&*(string as *const String)) }

You’re probably running into the problem that &mut has an invariant lifetime, but I’m not sure exactly how. If you’re ok with not inserting into the map for missing keys, you can write something like this instead:

use std::{collections::HashMap};

struct MyIter<'a, I> {
    keys: I,
    map: &'a HashMap<u32, String>,
}

impl<'a, I:Iterator<Item=u32>> Iterator for MyIter<'a, I> {
    type Item = &'a str;
    fn next(&mut self) -> Option<Self::Item> {
        let key = self.keys.next()?;
        let string = match self.map.get(&key) {
            Some(s) => s.as_str(),
            None => &""
        };
        Some(string)
    }
}

Note that filling in your todo! will be impossible: Inserting a value into the HashMap may move other values, which can break references returned from previous calls to next— If you want the HashMap to be modified like this, you’ll need to use some other strategy to make it happen.

This also makes your unsafe workaround unsound: After the internal structure of the map is changed, some of the references will be left dangling.

3 Likes

Thanks for your detailed explaination :grinning:.

  • In order to get a &'a String out, one needs to call get with a &'a Self.
  • Given the struct, this is a reborrow of the &'a mut HashMap.
  • But that's trapped behind a &mut self with an arbitrary lifetime in next,
  • and you can only get a &'short mut from a &'short mut &'long mut.

If the last restriction wasn't in place, you could get aliasing &'long muts by round-tripping through a shorter-lived outer &'short mut.


Iter<'a, T> is Clone, so you could pre-insert the keys when constructing the shared-reference iterator.

If your actual keys isn't a cloneable iterator, and/or you don't want to insert keys unless they were iterated over, there are other approaches that still might work for you.

1 Like

Thanks for your explaination and workaround, which is sound and enlightening.
In your second demonstration, you choose return dummy instead of creating an empty string immediately. But actually, I forgot to metion, in my project, creating a string is not a thing simple as String::default but through a slightly time-consuming function, and the string changes as the key changes. For the sake of brevity, you can recognize creating a string is not by String::default but u32::to_string().
Do you have a workaround for this case? Thanks. :sweat_smile:

(Lots of thinking out loud ...)

Hmmm... one is tempted to use some sort of buffer, but the difficulty is that you can't actually mutate such a buffer on each next call -- that would be a lending iterator (where you can only hold on to each Item until the subsequent next, as opposed to be able to collect them all and access them all at the same time). Making a faux-iterator (a LendingIterator, but there is no such standard trait yet) instead of an Iterator is an option.

The pre-populate approach would work with Iterator, but you do pay the cost up-front.

You could perhaps (if the keys iterator is Clone) pre-scan to allocate enough uninitialized or Default storage, and then populate it as you go. If the values have a reasonable default, this could avoid unsafe; if not, you might need to use MaybeUninit or the like. POC.

If the keys iterator isn't clone... you could still pre-scan but collect into a Vec<Either<(key, (key, String))>> or something like that. Even sloppier POC.

But... you should probably keep track of what you consumed in this case, or you'll insert your dummy default/uninitialized values. Increasingly messy POC.

And an entirely different approach is to ditch iteration and use some sort of scoped callback instead.

Thanks again for giving such a complex workaround and detailed explanation, which solved my problem perfectly (although understanding them takes me some time). :grinning:

2 Likes

I looked at it again with a fresh mind and it has some flaws to be aware of:

  • duplicate missing keys aren't handled ideally
  • I had a bug where I counted missing keys but filtered the iterator in drop after take and not before

Here's a bug-fixed and better-commented version, but it still doesn't handle the duplicate new keys case ideally.

1 Like

Something like this:

/// A streaming iterator of mutable references, where `advance` consumes the iterator.
pub trait MutStreamingIterator: Sized {
    type Item;

    fn advance(self) -> Option<Self>;
    fn get(&mut self) -> Option<&mut Self::Item>;
    fn size_hint(&self) -> (usize, Option<usize>);
}

// Clone is not required for the iterator, but needed for the collect that takes by value
fn collect<T: Clone, I: MutStreamingIterator<Item=T>>(mut iter: I) -> Vec<T> {
    let mut values = Vec::with_capacity(iter.size_hint().0);
    
    loop {
        match iter.advance() {
            Some(mut new_iter) => {
                if let Some(item) = new_iter.get() {
                    // item is a mutable reference here!
                    values.push(item.clone())
                }
                iter = new_iter;
            },
            None => break
        }
    }

    values
}