Closures/Iterators borrowing stuff for too long?

I'm doing advent of code and ran into an error with closures borrowing things for longer than (I think) is necessary.

Here's a more simplified example of the error I'm running into.

This code can be interpreted as "I want all of the elements in this iterator that are not in this set; Take them and insert them into the set.

use std::collections::HashSet;

fn main() {
    let mut my_set: HashSet<i32> = HashSet::default();
    my_set.insert(2);
    my_set.insert(3);

    [1, 2, 3, 4]
        .into_iter()
        .filter(|num| !my_set.contains(num))
        .for_each(|num| {
            my_set.insert(num);
        });
}

The error I get back from the compiler says that the filter closure is borrowing my_set immutably AND the closure passed to for_each is borrowing my_set mutably. This makes perfect sense: you cannot have a mutable reference AND an immutable reference at the same time.

But (I'm pretty sure) these references don't exist at the same time! I mean, the fact that we are iterating over some element in the for_each closure implies that we have ran (and returned from and passed) the filter closure.

maybe the answer? why is this unsafe tho??

Some thinking, reading, and gpt'ing has lead me to think this is because the iterator basically runs filter->for_each element-wise; In other words, we don't filter everything, then for_each everything. So the filter closure is still around while the for_each closure is running.

Is that the right way of thinking about this error? In my head, a closure borrowing something isn't really an issue if that closure isn't currently running.

my question is more: why does this make the compiler angry?
than it is: how can I write code that makes the compiler happy (just use a for loop instead of two different closures, or use for_each with an if statement instead of a filter)

Thanks so much for your time! Happy holidays!

1 Like

That's correct. If it didn’t work that way, there would have to be memory allocated to store the filter results temporarily before for_each receives them. Iterators (well, std's provided iterators) don't do that, generally.

That is not how it works. Closures are essentially equivalent to structs that contain whatever they capture (whether that's a value or a reference to it). So, they are subject to all the same rules that any reference that does exist in plain sight is subject to.

Also, even if closure borrows worked the way you imagine, the compiler cannot be certain that the closures aren't running at the same time. Maybe filter, for_each, num, and my_set collaborated somehow so that the call to insert() actually invokes the |num| !my_set.contains(num) closure you passed to filter(). You know that the data types involved aren't doing anything like that, but there's nothing in the language’s rules, that the compiler applies, that say they definitely can’t do that.

4 Likes

Ignoring the conceptual stuff for a moment, you can fix this by just removing the filter line. Sets can only contain one of each item, so if you insert another it is just ignored.

[1, 2, 3, 4]
    .into_iter()
    .for_each(|num| {
        my_set.insert(num);
    });

Another way to write this is with extend:

my_set.extend([1, 2, 3, 4]);
4 Likes

Thanks for the reply! The analogy of closures as structs is pretty intuitive and it makes sense why the compiler is complaining at me.

Would I be correct in saying:
Filter and for_each could be doing anything with those closures (including "passing them down"). The closure passed to for_each could capture both the closure passed to filter and the &HashSet. This means that the for_each closure owns (or owns something that owns) both a reference and a mutable reference to the same thing (super not cool, compiler go berserk).

do iterator adapters need to worry about this tho??

TLDR, the code is safe, but the compiler can't guarantee it. Or can it? To my (limited) knowledge, all of the adapters in std::iter run sequentially (filter, then map, then skip, then take, then sum, for example). Is there any case where my original example needs worry about this "potential double capture" issue?

In other words, is there any safe rust you can replace the comments with to make the compilers worries valid?

fn main() {
    // super evil setup code

    [1, 2, 3, 4]
        .into_iter()
        .filter(|num| {
            // super evil filter code
        })
        .for_each(|num| {
            // evil for_each code
        });
}

There's lots of examples that don't compile (like the code from my first post). Is there an example where the compilers worries are actually valid?

I'm not sure if expecting the compiler to understand the Rust std library is infeasible. The compiler doesn't know that for_each isn't using/abusing the closure passed to filter.

the solution (change to business major)

The obvious solution is to perform the "filter" check in the for_each closure (or just google the AOC answer).

Part of me wants to think up some theoretical way for the code from post 1 to be valid. If I had an Uncapturable trait on the closures passed to the iterator adaptors, would that be enough to guarantee mem-saftey? Or maybe it'd be smarter to make an Adaptor trait that tells the compiler "this struct is part of a (possibly chained) adaptor and all the closures will play nice please compile my code pretty please."

P.S those aren't actual language suggestions, I just think understanding a hypothetical solution will help me understand the problem more

Thank you so much for your time!

No, there is no possible code to substitute here because the actual types of the iterator functions involved do not let this conflict occur. I was proposing a hypothetical where even i32 was in on the evil plot.

To restate the point I was trying to make: you would need even more rules in the language defining some kind of type-system for control flow that would allow the compiler to conclude that the closure calls don't overlap. Rust doesn't have that, and it would be an entire new thing to invent. It's not as simple as “just say closures don't conflict except when called”, because “when called” is not part of the type system.

(In a sense, lifetimes of borrows are the small amount of control flow information that currently reaches the type system. They're not anywhere near sufficient to achieve this kind of analysis.)

Yes, for example:

  • the setup borrows an element from a HashSet
  • the filter reads that element
  • the for_each inserts into the set

This is very similar to what you had except the filter now borrows an element of the HashSet instead of the HashSet itself. In both cases for the borrow checker the filter closure is borrowing the HashSet. However now when you insert you may trigger a reallocation of the HashSet backint buffer, which will make the borrowed reference point to deallocated memory.

(Currently on my phone so it's a little hard to write a proper example with code)

1 Like

That makes sense too.
And it’s easy to see that the HashSet and a reference to one of its owned values would have the same lifetime.

The error makes a lot more sense with this example + how closures work under the hood + how iterator adaptors work

Thank you guys for your answers!

Nit: the borrow of the HashSet in order to get the reference to one of its owned values must be as long as the lifetime on the reference.

(Rust lifetimes - those '_ things - are generally about the duration of borrows. Despite the name, the liveness scope of a value or variable (e.g. from creation to destruction) is not not a Rust lifetime.[1] So the HashSet itself doesn't have a (Rust) lifetime, and borrows of the HashSet can be much shorter than the liveness of the HashSet.[2] Calling the Rust things "lifetimes" is an unfortunate overlap of terminology.)


  1. though being borrowed is incompatible with being destructed, so there is a relationship. ↩︎

  2. else you couldn't take multiple exclusive &mut references to insert elements, say ↩︎

2 Likes

The control flow opacity issue with closures comes up frequently. Even in cases that "seem" like they should work. The example in my mind is when you want to use the entry API to extend a vector in a hashmap without cloning: Rust Playground

It seems like it should work because only one of the two closures will be called. There is no way to prove that, but the code can be fixed by draining the vector in one branch instead of moving into both: Rust Playground (And with the move-into-both resolved, or_insert() can be used instead of the useless move closure. [1])

There is at least a hypothetical possibility that with precise control flow analysis across functions, the compiler could use drop flags for the closures in both branches, like it does with normal control flow. There isn't a burning desire for such an expensive feature, though.


  1. It is interesting that this works at all! That's because the and_modify() closure is dropped when that method returns, releasing its borrows. If the closure was carried out through its return value, the lifetimes in its captures would as well. That would prevent it from compiling, avoiding mutable aliasing for some implementations of or_insert(). ↩︎

4 Likes