# A little compilation error puzzle/challenge (can you understand what is going on?) around iterators, closure captures, borrow checking, references, etc

I’ve formulated the question here:

I’m opening this thread so the discussion of this question does not take over the original thread

Here’s the code from the linked playground. It contains a good amount of irrelevant code on the side, as it was not designed for a quiz/challenge in the first place, but the question “what is going on here?” came up (for me) naturally while/after writing this modification of the other thread’s OP’s code.

``````use rand::Rng;
use rand::distributions::Uniform;

fn main() {
let n = 4; // number of threads
let len = 8; // length of vector

// a random vector of positive integer
let range = Uniform::new(0, 5);
let x: Vec<u64> = (0..len).map(|_| rng.sample(range)).collect();

let mut counts: Vec<Vec<usize>> = vec![vec![0_usize; len]; n];// a matrix of dim len x n
let chks = vec![[0,1],[2,3],[4,5],[6,7]];// 4 chunks for 4 threads

for (chk, count) in std::iter::zip(&chks, &mut counts) {
s.spawn(|| {
for &i in &*chk {
count[x[i] as usize] += 1;
}
});
}
});

println!("x = {:?}", x);
println!("counts = {:?}", counts);
}
``````

The question is, why does the code stop working if we write `for &i in chk` instead?

The “hint” I give above (also take into consideration how writing `for &i in <_>::into_iter(chk)` won't compile but `for &i in <&_>::into_iter(chk)` will) is honestly less of a hint and more of a sanity check as to whether you managed to think of all the details of what is going on here.

Some of the existing discussion which I’ve moved into this thread, will quote or elaborate on the solution, too. For a solution see this post below, which you can expand step by step, in case you want to get some more actual hints, partial explanations, or just read through the whole thing ^^

1. also take into consideration how writing `for &i in <_>::into_iter(chk)` won't compile but `for &i in <&_>::into_iter(chk)` will ↩︎

1 Like

Capture rules are no fun.

3 Likes

Let me give an actual answer to this quiz, as far as I understand the situation… (click once more to expand…)
1. The usage of `thread::scope` means that references may be contained in the spawned closure only if their lifetime is longer than the whole `scope`.
2. Hence borrows of `chks` or `counts` are fine… even if they are created with a few steps in-between, by creating a by-reference iterator, then moving the iterator’s item, a reference, into the closure.
3. If the iterator elements like `chk` (or `count`) are mentioned in the closure passed to `s.spawn`, then they do not require by-value access; references only need read access (if it’s an immutable reference; or mutable access if it’s a mutable one), never really owned access,
4. The straightforward way to resolve this would be to use a ‘`move |…|`’ closure. That would directly run into other problems though like ‘`use of moved value: `x``’ errors one would need to work around by defining another variable containing `&x` that then can be moved into the closure.
5. New closure features for more fine-grained capturing introduced with 2021 edition mean that `move` closures can often be avoided in this kind of situations.
6. With typical closure capture, and in this case also with the new one, the problem that arises in the case where `chk` was iterated over, not `&*chk`, is that `chk` ends up being captured in the closure by-reference, so the closure would be containing a `&&[usize; 2]` reference pointing to the short-lived local variable `chk` declared inside of the `thread::scope` which is disallowed.
7. For ‘`count`’, the similar kind of problem (capturing by mutable-reference, as a `&mut &mut _`) is avoided because for `count[…] += 1`, the compiler introduces an implicit re-borrow of `count`, and for a reborrow, i.e. ‘`&mut *count`’ only ‘`*count`’ is accessed!
8. Then the finer-grained closure capturing can kick in and captures only `*count` by-mutable-reference, so the closure does not contain a `&mut &mut Vec<usize>` but a `&mut Vec<usize>` re-borrow of `count`, and re-borrows, unlike borrows-of-borrows, are allowed to live up to as long as the original reference, even after the variable that reference lived in, like ‘`count`’, goes out of scope.
9. The same kind of argument could apply to ‘`chk`’ if it was implicitly re-borrowed. However, implicit re-borrows interact badly with type inferrence.
10. If it’s only known that a type is a reference-type after type inferrence, then passing the value to a function (like `Iterator::into_iter` in this case) will not result in an implicit re-borrow. This is also why `<_>::into_iter` and `<&_>::into_iter` behave differently, the example I gave in the “Hint” (which is less of a true hint and more like somewhat of a bonus question , or sanity check whether you managed to think of all the details necessary); with more explicit type information, an implicit re-borrow can happen, and then closure captures can know only `*chk` is accessed.
11. With explicit re-borrowing (by writing `&*chk` explicitly), the issue of needing to rely on an implicit re-borrow is avoided, because the explicit re-borrow makes it very apparent to the compiler that only `*chk` is accessed (immutably) in the closure, and hence it can be captured by-reference in the closure, and such a reference can live long enough, as discussed above for `count`.
12. I was previously mostly only aware of problems arising from re-borrows of mutable references not happening implicitly, causing compilation errors when they wouldn’t need to exist; this is the first time I see a bad consequence (compilation failure) of the same for immutable references, which are usually less problematic: The common case where mutable references cause problems when not re-borrowed is the problem that the reference ends up being moved; but immutable reference can be copied, so moving them is never a problem that causes any “used after moved” kind of errors.

And why does `for i in *chk {...}` instead of `for &i in &*chk {...}` work too? (As opposed to `for &i in chk {...}` which - as you explained above - doesn't work)

I mean superficially it's clear. You remove a reference level from the pattern and from the expression. So that should be fine.
But I guess the two variants are not actually the same thing, because I would think there is no reborrow in `*chk`. So is this just the captured-by-reference `&&[usize; 2]` pointing to the short lived `chk` which gets dereferenced into a `&[usize; 2]` pointing to the long lived `chks`? I.e. a similar mechanism as your approach but wihout reborrowing?

EDIT
After looking at it a bit longer in vscode now (instead of on the phone) I think what I just wrote is not correct.
If I do this

``````fn main() {
let n = 4; // number of threads
let len = 8; // length of vector

// a random vector of positive integer
let range = Uniform::new(0, 5);
let x: Vec<u64> = (0..len).map(|_| rng.sample(range)).collect();

let mut counts: Vec<Vec<usize>> = vec![vec![0_usize; len]; n];// a matrix of dim len x n
let chks = vec![[0,1],[2,3],[4,5],[6,7]];// 4 chunks for 4 threads

for (chk, count) in std::iter::zip(&chks, &mut counts) {
s.spawn(|| {
let a = *chk;
for i in a {
count[x[i] as usize] += 1;
}
});
}
});

println!("x = {:?}", x);
println!("counts = {:?}", counts);
}
``````

then `a` seems to have a type of `[usize; 2]`. So this works only because `[usize; 2]` is copy. If I replace `chks` with a `Vec<[String; 2]>` it gives me an error.

``````error[E0508]: cannot move out of type `[String; 2]`, a non-copy array
``````

So as I suspected, althoug both

``````for i in *chk {...}
``````

and

``````for &i in &*chk {...}
``````

work, they work for different reasons.

The `&i in &*chk` approach iterates through the chunk by-reference. The `i in *chk` approach dereferences the `[usize; 2]` array and then iterates by-value. I thought the by-reference approach generalizes better in case you want the size (2) to become dynamic, as something like `Vec<usize>` does not implement `Copy` so with such a type instead of `[usize; 2]` the `i in *chk` approach would no longer work. Even with arrays, a larger constant, say `[usize; 10000]`, that may be more realistic for chunk sizes that make sense to split into threads, a by-value approach has more overhead and involves moving large values on the stack, so the `&it in &*chk` approach seemed more appropriate to me

To re-iterate to be clear (and ignoring the closure capturing: `chk` is a `&[usize; 2]`. Then `for _ in chk` iterates by-reference via the `std::slice::Iter` type, returning items of type `&usize`, which the `&i` pattern can match and de-reference into `i` of type `usize`. (Alternatively with `for i in chk` one could dereference via `*i` in the loop body, too.) On the other hand, `for _ in *chk` creates `*chk` of type `[usize; 2]` copies the value into a temporary variable, then creates a new `std::array::IntoIter` iterator that holds the array by-value and produces owned `usize` values from it.

1 Like

Yeah, I stand by my previous answer

I am just wondering if the type `&&[usize; 2]` mentioned here is actually correct. I cannot observe it with the compiler. If I do

``````let a: i32 = chk;
``````

inside the closure I get

``````error[E0308]: mismatched types
--> src/main.rs:20:30
|
20 |                 let a: i32 = chk;
|                        ---   ^^^ expected `i32`, found `&[{integer}; 2]`
|                        |
|                        expected due to this
``````

Notice there is no double reference in the error message. Or is the let statement changing the way `chk` is captured?
Notice, if I do this

``````            s.spawn(|| {
let a = chk;
for &i in chk {
count[x[i] as usize] += 1;
}
});
``````

I still get the error

``````error[E0373]: closure may outlive the current function, but it borrows `chk`, which is owned by the current function
``````

So `chk` seems to be `&[usize; 2]` and not `&&[usize; 2]`.

You cannot easily observe the type of closure captures. Inside of the closure body, the variables will still have the same expected types, essentially desugaring to a dereference of the captured-by-reference field value in the closure under the hood. The closure captures are partially implementation detail, and partially observable via their effects on borrow checking, auto traits and the like. For example:

``````#![feature(negative_impls)]
#![feature(auto_traits)]

auto trait Foo {}

impl !Foo for usize {}

impl Foo for &&[usize; 2] {}

fn demo() {
let chk: &[usize; 2] = &[0, 0];

let closure1 = || {
for _ in chk {}
};
let closure2 = || {
for _ in &*chk {}
};
assert_foo(closure1); // compiles fine, so the closure *does* contain a &&[usize; 2]
// assert_foo(closure2); // doesn't compile, because the closure contains a &[usize; 2]
}
fn assert_foo(_: impl Foo) {}
``````

or without nightly

``````use std::cell::Cell;

fn demo() {
let chk: &[Cell<usize>; 2] = &[Cell::new(0), Cell::new(0)];

let closure1 = || {
for _ in chk {}
};
let closure2 = || {
for _ in &*chk {}
};
// uncomment one at a time, as both together will result in only one error for some reason
// assert_sync(closure1); // compiler error mentions &&[Cell<usize>; 2]
// assert_sync(closure2); // compiler error mentions only &[Cell<usize>; 2]
}
fn assert_sync(_: impl Sync) {}
``````

Oh wow. I didn't know this. Isn't that pretty bad that one needs to know about a compiler implementation detail in order to write the correct code here?

And do I understand this right, that it's not `chk` that has the type `&&[usize;2]` but that it's like a "field on the closure struct" that has type `&&[usize;2]`.

I wonder if it was such a good idea to make the captures implicit instead of having to explicitly define what happens with each capture.

I’ll partially blame compilation errors that could be a bit more helpful here, and partially refer to the fact that one needs not avoid the switch to a `move` closure, which is the compile suggestion here; the compiler doesn’t help with the subsequent step of also adding an appropriate thing such as e.g. `let x = &x;` before the `s.spawn(move || …)` call, but that solves the problem equally well (at the cost of more boilerplate in the general case where there may be a lot of variables like `x` that need to be manually not-`move`d into the `move` closure via usage of a helper variable.

That’s exactly the right way to think about it.

1 Like

Thanks a lot for answering in so much detail.