Understanding rayon with Atomics

I have the following code

use rayon::prelude::*;
use std::sync::atomic::{AtomicI32, Ordering}; // 1.6.0

const N_GRID: usize = 4;

type Grid = [[AtomicI32; N_GRID]; N_GRID];

fn main() {
    let mut grid: Grid = Default::default();
    fill_grid(&mut grid);
    println!("{grid:?}");
}

fn fill_grid(grid: &mut Grid) {
    (0..N_GRID).into_par_iter().for_each(|i| {
        grid[i][i].fetch_add(1, Ordering::SeqCst);
    })
}

(Playground)

Output:

[[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

I was wondering how grid in fill_grid(grid) can be &mut Grid but still be run in parallel by rayon. Doesn't that violate ownership?

And I know grid could be &Grid, because fetch_add() doesn't actually need a mutable reference, but that doesn't change the fact that this code still compiles.


The variable grid in the closure in fn fill_grid is only accessed immutably in the closure, hence the closure captures it by reference, effectively capturing &grid of type & &mut Grid. Well… that’s the old rules; the more fine-grained rules in edition 2021 also notice that only the target *grid is accessed (and only immutably), so actually &*grid of type &Grid is captured by the closure[1]. But either way (edition 2018 or 2021), the code will compile.


  1. This kind of reasoning matters more in cases where only some fields behind a reference are accessed, whereas here the whole target of the reference is; but it can nonetheless be an improvement, because it removes the shorter lifetime and allows code like this to compile in the 2021 edition, though not in 2018. ↩︎
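As a small standalone illustration of these capture rules (a sketch, not tied to the grid code above): a closure that only reads through a &mut reference captures it immutably, which is why it remains a Fn closure that can be called repeatedly:

```rust
fn main() {
    let mut v = vec![1, 2, 3];
    let r: &mut Vec<i32> = &mut v;
    // The closure only reads through `r`, so it captures it immutably;
    // under the edition 2021 rules it captures the reborrow `&*r`
    // of type `&Vec<i32>` rather than `&r` of type `& &mut Vec<i32>`.
    let read = || r.len();
    assert_eq!(read(), 3);
    assert_eq!(read(), 3); // `read` is `Fn`, so calling it twice is fine
}
```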

Is there a way to make this behaviour visible? Like using the wrong type annotation to make the actual type visible?

I was gonna say no, but I came up with a way that might work for many cases (nightly-only):

#![feature(negative_impls)]
#![feature(auto_traits)]

fn test(x: &mut [i32; 10]) {
    foo(|| {
        let a = x[3];
    })
}

fn foo(_: impl Foo) {}

auto trait Foo {}

impl<T: ?Sized> !Foo for &T {}
impl<T: ?Sized> !Foo for &mut T {}
error[E0277]: the trait bound `&[i32; 10]: Foo` is not satisfied in `[closure@src/lib.rs:5:9: 5:11]`
  --> src/lib.rs:5:9
   |
5  |       foo(|| {
   |       --- ^-
   |       |   |
   |  _____|___within this `[closure@src/lib.rs:5:9: 5:11]`
   | |     |
   | |     required by a bound introduced by this call
6  | |         let a = x[3];
7  | |     })
   | |_____^ within `[closure@src/lib.rs:5:9: 5:11]`, the trait `Foo` is not implemented for `&[i32; 10]`
   |
note: required because it's used within this closure
  --> src/lib.rs:5:9
   |
5  |     foo(|| {
   |         ^^
note: required by a bound in `foo`
  --> src/lib.rs:10:16
   |
10 | fn foo(_: impl Foo) {}
   |                ^^^ required by this bound in `foo`

Note the lines

within `[closure@src/lib.rs:5:9: 5:11]`, the trait `Foo` is not implemented for `&[i32; 10]`
note: required because it's used within this closure

which basically tells you that the closure captures &[i32; 10]. Different sets of negative impls (for the concrete types you think might be captured) can give you information about different non-reference types, too, and the error tends to keep being kinda clear about what the closure captures. E.g. using impl !Foo for [i32; 10] {} instead of the two negative impls above would result in

error[E0277]: the trait bound `[i32; 10]: Foo` is not satisfied in `[closure@src/lib.rs:5:9: 5:11]`
  --> src/lib.rs:5:9
   |
5  |       foo(|| {
   |       --- ^-
   |       |   |
   |  _____|___within this `[closure@src/lib.rs:5:9: 5:11]`
   | |     |
   | |     required by a bound introduced by this call
6  | |         let a = x[3];
7  | |     })
   | |_____^ within `[closure@src/lib.rs:5:9: 5:11]`, the trait `Foo` is not implemented for `[i32; 10]`
   |
   = note: required because it appears within the type `&[i32; 10]`
note: required because it's used within this closure
  --> src/lib.rs:5:9
   |
5  |     foo(|| {
   |         ^^
note: required by a bound in `foo`
  --> src/lib.rs:10:16
   |
10 | fn foo(_: impl Foo) {}
   |                ^^^ required by this bound in `foo`

where the lines

within `[closure@src/lib.rs:5:9: 5:11]`, the trait `Foo` is not implemented for `[i32; 10]`
   = note: required because it appears within the type `&[i32; 10]`
note: required because it's used within this closure

pretty much tell you that &[i32; 10] is captured. It also mentions [i32; 10] though, so some care is needed in reading these error messages correctly, I guess.


Maybe someone else even knows a more straightforward approach.

If there isn’t any more straightforward way, maybe we should add one? The compiler certainly has the necessary information, so a “feature” to deliberately create a compilation error that reports accurately the types captured by a particular closure seems potentially quite useful, especially for learning purposes.


I was going to suggest (nightly-only) type ascription, but that ICEs[1].

In this specific case, you can force the use of a mutable borrow

        (&mut *grid)[i][i].fetch_add(1, Ordering::SeqCst);

which causes the expected error

error[E0596]: cannot borrow `*grid` as mutable, as it is a captured variable in a `Fn` closure
  --> src/main.rs:18:9
   |
16 | fn fill_grid(grid: &mut Grid) {
   |    ---------                  - change this to return `FnMut` instead of `Fn`
17 |     (0..N_GRID).into_par_iter().for_each(|i| {
   |                                          --- in this closure
18 |         (&mut *grid)[i][i].fetch_add(1, Ordering::SeqCst);
   |         ^^^^^^^^^^^^ cannot borrow as mutable

but there's no way I know of to interact with a place (let alone force a type error) without influencing auto(de)ref, thus influencing whether it's captured by-mut, by-ref, or by-move.


  1. The ICE seems to be because during MIR building we try to get (grid: _) as a place, which fails. Ascribing &Grid (or another incorrect type) gives "expected &mut Grid", so it wouldn't be useful anyway. ↩︎

As an aside, there's no reason to use such a strong ordering. You can just use relaxed:

grid[i][i].fetch_add(1, Ordering::Relaxed);
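For intuition (a self-contained sketch using plain std threads instead of rayon): a relaxed fetch_add is still fully atomic as an operation; Relaxed only drops ordering guarantees relative to other memory accesses, so a pure counter still ends up with the exact total:

```rust
use std::sync::atomic::{AtomicI32, Ordering};
use std::thread;

fn main() {
    let counter = AtomicI32::new(0);
    thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| {
                for _ in 0..1000 {
                    // The addition itself is atomic; Relaxed only means it
                    // imposes no ordering on surrounding memory accesses.
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    assert_eq!(counter.load(Ordering::Relaxed), 4000);
}
```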

Additionally, since you don't ever have two threads access the same grid cell in parallel, you can even do this:

grid[i][i].store(1 + grid[i][i].load(Ordering::Relaxed), Ordering::Relaxed);
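Note that this split version is no longer a single atomic read-modify-write: it is only correct because each cell is touched by exactly one thread. A single-threaded sketch of what the two operations do:

```rust
use std::sync::atomic::{AtomicI32, Ordering};

fn main() {
    let cell = AtomicI32::new(41);
    // Two independent atomic operations: if another thread wrote to
    // `cell` between the load and the store, that update would be lost.
    cell.store(1 + cell.load(Ordering::Relaxed), Ordering::Relaxed);
    assert_eq!(cell.load(Ordering::Relaxed), 42);
}
```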

Ah yeah, this is not the actual example I am working with. There I am looping over particles, assigning their masses onto a grid, so the access patterns to the grid are more or less random, depending on whether the particles are sorted by spatial coordinates or not.

Also I haven't really put a lot of effort into understanding the different memory consistency orders yet.

Thanks for the tip 🙂

Also, thank you everybody for your explanations.

Aside: assuming that splitting the operation is acceptable, does splitting a fetch_add into load/add/store ever actually confer a benefit?

... checking assembly on the playground, it seems that on x86 this allows LLVM to use normal nonatomic access (no lock prefix), so yes, this allows skipping the cache synchronization. Given the grid here is small enough that touched entries may be on the same cache line, this may even make an impact.

But perhaps if it's the case like here that the access is actually unique, it would be better to actually communicate that with &mut instead, e.g.

fn fill_grid(grid: &mut Grid) {
    grid.into_par_iter().enumerate().for_each(|(i, gridline)| {
        *gridline[i].get_mut() += 1;
    });
}
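For reference, a minimal standalone sketch of the get_mut part (not the grid code itself): with exclusive access, AtomicI32::get_mut hands out a plain &mut i32, so the increment is an ordinary non-atomic add and needs no Ordering at all:

```rust
use std::sync::atomic::AtomicI32;

fn main() {
    let mut a = AtomicI32::new(0);
    // Holding `&mut a` proves no other thread can observe `a` right now,
    // so `get_mut` returns a plain `&mut i32` and no atomic instruction
    // is needed for the increment.
    *a.get_mut() += 1;
    assert_eq!(a.into_inner(), 1);
}
```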

The CAS-like instructions are generally more expensive, yes.