Rust scoped threads weird behaviour

use std::thread;

fn main() {
    let data = vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    let mut sum = 0;

    thread::scope(|scope| {
        for e in data {
            scope.spawn(move || {
                sum += e;
                println!("Sum: {}", sum);
            });
        }
    });

    // Printing the final sum after all threads have finished
    println!("Final Sum: {}", sum);
}

The behavior of this code is quite unexpected to me.
First of all, I feel like multiple threads can access the sum variable, introducing a data race.
Second of all, at the end of the function the sum is 0. Why is that?

I am definitely missing something here.

The full output of the code is

Sum: 1
Sum: 3
Sum: 5
Sum: 7
Sum: 4
Sum: 6
Sum: 8
Sum: 10
Sum: 9
Sum: 2
Final Sum: 0

You moved, by copy, sum into the thread closures.

Each thread owns its own copy of sum and adds e to that, but the original sum is unchanged.
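
The same effect shows up without any threads at all; here is a minimal single-threaded sketch with a move closure over a Copy value:

fn main() {
    let mut sum = 0;

    let mut bump = move || {
        // this `sum` is the closure's own copy, because i32 is Copy
        sum += 1;
        println!("inside: {sum}");
    };

    bump(); // prints "inside: 1"
    bump(); // prints "inside: 2"

    // the original `sum` was never touched
    println!("outside: {sum}"); // prints "outside: 0"
}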

7 Likes

Oh, I see now. So basically move there means copy, because sum is an int type. I did not want to do that lol :slight_smile:

2 Likes

Yes. But nitpick, not because it's an integer type, but because that integer type implements Copy. Rust is pretty good at using the type system to represent properties of types.

(And I don't know whether the bigger numeric types, like 128-bit integers or arbitrary-precision naturals, implement Copy?)
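
The difference is easy to see by moving a Copy value and a non-Copy value into the same closure and then trying to use the originals afterwards (a small sketch; the String line is commented out because it would not compile):

fn main() {
    let n: i32 = 5;             // i32 is Copy
    let s = String::from("hi"); // String is not Copy

    let print_both = move || {
        // both are captured by value: `n` is copied, `s` is moved
        println!("{n} {s}");
    };
    print_both();

    println!("{n}");   // fine: the closure only took a copy of `n`
    // println!("{s}"); // error[E0382]: borrow of moved value: `s`
}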

2 Likes

And to address this point: if you remove the move, so that each thread gets a reference to sum, the borrow checker will complain. The correct way to do this is with atomics:

use std::sync::atomic::{AtomicI32, Ordering};
use std::thread;

fn main() {
    let data = vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    let sum = AtomicI32::new(0);

    thread::scope(|scope| {
        for e in data {
            let sum = ∑
            scope.spawn(move || {
                sum.fetch_add(e, Ordering::Relaxed);
                println!("Sum: {}", sum.load(Ordering::Relaxed));
            });
        }
    });

    // Printing the final sum after all threads have finished
    println!("Final Sum: {}", sum.into_inner());
}

(playground)

5 Likes

Just out of curiosity: why did you expect the sum variable not to be moved, when you explicitly used a move closure? Perhaps understanding why you thought it would have no effect would allow us to help you with further, likely deeper, misunderstandings you may have.

2 Likes

I usually think of moving something in terms of heap-allocated objects. However, I forgot that for simple types that implement Copy, the "move" is just a memcpy.

I implemented a solution where I split the data into slices and each thread computes the sum of one slice. I do not use any atomics and it works.

use std::iter::Sum;
use std::ops::Add;
use std::thread;
use std::time::Instant;

fn compute_sum_in_parallel<T>(data: Vec<T>, nr_threads: usize) -> T
where
    T: Send + Sync + std::ops::AddAssign + Add<Output = T> + for<'a> Sum<&'a T> + Default,
{
    let chunk_size = data.len() / nr_threads;

    let mut total_sum = Default::default();

    thread::scope(|scope| {
        let mut handles: Vec<thread::ScopedJoinHandle<T>> = Vec::new();
        for i in 0..nr_threads {
            let start_index = i * chunk_size;
            let end_index = if i == nr_threads - 1 {
                data.len()
            } else {
                start_index + chunk_size
            };

            // get a slice of the shared Vec for this thread
            let data_slice = &data[start_index..end_index];

            // Spawning a thread to compute the sum of the current slice;
            // `move` copies the slice reference into the closure
            let handle = scope.spawn(move || data_slice.iter().sum());
            handles.push(handle);
        }

        for handle in handles {
            total_sum += handle.join().unwrap();
        }
    });

    total_sum
}

fn main() {
    let size = 100_000_001;
    let now = Instant::now();
    let sum: usize = compute_sum_in_parallel((1..size).collect::<Vec<_>>(), 16);
    println!("Sum: {} in {:?}", sum, now.elapsed());
}

Are there any performance benefits to using atomics in this case, compared to my solution?

1 Like

First off, moving and heap-allocation are completely orthogonal. It's an apples to oranges comparison. Every value can be moved. Whether it is on the heap or the stack, static memory, or anywhere else. Whether or not it allocates a heap buffer (or any other resource) itself. Moving just means bitwise copying a value from one place to another, transferring its ownership to the new place.

Second, moving is also orthogonal to Copy. Copy and non-Copy types can equally be moved.

And then Copy is also orthogonal to heap-allocation. It's not being on the stack that makes integers Copy! If you put a primitive integer on the heap or in a static item, it's still Copy – whether something can be bitwise copied is intrinsic to its type, and has nothing to do with any particular address it resides at.
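
A quick sketch of that point: a primitive behind a Box or in a static item is still Copy, and reading it out is still just a bitwise copy:

static ANSWER: i32 = 42;

fn main() {
    let boxed: Box<i32> = Box::new(7);

    // these i32s live on the heap and in static memory, yet reading them
    // out is still just a bitwise copy, because i32 is Copy
    let a: i32 = *boxed;
    let b: i32 = ANSWER;

    // the originals are still fully usable afterwards
    println!("{} {} {} {}", a, b, boxed, ANSWER);
}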

2 Likes

Atomics will be slower, since you'll have all the threads trying to mutate the same memory location.
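
As a sketch (not a benchmark) of how to avoid that contention while still using an atomic: each thread can sum its own chunk locally and touch the shared counter only once:

use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

fn main() {
    let data: Vec<u64> = (1..=1_000_000).collect();
    let total = AtomicU64::new(0);
    let nr_threads = 4;
    let chunk_size = data.len().div_ceil(nr_threads);

    thread::scope(|scope| {
        for chunk in data.chunks(chunk_size) {
            let total = &total;
            scope.spawn(move || {
                // sum locally; only one contended atomic operation per thread
                let local: u64 = chunk.iter().sum();
                total.fetch_add(local, Ordering::Relaxed);
            });
        }
    });

    println!("Total: {}", total.into_inner());
}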

2 Likes

In my experience of making this mistake, and seeing others make the mistake, the problem is that under non-Copy circumstances, if you move something you don't want to move, you would then get an error when you try to use it after the end of the scope — which is immediate feedback on the mistake. Here, because of the copy, there are now, silently, two mutable sum places. That’s usually not what you want, and it's the sort of thing that Rust usually makes you be more explicit about.

If I were trying to fix this problem, I'd try adding a warning whenever

  • a closure captures a binding foo by value,
  • foo is mutated within the closure,
  • the type of foo is Copy, and
  • foo is also used later outside the closure (would be an error if the type is not Copy),

and see how many false positives that got.
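
For instance, a minimal program that meets all four conditions and compiles without any warning today might look like this (hypothetical example):

fn main() {
    let mut count = 0; // i32, which is Copy

    let mut tick = move || {
        count += 1; // mutates the closure's private copy
        count
    };

    tick();
    tick();

    // `count` is used again here and is still 0; with a non-Copy type this
    // line would have been a compile error instead of a silent surprise
    println!("{count}");
}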

[Update: I posted a refined version of this idea as a comment on the issue.]

5 Likes

I agree with your statement. I was quite confused when I saw that there was a copy made even though I did not see anything like an assignment operator.

fn foo(_x: i32) {}

fn main() {
    let y = 42;
    foo(y); // the value of `y` is copied into the call
}

Here a copy of the value 42 is made, even though there is no assignment. Is that also confusing?

Moves are just a memcpy whether or not the type implements Copy. The difference is that if you don't implement Copy, the original variable/place becomes uninitialized afterwards.

The type system isn't heap-aware. Generally speaking, if you move something that manages a heap allocation, you're moving a pointer. E.g. a Vec<T> is a pointer, capacity, and length triple.
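
A small sketch of what that means for Vec: the move copies the three-word header, not the elements, and the moved-from binding simply becomes unusable:

fn main() {
    let v: Vec<u8> = vec![1, 2, 3];
    let heap_addr = v.as_ptr();

    // moving `v` memcpys its header (pointer, capacity, length) into `w`;
    // the heap buffer itself is neither touched nor copied
    let w = v;
    assert_eq!(w.as_ptr(), heap_addr);

    // println!("{:?}", v); // error[E0382]: borrow of moved value: `v`
    println!("{:?}", w);
}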

5 Likes

That is clear because I used other languages like C or C++ :slight_smile:
I understood now the idea, thank you all for your explanations

I think this is interesting.

I've previously thought about the data pointed to by the Vec as being moved conceptually, and as a result was slightly mystified by how exactly Rust does that without it being horrendously slow.

At some point I became more aware that it is the pointer that moves, while the heap-allocated data doesn't.

And only recently am I becoming aware of what moving actually is at the system level. I.e. a memcpy.

It might help to know more about what a memcpy is exactly. I gather it's a low-level instruction to... the hardware(?)... to copy memory, but the devil is in the details! :slight_smile:

Memcpy is just the C standard library function for… copying memory. Conceptually it’s just a loop that copies bytes, but of course implementations may optimize it however they want. But the archetypal implementation is simply

unsafe fn memcpy(dest: *mut u8, src: *const u8, n: usize) {
  for i in 0..n {
    *dest.add(i) = *src.add(i);
  }
}

When people say "it’s just a memcpy", they mean that the value’s bytes are literally blindly copied from A to B – specifically, no user code is executed in the copy, in contrast to C++, where a user-defined copy or move constructor or assignment operator may be invoked and you can’t be sure how much work is actually done without checking. Rust just does what C does, except the shooting-yourself-in-the-foot part.

1 Like

There's a caveat here: in Rust, it's UB to read uninitialized memory, and values may contain uninitialized memory (padding, MaybeUninit, etc.). So, you can't actually write that loop in Rust (unlike C which, if I remember correctly, has an exemption for doing this) and use it in all the ways Rust moves do. (If you do need to do it explicitly in Rust, you can call std::ptr::copy, or more often, one of its narrower relatives, like std::ptr::write or the methods on pointer types themselves.)
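
For example, a sketch using ptr::copy_nonoverlapping (the non-overlapping relative of std::ptr::copy), which does an untyped byte copy and so is allowed to copy uninitialized or padding bytes:

use std::ptr;

fn main() {
    let src = [1u8, 2, 3, 4];
    let mut dest = [0u8; 4];

    // the Rust spelling of memcpy: an untyped byte copy that never
    // produces intermediate `u8` values, so uninitialized bytes in the
    // source would simply be preserved, not "read"
    unsafe {
        ptr::copy_nonoverlapping(src.as_ptr(), dest.as_mut_ptr(), src.len());
    }

    assert_eq!(dest, [1, 2, 3, 4]);
}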

3 Likes

Reading uninit memory is not UB.

Rather, it is the act of producing an invalid value (one of the behaviours the Reference lists as undefined), since the *src.add(i) expression produces a value of type u8 containing uninitialized bytes. Had the pointers used the MaybeUninit<u8> type instead, it would still be reading uninit memory, but it would not be UB, because the value would no longer be invalid for its type.
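
A small sketch of that distinction using MaybeUninit<u8> (hypothetical buffer):

use std::mem::MaybeUninit;

fn main() {
    let buf: [MaybeUninit<u8>; 4] = [MaybeUninit::uninit(); 4];

    // fine: copying a MaybeUninit<u8> moves possibly-uninit bytes around
    // without ever producing a value of type u8
    let first = buf[0];

    // UB: this would claim the bytes are an initialized u8 while they
    // are not, producing an invalid value
    // let bad: u8 = unsafe { first.assume_init() };

    let _ = first;
}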

6 Likes