Data race with compare_exchange

Why is there a data race?

It feels like compare_exchange is not executed atomically. Despite the fact that I tell parallel threads: "check if AtomicPtr has changed, it means that another thread has changed the data, so do the whole process again", it doesn't prevent other threads from changing the data randomly.

Playground

This looks more like a use-after-free; I mean, I haven't tried understanding the code much yet, but at least what miri says sound like use-after free; see this:

error: Undefined Behavior: out-of-bounds pointer use: alloc1092 has been freed, so this pointer is dangling
  --> src/main.rs:40:45
   |
40 |             let mut changed_data = unsafe { &*load_data }.clone();
   |                                             ^^^^^^^^^^^ out-of-bounds pointer use: alloc1092 has been freed, so this pointer is dangling
   |
   = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
   = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
help: alloc1092 was allocated here:
  --> src/main.rs:23:47
   |
23 |             ptr: AtomicPtr::new(Box::into_raw(Box::new(data))),
   |                                               ^^^^^^^^^^^^^^
help: alloc1092 was deallocated here:
  --> src/main.rs:50:49
   |
50 |                         Box::from_raw(load_data);
   |                                                 ^
   = note: BACKTRACE (of the first span) on thread `unnamed-2`:
   = note: inside `Rcu::<i32>::change::<{closure@src/main.rs:76:36: 76:42}>` at src/main.rs:40:45: 40:56
note: inside closure
  --> src/main.rs:76:25
   |
76 | /                         rcu.change(|data| {
77 | |                             *data += 1;
78 | |                         });
   | |__________________________^

unless that's just a consequence of your “data race”? (Do you mean “data race” or “race condition”, by the way?)


Edit 1: Nevermind, there also seems some actual data race going on; if I comment out the leakage prevention code in the Ok branch, I get

error: Undefined Behavior: Data race detected between (1) retag write on thread `unnamed-1` and (2) retag read of type `i32` on thread `unnamed-2` at alloc6614. (2) just happened here
  --> src/main.rs:40:45
   |
40 |             let mut changed_data = unsafe { &*load_data }.clone();
   |                                             ^^^^^^^^^^^ Data race detected between (1) retag write on thread `unnamed-1` and (2) retag read of type `i32` on thread `unnamed-2` at alloc6614. (2) just happened here
   |
help: and (1) occurred earlier here
  --> src/main.rs:42:27
   |
42 |             let new_ptr = Box::into_raw(Box::new(changed_data));
   |                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   = help: retags occur on all (re)borrows and as well as when references are copied or moved
   = help: retags permit optimizations that insert speculative reads or writes
   = help: therefore from the perspective of data races, a retag has the same implications as a read or write
   = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
   = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
   = note: BACKTRACE (of the first span) on thread `unnamed-2`:
   = note: inside `Rcu::<i32>::change::<{closure@src/main.rs:76:36: 76:42}>` at src/main.rs:40:45: 40:56
note: inside closure
  --> src/main.rs:76:25
   |
76 | /                         rcu.change(|data| {
77 | |                             *data += 1;
78 | |                         });
   | |__________________________^

Edit 2: This seems unrelated to your main problem though. My tests so far suggest that this second issue might be related to some insufficient atomic::Ordering at some point; if I replace them all with SeqCst it seems to go away (which isn't to suggest that't the best course of action, not a 100% indicator that it really solved any problem). Similarly, without the use-after-free, your assertion also stopped failing, so maybe that was really only a consequence of that?

1 Like

Imagine 2 threads, A and B, calling change at the same time. What can happen is that:

  • thread A loads the pointer into load_data;
  • thread B loads the pointer, performs the change, replaces it with the updated pointer and then deallocates the old pointer;
  • thread A now dereferences load_data to clone it, but at this point it has already been deallocated.
1 Like

I have modelled the behaviour of 2 threads:

Looking at just the first thread, my initial thought was to do Box::from_raw(load_data); to remove the obsolete pointer.
In line:

let mut load_data = self.ptr.load(Acquire);

I create a lazy pointer that has no lifetime, so there should be a moment later where I explicitly delete it to avoid memory leaks.

What I have planned: In the loop I clone the T data, which I then make immortal and try to shift the shared pointer to it. If the pointer is not shifted (the second thread did it earlier between load and compare_exchange of the first thread), then I don't need this new data anymore, so I delete it:

Err(e) => {
    load_data = e;
    unsafe {
        // leakage prevention
        Box::from_raw(new_ptr);
    }
}

If I succeeded, we do the same (in a loop) with the pointer to the new data (which was produced by another thread and successfully pointed to it). I don't need the old pointer now, so I delete it:

Ok(load_data) => {
    unsafe {
        // leakage prevention
        Box::from_raw(load_data);
    }
    break;
}

But then I realised that the moment the old pointer was reset, the parallel thread, could have used it in the moment from ptr and change:

in line:

let mut changed_data = unsafe { &mut *load_data }.clone();

which could now indicate non-existent data.

Removed:

    unsafe {
        // leakage prevention
        Box::from_raw(load_data);
    }

the tests worked

But it's still a mystery to me: Why would resetting the pointer affect it?
If the reset went through:

Box::from_raw(load_data);

then it means that self.ptr.compare_exchange was successful and the pointer was shifted. And this in turn should mean that no matter what the parallel thread does with the old ptr, if it sees on its self.ptr.compare_exchange that the pointer has been changed by another thread, it will do it all over again (in a loop) with the new pointer. That is, one way or another it will have to achieve its goal with the latest actual data.
At the end of the day I don't understand why pointer resetting affects the fact that the pointer to inactive data is saved in the main shared pointer.

What if the allocator reuses an address:

  1. Thread A loads address X and reads value 0.
  2. Thread A allocates a new box with address Y and writes value 1.
  3. Thread B loads address X and reads value 0.
  4. Thread B allocates a new box with address Z and writes value 1.
  5. Thread B swaps the pointer from X to Z.
  6. Thread B frees address X.
  7. Thread C loads address Z and reads value 1.
  8. Thread C allocates a new box with address X and writes value 2.
  9. Thread C swaps the pointer from Z to X.
  10. Thread A swaps the pointer from address X to address Y.

The memory at address X is freed in step 6, so the allocator is able to reuse it in step 8. From the point of view of thread A, the pointer hasn’t changed, so its compare_exchange succeeds. Now three threads have incremented and swapped the pointer, but the value is only 1.

1 Like

Yes, you are right. I was trying to do the rcu for which it was stated:

After a successful update, other threads might still be reading the old copy, if they read the pointer before the update. You’ll have to wait for all those threads to be done before the old copy can be deallocated.

I guess that's what I didn't realise.

If the old ptr has been deallocated and the old thread accesses it (when cloning) then you already got UB.

More in general this is the problem of concurrent memory reclamation, which is not trivial at all.


You also got a data race due to loading the new pointer when compare_exhange fails with a relaxed operation. This will ensure you atomically read the pointer, but provides no happens-before relation with respect to the data pointed by that pointer, on which the data race then happens.

1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.