Conditions for the occurrence of a data race

The documentation says:

A data race is similar to a race condition and happens when these three behaviors occur:

  • Two or more pointers access the same data at the same time.
  • At least one of the pointers is being used to write to the data.
  • There’s no mechanism being used to synchronize access to the data.

Do I understand correctly that for a data race to occur, all three conditions must be met simultaneously?

Yes.

2 Likes

What then is a "race condition"? I cannot find how that is defined in that documentation.

A data race is a very specific concept and always UB. A race condition is a broader term that applies to any kind of bug where multiple threads are involved.

5 Likes

But what exactly? I first heard the terms "data race" and "race condition" when I was involved in digital hardware design, and since then of course in the software world, where threads are involved. So what exactly is Rust's definition of these things?

A data race is when two accesses to the same memory location happen concurrently, where at least one operation is a write, and at least one operation is non-atomic.

1 Like

And the "race condition"?

Two threads trying to lock the same Mutex, for example. Depending on your program's semantics it can be acceptable, or a bug.

Even when data access is synchronized between multiple threads (i.e. not a data race) the outcome of your algorithm may change depending on which thread reaches the write or read instruction of some given place first. This condition is called a race condition because the outcome depends on which thread "wins the race" to the specific read or write instruction.
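
For a concrete illustration (my own minimal sketch, not from the docs): two threads lock the same Mutex and push their id into a shared Vec. Every access is synchronized, so there is no data race, but the final order depends on which thread wins the race for the lock.

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Shared, synchronized data: no data race is possible here.
    let log = Arc::new(Mutex::new(Vec::new()));

    let handles: Vec<_> = (0..2)
        .map(|id| {
            let log = Arc::clone(&log);
            thread::spawn(move || {
                // Both threads race for the lock; whoever wins pushes first.
                log.lock().unwrap().push(id);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // Prints either [0, 1] or [1, 0] depending on scheduling: a race
    // condition (benign or not, depending on your semantics), but never
    // a data race.
    println!("{:?}", log.lock().unwrap());
}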

1 Like

Unlike data race, "race condition" is not a formal, precise term. It's an informal term for bugs that only happen sometimes because they depend on how threads get interleaved with each other.

4 Likes

I created a race condition here:

use std::sync::atomic::{AtomicU8, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

fn main() {
    for _ in 0..200 {
        let a = Arc::new(AtomicU8::new(0));
        let b = Arc::clone(&a);

        let handle = thread::spawn(move || {
            thread::sleep(Duration::from_nanos(1));
            let old = b.fetch_add(1, Ordering::SeqCst);
            if old == 0 {
                println!("Hello from thread 1.");
            }
        });

        thread::sleep(Duration::from_nanos(100000));
        let old = a.fetch_add(1, Ordering::SeqCst);
        if old == 0 {
            println!("Hello from thread 0.");
        }

        handle.join().unwrap();
    }
}

Playground

This will sometimes print "Hello from thread 0." and sometimes "Hello from thread 1.", depending on which thread gets to the value behind a first.
You can see, however, that there is no data race, because I synchronized the access to the value behind a through an AtomicU8.

Wikipedia puts it as:

A race condition or race hazard is the condition of an electronics, software, or other system where the system's substantive behavior is dependent on the sequence or timing of other uncontrollable events, leading to unexpected or inconsistent results. It becomes a bug when one or more of the possible behaviors is undesirable.

A typical example of a race condition is a missed update, e.g. if you implement an increment of a number by doing it in multiple steps:

  • read current value
  • calculate value plus one
  • write back result

Such increments can be missed if two threads do them simultaneously. (Both threads read 42, then both threads write back 43, even though the correct result after two increments should be 44).

You can implement this "badly" (but not unsoundly, so no "data race" and no UB) in Rust, too. E.g. you could use atomics, but skip the atomic increment and instead do an atomic read followed by an atomic write. Or you could use a Mutex<i32> but lock it twice: once for the read (then unlock and calculate), then again for the write.
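
For instance, a minimal sketch of the "atomic read followed by atomic write" variant (my own example with a made-up shared counter): every individual access is atomic, so there is no data race and no UB, yet increments still get lost.

use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

static COUNTER: AtomicU32 = AtomicU32::new(0);

fn main() {
    let handles: Vec<_> = (0..2)
        .map(|_| {
            thread::spawn(|| {
                for _ in 0..100_000 {
                    // Each step is individually atomic (no data race, no UB),
                    // but another thread can slip in between the load and the
                    // store, and its increment is then silently overwritten.
                    let old = COUNTER.load(Ordering::SeqCst);
                    COUNTER.store(old + 1, Ordering::SeqCst);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    // Usually prints something well below 200000: updates were missed.
    println!("{}", COUNTER.load(Ordering::SeqCst));
}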

Of course, "good" implementations on the other hand would do this either with atomics, which can turn the 3 steps into a single one, or with a mutex that's kept locked throughout all 3 steps (or with some kind of loop that checks the value hasn't changed before doing the write, e.g. with the compare-and-swap operation of atomics, and retries otherwise).
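
Sketching the "good" variants under the same assumptions (a shared atomic counter; the function names are mine): fetch_add collapses the three steps into a single atomic read-modify-write, and a compare_exchange loop re-reads and retries whenever another thread got in between.

use std::sync::atomic::{AtomicU32, Ordering};

// One indivisible read-modify-write: nothing can sneak in between.
fn increment_fetch_add(counter: &AtomicU32) {
    counter.fetch_add(1, Ordering::SeqCst);
}

// Compare-and-swap loop: only write if the value is still what we read;
// otherwise take the freshly observed value and try again.
fn increment_cas(counter: &AtomicU32) {
    let mut current = counter.load(Ordering::SeqCst);
    while let Err(observed) =
        counter.compare_exchange(current, current + 1, Ordering::SeqCst, Ordering::SeqCst)
    {
        current = observed;
    }
}

fn main() {
    let counter = AtomicU32::new(0);
    increment_fetch_add(&counter);
    increment_cas(&counter);
    assert_eq!(counter.load(Ordering::SeqCst), 2);
}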


Many "memory safe" programming languages that aren't Rust avoid the UB from data races by turning them all into "just race conditions"[1]. Of course these race conditions are capable of leaving your not-properly-synchronized data structures in very invalid states (in case they care about invariants between multiple parts of the data structure being upheld), so it lets you quickly get into "almost arbitrary program misbehavior, but not UB" land. But it's probably the best they can achieve if they don't have Rust's Send/Sync system and aren't purely functional ("no mutation allowed") or avoid shared memory ("only message passing").

Of course this is not to say that message passing prevents all race conditions; it just prevents data races, and makes bad cases of race conditions less easy to run into.
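
A small sketch of that last point (my own example): with std::sync::mpsc there is no shared mutable memory left to race on, but the order in which messages from two senders arrive still depends on the scheduler.

use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();

    for id in 0..2 {
        let tx = tx.clone();
        thread::spawn(move || {
            // No shared mutable state, hence no data race is possible...
            tx.send(id).unwrap();
        });
    }
    drop(tx); // drop the original sender so the receiver loop can end

    // ...but which message arrives first still depends on scheduling,
    // which may or may not matter to your program's logic.
    for id in rx {
        println!("got a message from thread {id}");
    }
}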


Anyway, as for 'what is a race condition', I would classify this example as "program behavior that depends on thread interleaving in ways it isn't supposed to, producing results it isn't supposed to". In general, race conditions don't have to be about threads though; they could be about processes that interact, or even different servers that communicate, or a user interacting with a GUI mutating data while some background thread is also mutating it, etc. Usually "race condition" refers to something that's a bug arising from some sort of - let's say "concurrent interaction"? - and often to bugs that are a bit more unlikely to actually occur. (Which makes them harder to observe, reproduce, etc.)

Edit: The Wikipedia article linked above claims it doesn't have to refer to a bug. I guess that's a fair usage of the term, too, if you want to make it have that meaning. Cases where you don't care about different outcomes that depend on thread interleavings or the like certainly exist.


  1. or occasionally also "just correct behavior" if your program logic allowed it and all you needed was an atomic read and/or write ↩︎

It's the best you can achieve, period.

Language development goes in circles around Rice's theorem.

Since you can't write a compiler which accepts 100% of “good” programs and rejects 100% of “bad” programs, you usually go for a “99% solution”: most “bad” programs are rejected, most “good” programs are accepted… and then, because that 1% is not detected, over time it becomes a 90% solution… and then rules and habits are changed again.

The latest achievement is Rust, which made data races impossible… and then immediately turned around and added async with its cancellation issues.

Yes, but your example also fits exactly the definition of a "data race" as given in the docs. Does it not?

I'm afraid it is as I thought: the terms "data race" and "race condition" are used interchangeably to refer to the same things.

Anyway, I think I have had a revelation. I can make a "race condition" with no threads (or async tasks) at all. Here is a contorted example:

enum Operation {
    Read,
    Calculate,
    Write,
}

struct Operator {
    data: u32,
    temp: u32,
}

impl Operator {
    fn new() -> Self {
        Operator { data: 0, temp: 0 }
    }

    pub fn operate(&mut self, operation: Operation) {
        match operation {
            Operation::Read => {
                self.temp = self.data;
            }
            Operation::Calculate => {
                self.temp += 1;
            }
            Operation::Write => {
                self.data = self.temp;
            }
        }
    }

    fn get(&self) -> u32 {
        self.data
    }
}

use rand::rngs::ThreadRng;
use rand::Rng;

struct Machine {
    state: Operation,
    rng: ThreadRng,
}

impl Machine {
    fn new() -> Machine {
        Machine {
            state: Operation::Read,
            rng: rand::thread_rng(),
        }
    }
    fn run(&mut self, operator: &mut Operator) {
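        // Flip a coin to decide whether this machine makes progress on this
        // call, modelling non-deterministic interleaving of the two machines.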
        if self.rng.gen::<bool>() {
            match self.state {
                Operation::Read => {
                    operator.operate(Operation::Read);
                    self.state = Operation::Calculate;
                }
                Operation::Calculate => {
                    operator.operate(Operation::Calculate);
                    self.state = Operation::Write;
                }
                Operation::Write => {
                    operator.operate(Operation::Write);
                    self.state = Operation::Read;
                }
            }
        }
    }
}

fn main() {
    let mut operator = Operator::new();

    let mut machine1 = Machine::new();
    let mut machine2 = Machine::new();
    for _ in 1..10 {
        machine1.run(&mut operator);
        machine2.run(&mut operator);
        print!("{}", operator.get());
    }
    println!();
}

Which produces different results on each run, depending on which Machine makes progress when.

I would insist that "data race", at least in the context of Rust, has a clear meaning.

In the example I gave, the access to the data is synchronized (if it's implemented as I hinted: I listed the options of using either an atomic or a mutex). After all, no read happens at the same time as another write without synchronization that makes them instead happen one after the other. Of course the program logic has it that the whole increment should best be handled as if it were a single uninterruptible operation, but as far as the definition of a true "data race" is concerned, program logic doesn't matter; it really just talks about individual reads and writes.

"Data race" is a term that comes up at a relatively low level in the semantics of the language Rust itself (because it's defined to be UB). Thus Miri can also detect data races (deterministically and surprisingly reliably); it could be a fun exercise to write a code example that abuses unsafe to create some truly unsynchronized concurrent read/write access, in order to observe how Miri then tells you that it detected the data race.

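For instance, something along these lines (a deliberately unsound sketch of my own; don't use this pattern in real code) is a true data race, and running it under cargo miri run should report the unsynchronized accesses:

use std::thread;

// Deliberately unsound: unsynchronized, non-atomic shared mutable state.
static mut COUNTER: u64 = 0;

fn main() {
    let handles: Vec<_> = (0..2)
        .map(|_| {
            thread::spawn(|| {
                for _ in 0..1_000 {
                    // Two threads doing a plain (non-atomic) read-modify-write
                    // with no synchronization: a data race, and therefore UB.
                    unsafe { COUNTER += 1 };
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    println!("{}", unsafe { COUNTER });
}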

As far as "race condition" goes, I think your example could indeed well qualify, too. A single thread that models concurrency is no different from the real deal.

I agree. The meaning in Rust is spelled out in the opening post here. And it fits with what I understand from many other places.

My only niggle is that the definition says "similar to a race condition" but I don't see "race condition" defined anywhere in the docs.

A Google search finds a definition like "A race condition, on the other hand, is a broader term that describes any situation where the behavior of a program depends on the relative timing or ordering of events, such as the execution of threads."

Where I take "such as the execution of threads" to mean "not necessarily the execution of threads". So my example above is a "race condition": even though it has no threads, it depends on the non-deterministic ordering of events.

1 Like

I have experienced a bug at work that was driven by what appeared to be a race condition in single-threaded code. It had to do with callbacks that ran on a timer powered by floating point timestamps - depending on how the floating point math ended up rounding the timestamps, one callback would run before the other. So it's not just a thought experiment.

2 Likes

Sounds like a nice bug 🙂

No doubt that timer callback was ultimately triggered by an interrupt. To my mind, code that handles interrupts is essentially the same as code that runs in some other thread. After all, OS threads are ultimately scheduled to run as a result of some interrupt buried down in the OS internals. Interrupts allow the creation of data races the same as threads do.

A classic example of a race condition that is not a data race is the TOCTOU (time-of-check to time-of-use) family of security problems. Basically, every privileged program that does some variation of the following...

  • Read some security-critical information that may change (e.g. file permission, target of a symlink, user resource usage vs quota, program which a certain PID is an instance of)
  • Decide if some privileged operation is safe based on this information
  • Do the privileged operation assuming the previously read information is still valid

...is vulnerable to an attack where the attacker rapidly changes the security-critical information between the validity check and the actual operation.
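
A simplified sketch of that check-then-use shape in Rust (the path and the check here are hypothetical, purely to show where the window is):

use std::fs;

fn main() {
    // Hypothetical path controlled by an untrusted user.
    let path = "/tmp/report.txt";

    // 1. Check: decide the operation is safe based on what the path is *now*.
    if let Ok(meta) = fs::symlink_metadata(path) {
        if meta.is_file() {
            // 2. Use: between the check above and this read, an attacker can
            //    swap the file for a symlink to something privileged (TOCTOU).
            if let Ok(contents) = fs::read_to_string(path) {
                println!("read {} bytes", contents.len());
            }
        }
    }
}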

See also the Therac-25 radiotherapy machine story for a variation where there was no harmful intent from the operator's side, but the consequences were just as dramatic (at least one death was clearly linked to the incident, and multiple people got very severe radiation burns).