When can usize or u64 overflow?

When I deal with variables which I initialize with 0 and which I only ever increment and decrement by one, is the following reasoning reasonable:

  • u64 will in practice never overflow (but can underflow, of course),
  • usize will never overflow if I use it for counting some values that exist somewhere in the program.

For example, consider I have:

use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};

pub struct A {
    child_count: Arc<AtomicUsize>,
}

pub struct B {
    sibling_count: Arc<AtomicUsize>,
}

impl A {
    pub fn new() -> Self {
        Self {
            child_count: Arc::new(AtomicUsize::new(0)),
        }
    }
    pub fn spawn(&self) -> B {
        let sibling_count = self.child_count.clone();
        sibling_count.fetch_add(1, Relaxed);
        B { sibling_count }
    }
    pub fn child_count(&self) -> usize {
        self.child_count.load(Relaxed)
    }
}

impl Drop for B {
    fn drop(&mut self) {
        self.sibling_count.fetch_sub(1, Relaxed);
    }
}

#[test]
fn test() {
    let a = A::new();
    let c1 = a.spawn();
    assert_eq!(a.child_count(), 1);
    drop(c1);
    assert_eq!(a.child_count(), 0);
}

(Playground)

Can I be sure that the usize never overflows? I think values of type B don't need to be stack allocated, so I'm not sure if there is some way where there could be more Bs existing than usize::MAX?

I believe you forgot the possibility of mem::forget to skip the B destructor. Thus, at least of soundness reasons, you can not safely assume the usize doesn’t overflow. Perhaps it’s reasonable in terms of program logic though, in case it’s ensured that an overflow would not cause any unsoundness.

3 Likes

And if I assume that program logic ensures that the destructor is never skipped? Is it then ensured the usize can't overflow?

I assume that it’s not actually documented and thus probably not guaranteed, but the current implementation of Arc uses a usize itself for the ref count, and aborts the program in case of overflow. I suppose that in this situation, (ab)using mem::forget to make the child_count overflow would make the Arc’s refcount overflow first; unless I’m missing something.

Actually, I don't think this can be an issue in practice, since the child_count is always less than the strong_count() of the Arc, which itself is limited to usize::MAX. Indeed, it would also work for this program to use the strong_count() of an Arc<()> to store its child_count.

I would believe so, yes.

u64 can overflow: example.

5 Likes

... because the optimizer cheats and computes that at compile time, using the closed form solution to those sums. The output has no loop at all!

playground::main:
	subq	$72, %rsp
	movabsq	$3875820019684212736, %rax
	movq	%rax, (%rsp)
	movq	%rsp, %rax
	movq	%rax, 8(%rsp)
	movq	core::fmt::num::imp::<impl core::fmt::Display for u64>::fmt@GOTPCREL(%rip), %rax
	movq	%rax, 16(%rsp)
	leaq	.L__unnamed_2(%rip), %rax
	movq	%rax, 24(%rsp)
	movq	$2, 32(%rsp)
	movq	$0, 40(%rsp)
	leaq	8(%rsp), %rax
	movq	%rax, 56(%rsp)
	movq	$1, 64(%rsp)
	leaq	24(%rsp), %rdi
	callq	*std::io::stdio::_print@GOTPCREL(%rip)
	addq	$72, %rsp
	retq
3 Likes

This isn't quite true, in two ways:

  • If the values are transient, then especially on a 16-bit system you can definitely overflow it.
  • If the values are ZSTs then it can definitely overflow. (See, for example, the panic from [[(); 10]; usize::MAX/2].flatten().)

Now, you're probably not in those situations, but as some others here have mentioned, the optimizer can sometimes collapse the "count one by one" updates -- probably even for atomics with relatively weak orderings -- so you probably never want to count on this for soundness.

Remember that LLVM knows that panicking/aborting paths are unlikely, and will optimize and layout the generated code accordingly. So I'd suggest just having the "if the wrapping addition overflowed to zero, abort the program" check.

LLVM and your branch predictor will both know that it's essentially never taken, so it'll have almost no cost. It'll probably only be one extra instructions, since jumping for zero is easy, and you don't even need a test since the flag got set by the addition you just did.

TL/DR: practically no it won't, but check anyway because it's cheap and getting UB if you made a mistake somewhere is really not what you want.

6 Likes

I'll point out that Arc relies on this for soundness -- it's incrementing a relaxed atomic. I think it would require launching isize::MAX threads to break it, but in theory you could launch that many threads and if they don't do anything other than incrementing the counter, the compiler is allowed to optimize it all out, leading to use after free?

Arc has the check I mention, though: https://github.com/rust-lang/rust/blob/a102dc806da3bc9c59b3594368a14e7d2632bf9c/library/alloc/src/sync.rs#L1367-L1379

In fact, it's a much stricter one since it's checking for any negative count.

That check would usually protect you, but it may fail to protect you in the case I mention with more than isize::MAX threads, in which case you may still get use after free. It even says so in the comments:

// This check is not 100% water-proof
// [...]
// However, that
// requires the counter to grow from `isize::MAX` to `usize::MAX` between the increment
// above and the `abort` below, which seems exceedingly unlikely.

I agree with the internal comment that there is a theoretical situation where you could free the Arc allocation and UAF before hitting the abort. However, I will go beyond the documentation and say that actually hitting this case is impossible in practice.

The reason for asserting hitting this case is impossible is that you need isize::MAX threads to be active in order to hit the usize overflow. On an actual machine, actually creating a thread means some amount of work and state; even if we're generous and say that's one byte of stack space and one byte of system overhead, that's still usize::MAX bytes, which even if you're running a 16-bit application on a 64-bit host, probably isn't going to work. (and this doesn't even have space for the bytes for the fetch_added counter.)

The optimizer optimizing threads out as no-ops isn't sufficient, either. For the optimizer to optimize a thread's clone out, it has to prove that the abort branch isn't taken (and that it also drops). In order to trigger the potential UAF, we logically have to order isize::MAX increments before the UAF before the isize::MAX calls to abort.

Contrary to some opinion, the optimizer isn't out to try to manufacture any possible UB in your program in order to delete it. Plus, it's likely (though not guaranteed) that creating a thread is considered observable behavior by the optimizer (e.g. do you have a way to get the thread ID?), so you'd still have to make the threads before overflowing the counter. Even if threads are removable, it seems fundamentally much easier to either prove the thread a no-op (removing the clone/drop pair entirely, removing an increment) or that it always aborts than to note that one potential execution synchronization of isize::MAX + 1 parallel threads causes UB, invalidating the program which does this.

This is fundamentally a probabilistic argument, but it's more than good enough for government work, probably even safety-critical government work.


HOWEVER, I do want to reinforce that if the lack of overflow is important to soundness, you should check for isize overflow, not usize overflow, because it's a lot less impractical to manufacture a case where a single thread doesn't hit the abort "soon" enough.

3 Likes

This argument sounds like a lot like the whole history of people claiming that some UB is fine where it's not because surely the optimizer "is not out to manufacture UB" deliberately or that the compiler "is not smart enough to prove X" or that "surely we know what happens on overflow on x86" etc. Maybe it's impossible in today's compilers, but what if compiler technology improves (especially with regards to multithreading)?

Now I agree this is not dangerous for 99.9999% of practical programs because nobody's creating billions of threads and cloning Arcs all over the place. My only argument is that this seems to be a hole in the safety guarantees of the language / standard library, not that it is an actual risk to your regular code.

It doesn't have to be a clone / drop pair. It can be a clone / mem::forget pair.

Well, this makes it not a no-op and not removable; in that it must maintain the clone operation (importantly, including the overflow check).

Yes, this is a probabilistic argument, rather than a fully sound one. But at the same time, Rust isn't pretending to be a safe sandbox, and such a program is incredibly degenerate (to the point of being useless) already.

The standard library of any language at some level is required to make assumptions about what the compiler does. (In an extreme example, in C++, it's declared UB to define any symbols in the std namespace. Yet the standard library shipped with the compiler has to do so in order to, well, provide std.)

It's certainly unfortunate that we have[1] to rely on compiler/optimizer implementation details to provide a sound API, but the point of the argument I sketched is to say that it's a reasonable enough assumption that it won't cause problems.


  1. Well, it's possible to do a compare-and-swap loop that checks for overflow, and that would completely eliminate the potential UB, but it would also be a significant performance penalty. ↩︎

LLVM doesn't collapse even Relaxed. See also c++ - Why don't compilers merge redundant std::atomic writes? - Stack Overflow.

@chrefr Yes, but that link also clearly says it could.

I agree that maybe loop-hoisting it would be surprising for the progress bar example they mention, but in same same basic block I don't see why they shouldn't just start combining updates -- let p1 = arc.clone(); let p2 = arc.clone(); being able to only do one update instead of two would be such an obvious win, for example.

if(x) {
    foo();
    y.store(0);
} else {
    bar();
    y.store(0);  // release a lock before a long-running loop
    for() {...} // loop contains no atomics or volatiles
}
// A compiler can merge the stores into a y.store(0) here.

... huh. That example is honestly kind of frightening, and it applies to Rust as well; nothing under optimization actually guarantees that dropping e.g. a MutexGuard will actually unlock before doing unrelated, potentially long-running work.

It's perhaps marginally better in Rust's case -- the C++ standard contains a request for atomic stores to be visible "in a finite amount of time," which should prohibit reordering past an infinite loop, which is itself allowed in Rust but not in C++.

Perhaps preventing this is what people are trying to express with an "acquire write"? (Since the simplistic explanation for the acquire ordering is that operations cannot be reordered before an acquire read.)

While overflowing a usize by incrementing a counter on a 64-bit machine isn't going to happen, it is definitely a danger on machines with smaller pointer sizes. For example, triggering Arc's overflow detection is pretty easy on a 32-bit machine. The following code:

use std::sync::Arc;
fn main() {
    let arc = Arc::new(1);
    for _ in 0..=usize::MAX {
        std::mem::forget(arc.clone());
    }
    println!("{}", arc); //use-after-free if reached
}

compiles, and if run with the command

cargo run --release --target=i686-pc-windows-msvc

produces the following output after about 10 seconds on my machine:

error: process didn't exit successfully: `target\i686-pc-windows-msvc\release\arc_leak.exe` (exit code: 0xc000001d, STATUS_ILLEGAL_INSTRUCTION)

This is definitely not something I'd want to have to debug, but it's a lot better than trying to track down the use-after-free that would otherwise happen[1]. While this case is definitely a bit pathological given that holding that many Arcs in memory is impossible and leaking that many is improbable, you could very well still wind up overflowing 32-bit counters (potentially including usize), with long-running programs in other places. Notably, youtube learned this the hard way back in 2014 after Gangnam Style got one too many views..


  1. Note that the sigill isn't caused by undefined behavior, it's caused by the usage of LLVM's abort intrinsic, which on x86 machines is defined as a ud2, an instruction whose sole purpose is to be illegal to execute. ↩︎

I'm asking because of these checks here, and wondering if I could skip them:

synced.rcvr_count = synced.rcvr_count.checked_add(1).unwrap();
synced.sndr_count = synced.sndr_count.checked_add(1).unwrap();
synced.elst_count = synced.elst_count.checked_add(1).unwrap();

I think this is the most relevant argument why to not rely on the usize not overflowing. Unlike Box::leak, when calling std::mem::forget, the memory of the passed value gets released (though any additional heap allocations won't). In my case, the structs look basically like this:

pub struct Sender<T> {
    shared: Arc<Shared<T>>,
}

pub struct Enlister<T> {
    shared: Arc<Shared<T>>,
}

enum Slot {
    A,
    B,
}

pub struct Receiver<T> {
    shared: Arc<Shared<T>>,
    slot: Slot,
}

If I create and forget any of these, then no memory should be allocated over time, thus memory exhaustion wouldn't protect me from an overflow. However, the Arc counter should prohibit an overflow as it will reach <usize>::MAX first.

So I guess I could practically omit the overflow checks, but maybe it would be more clear to keep them (as they won't impose any real performance issues).