Miri "in-bounds pointer arithmetic failed" when prefetching on x86-64

Something I found in the current BLAKE3 implementation:

error: Undefined Behavior: in-bounds pointer arithmetic failed: attempting to offset pointer by 256 bytes, but got alloc157824+0x340 which is only 192 bytes from the end of the allocation
   --> BLAKE3/src/rust_avx2.rs:265:22
    |
265 |         _mm_prefetch(inputs[i].add(block_offset + 256) as *const i8, _MM_HINT_T0);
    |                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Undefined Behavior occurred here
    |
    = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
    = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
help: alloc157824 was allocated here:
   --> crates/shared/ab-merkle-tree/tests/balanced.rs:55:9
    |
55  |     let leaves = {
    |         ^^^^^^
    = note: BACKTRACE (of the first span) on thread `mt_balanced_32_`:
    = note: inside `blake3::avx2::transpose_msg_vecs` at BLAKE3/src/rust_avx2.rs:265:22: 265:55
    = note: inside `blake3::avx2::hash8` at BLAKE3/src/rust_avx2.rs:331:24: 331:69
    = note: inside `blake3::avx2::hash_many::<64>` at BLAKE3/src/rust_avx2.rs:402:9: 412:10
    = note: inside `blake3::platform::Platform::hash_many::<64>` at BLAKE3/src/platform.rs:258:17: 267:18

Essentially, the code is trying to prefetch data 256 bytes ahead of the current block. I guess for performance reasons it doesn't do a bounds check, and apparently x86-64 CPUs are fine with this (a prefetch is only a hint and never faults), but Miri says it is UB to even create such a pointer.

What would be the correct way to do this without introducing a bounds check?
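The numbers in Miri's message can be checked by hand: the pointer sits at offset 0x340 = 832 and only 192 bytes remain, so the allocation is 832 + 192 = 1024 bytes long. add only permits a result up to one byte past the end (offset 1024), but 832 + 256 = 1088. The sketch below models that rule from the add documentation with a hypothetical helper, add_stays_in_bounds (not part of std), and ignores the separate isize::MAX limit on offsets.

```rust
/// Hypothetical model of the in-bounds rule for `ptr.add(delta)`:
/// the pointer sits `ptr_offset` bytes into an allocation of `alloc_len`
/// bytes, and the result may land anywhere inside the allocation or
/// exactly one byte past its end, but no further.
/// (Ignores the additional `isize::MAX` limit on the offset.)
fn add_stays_in_bounds(alloc_len: usize, ptr_offset: usize, delta: usize) -> bool {
    ptr_offset <= alloc_len
        && ptr_offset
            .checked_add(delta)
            .map_or(false, |end| end <= alloc_len)
}
```

With the values from the log, add_stays_in_bounds(1024, 0x340, 256) is false, while a delta of 192 (landing exactly one past the end) would still be allowed.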

Use wrapping_add as advised in the add documentation.
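A minimal sketch of the suggested pattern, assuming an x86-64 target (it compiles to a no-op elsewhere). The helper names prefetch_ahead and sum_with_prefetch are hypothetical, and the 256-byte distance just mirrors the BLAKE3 line from the log. wrapping_add only computes an address, so the prefetch target may point past the end of the slice without a bounds check.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{_mm_prefetch, _MM_HINT_T0};

/// Hint the CPU to load the cache line `ahead` bytes past `ptr`.
/// `wrapping_add` (unlike `add`) does not require the result to stay
/// inside the allocation, so no bounds check is needed to build it.
#[inline(always)]
fn prefetch_ahead(ptr: *const u8, ahead: usize) {
    let target = ptr.wrapping_add(ahead);
    #[cfg(target_arch = "x86_64")]
    unsafe {
        // A prefetch is only a hint; the pointer is never dereferenced.
        _mm_prefetch::<_MM_HINT_T0>(target as *const i8);
    }
    #[cfg(not(target_arch = "x86_64"))]
    let _ = target; // no prefetch on other architectures
}

/// Hypothetical consumer: sums bytes in 64-byte blocks while
/// prefetching 256 bytes ahead of the current block.
fn sum_with_prefetch(data: &[u8]) -> u64 {
    let mut total = 0u64;
    for (i, block) in data.chunks(64).enumerate() {
        // Near the end this address is past the slice; that is fine here.
        prefetch_ahead(data.as_ptr(), i * 64 + 256);
        total += block.iter().map(|&b| b as u64).sum::<u64>();
    }
    total
}
```

The important part is only the wrapping_add line: replacing add with it in a call like inputs[i].add(block_offset + 256) keeps the same address computation while dropping the in-bounds requirement that Miri flagged.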


Using add() here is indeed UB; use wrapping_add() instead.


Very interesting, but also a bit confusing: it turns out it will not actually wrap around, despite the function name.

Fix: "Fix prefetch pointer addition that resulted in UB" by nazar-pc, Pull Request #507 in BLAKE3-team/BLAKE3 on GitHub

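It is called wrapping because the result is allowed to wrap around the address space. AFAIK, one important optimization enabled by add is that a < b is equivalent to a.add(x) < b.add(x): add requires the result to stay within the same allocation (or one past its end), and no allocation wraps around the end of the address space, so neither side can overflow. This is not the case for wrapping_add, because b.wrapping_add(x) is allowed to overflow/wrap.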
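A small sketch of why the ordering argument fails for wrapping_add. The addresses are made up and modeled as integer-cast pointers that are never dereferenced; the helper order_after_wrapping_add is hypothetical. When b sits near the top of the address space, shifting both pointers by the same x wraps b around to a tiny address and flips the comparison.

```rust
/// Compare `a < b` before and after shifting both by `x` with
/// `wrapping_add`. These pointers only carry addresses and are never
/// dereferenced, so building them from integers is fine here.
fn order_after_wrapping_add(a: usize, b: usize, x: usize) -> (bool, bool) {
    let pa = a as *const u8;
    let pb = b as *const u8;
    // `wrapping_add` makes no in-bounds promise, so `pb` may wrap
    // past the top of the address space back to a small address.
    (pa < pb, pa.wrapping_add(x) < pb.wrapping_add(x))
}
```

With a = 0x1000, b = usize::MAX - 16 and x = 32, the first comparison is true, but b wraps to address 15, so the second comparison is false. With add, that input would already be UB, which is exactly what lets the compiler assume the ordering is preserved.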

