Is the pointer arithmetic on a pointer that does not point to an element of an array undefined behavior?

xmh0511 · January 17, 2024, 2:00am

I know that such pointer arithmetic is UB in C++ according to [expr.add] p4:

When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.

If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.

Otherwise, if P points to an array element i of an array object x with n elements ([dcl.array]), the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i+j of x if 0≤i+j≤n and the expression P - J points to the (possibly-hypothetical) array element i−j of x if 0≤i−j≤n.

Otherwise, the behavior is undefined.

Consider this example:

#[repr(C)]
struct A {
    a: i32,
    b: i32,
}

fn main() {
    let o = A { a: 0, b: 0 };
    let ptr_o = &o.a as *const i32;
    unsafe {
        let ptr_b = ptr_o.add(1); // #1
        let _ = *ptr_b;
    }
}

Is #1 undefined behavior or well-defined in Rust? Does Rust have any restriction similar to C++ in pointer arithmetic?

quinedot · January 17, 2024, 2:12am

Read the documentation.

Safety

If any of the following conditions are violated, the result is Undefined Behavior:

Both the starting and resulting pointer must be either in bounds or one byte past the end of the same allocated object.

The computed offset, in bytes, cannot overflow an isize.

The offset being in bounds cannot rely on “wrapping around” the address space. That is, the infinite-precision sum must fit in a usize.

[...]

Consider using wrapping_add instead if these constraints are difficult to satisfy. The only advantage of this method is that it enables more aggressive compiler optimizations.

wrapping_add

Safety

This operation itself is always safe, but using the resulting pointer is not.

The resulting pointer “remembers” the allocated object that self points to; it must not be used to read or write other allocated objects.

[ ...]

Compared to offset, this method basically delays the requirement of staying within the same allocated object: offset is immediate Undefined Behavior when crossing object boundaries; wrapping_offset produces a pointer but still leads to Undefined Behavior if a pointer is dereferenced when it is out-of-bounds of the object it is attached to. offset can be optimized better and is thus preferable in performance-sensitive code.

The delayed check only considers the value of the pointer that was dereferenced, not the intermediate values used during the computation of the final result. For example, x.wrapping_offset(o).wrapping_offset(o.wrapping_neg()) is always the same as x. In other words, leaving the allocated object and then re-entering it later is permitted.
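A minimal sketch of the round trip described in the quoted paragraph, assuming a local i32 as the allocated object (the function name round_trip is hypothetical). The intermediate pointer may land far outside the allocation, but only the final pointer is dereferenced:

```rust
/// Moves a pointer out of its allocation with `wrapping_offset`, moves it
/// back by the negated offset, and reads through the result.
/// (`round_trip` is a hypothetical name used only for this illustration.)
fn round_trip(offset: isize) -> i32 {
    let value = 42_i32;
    let x: *const i32 = &value;

    // The intermediate pointer may be far out of bounds; computing it is fine.
    let y = x.wrapping_offset(offset).wrapping_offset(offset.wrapping_neg());
    assert_eq!(x, y);

    // SAFETY: `y` has the same address and provenance as `x`, which points
    // to the live, initialized `value`.
    unsafe { *y }
}

fn main() {
    println!("{}", round_trip(1_000_000));
}
```

Dereferencing the intermediate pointer instead of the final one would be UB, which is the delayed requirement the documentation refers to.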

A note about your example.

This is a no-op because the wildcard binding (_) is special:

     let _ = *ptr_b;

What you presumably meant was

    let _perform_a_read = *ptr_b;

Run it in Miri (under Tools, top-right) and it will point out the UB.

xmh0511 · January 17, 2024, 2:25am

So why is this UB? Is the reason similar to the C++ rule I quoted?

jw013 · January 17, 2024, 2:26am

Miri will accept the code with the following modification:

let ptr_o = &o as *const A as *const i32;

Miri does not like it if you use a pointer derived from &o.a to access o.b, so make sure to borrow all of o first. I'm no UB expert, but I think your toy example should be acceptable with that change.
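For reference, a self-contained sketch of the modified example, assuming the same #[repr(C)] struct from the question (the helper name read_b_from_whole is hypothetical):

```rust
#[allow(dead_code)]
#[repr(C)]
struct A {
    a: i32,
    b: i32,
}

/// Reads `b` by offsetting a pointer that was derived from the whole struct.
/// (`read_b_from_whole` is a hypothetical helper name.)
fn read_b_from_whole(o: &A) -> i32 {
    // Borrow all of `o`, then cast, so the pointer may access the entire struct.
    let ptr_o = o as *const A as *const i32;

    // SAFETY: `A` is `#[repr(C)]` with two `i32` fields, so `b` sits exactly
    // one `i32` past the start, and the pointer was derived from all of `o`.
    unsafe { *ptr_o.add(1) }
}

fn main() {
    let o = A { a: 1, b: 2 };
    println!("{}", read_b_from_whole(&o));
}
```

The only difference from the original is where the pointer comes from: &o rather than &o.a.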

quinedot · January 17, 2024, 2:43am

I'm no C++ UB expert, but probably. In Rust's case it's basically inheriting LLVM's GEP semantics. From here:

In particular, ptr::offset will cause us a lot of trouble, because it has the semantics of LLVM's GEP inbounds instruction. If you're fortunate enough to not have dealt with this instruction, here's the basic story with GEP: alias analysis, alias analysis, alias analysis. It's super important to an optimizing compiler to be able to reason about data dependencies and aliasing.

[ ...]

When you use GEP inbounds, you are specifically telling LLVM that the offsets you're about to do are within the bounds of a single "allocated" entity. The ultimate payoff being that LLVM can assume that if two pointers are known to point to two disjoint objects, all the offsets of those pointers are also known to not alias (because you won't just end up in some random place in memory). LLVM is heavily optimized to work with GEP offsets, and inbounds offsets are the best of all, so it's important that we use them as much as possible.

[...]

These cases are tricky because they come down to what LLVM means by "allocated". LLVM's notion of an allocation is significantly more abstract than how we usually use it. Because LLVM needs to work with different languages' semantics and custom allocators, it can't really intimately understand allocation. Instead, the main idea behind allocation is "doesn't overlap with other stuff". That is, heap allocations, stack allocations, and globals don't randomly overlap. Yep, it's about alias analysis.

xmh0511 · January 17, 2024, 3:25am

However, Rust does not use TBAA, does it? Moreover, according to this rule

Both the starting and resulting pointer must be either in bounds or one byte past the end of the same allocated object.

Is ptr_b not within the same allocated object? Does "allocated object" refer to o or to o.a?

quinedot · January 17, 2024, 4:29am

Rust does not have TBAA. (Your fields have the same type anyway.)

There are at least two things at play, as I understand it:

Boundaries of the allocated object
Provenance of the original reference (which cannot exceed the allocated object)

The provenance of your original reference only covers the field.

It's not inconceivable that some day in the future the language devs will decide that the pointer provenance is the entire allocated object, but until they do, you can't start with a reference to a field and soundly (e.g. via pointer arithmetic) read or mutate memory unreachable from the reference. Tree borrows has looser rules around pointer provenance for example (though I'm not sure if it covers this exact case or not).

I.e. Rust's exact aliasing model is still undecided, so for now, you have to take the conservative option to be sound.
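One conservative pattern, sketched here under the assumption that the struct from the question is in use, is to keep a raw pointer to the whole struct and project to the field with std::ptr::addr_of!, so that no reference narrowed to a single field is ever involved (read_b_raw is a hypothetical name):

```rust
use std::ptr::addr_of;

#[allow(dead_code)]
#[repr(C)]
struct A {
    a: i32,
    b: i32,
}

/// Reads `b` through raw pointers only, without ever creating `&o.a`.
/// (`read_b_raw` is a hypothetical name used for this illustration.)
fn read_b_raw(o: &A) -> i32 {
    // Raw pointer derived from a reference to the whole struct.
    let base: *const A = o;

    // SAFETY: `base` points to a live `A`. `addr_of!` computes the field's
    // address without creating an intermediate reference, and the read is
    // of an initialized `i32` inside that same `A`.
    unsafe {
        let ptr_b: *const i32 = addr_of!((*base).b);
        *ptr_b
    }
}

fn main() {
    let o = A { a: 10, b: 20 };
    println!("{}", read_b_raw(&o));
}
```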

drewtato · January 17, 2024, 7:20am

It seems like Miri is fine with add as long as it's inside o.

#[repr(C)]
struct A {
    a: i32,
    b: [i32; 10],
}

fn main() {
    let o = A { a: 0, b: [0; 10] };
    let ptr_o = &o.a as *const i32;
    for offset in 0..100 {
        println!("{offset}");
        unsafe {
            let ptr_b = ptr_o.add(offset); // #1
            let _slice = std::slice::from_raw_parts(ptr_b, 0);
        };
    }
}

Miri flags this after printing 12.

error: Undefined Behavior: out-of-bounds pointer arithmetic: alloc897 has size 44, so pointer to 48 bytes starting at offset 0 is out-of-bounds
  --> src/main.rs:13:25
   |
13 |             let ptr_b = ptr_o.add(offset); // #1
   |                         ^^^^^^^^^^^^^^^^^ out-of-bounds pointer arithmetic: alloc897 has size 44, so pointer to 48 bytes starting at offset 0 is out-of-bounds
   |
   = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
   = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
help: alloc897 was allocated here:
  --> src/main.rs:8:9
   |
8  |     let o = A { a: 0, b: [0; 10] };
   |         ^
   = note: BACKTRACE (of the first span):
   = note: inside `main` at src/main.rs:13:25: 13:42

If you change it to wrapping_add, Miri flags the from_raw_parts on the same iteration. I don't think you can soundly dereference the 1st to 11th pointers to anything except zero-sized types.
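The numbers in the error can be reproduced with a little arithmetic. This sketch assumes the same #[repr(C)] struct and computes the largest count for which add still lands inside o or one past its end (max_in_bounds_count is a hypothetical name):

```rust
use std::mem::size_of;

#[allow(dead_code)]
#[repr(C)]
struct A {
    a: i32,
    b: [i32; 10],
}

/// Largest `count` for which `ptr_o.add(count)` stays inside `o` or lands
/// exactly one past its end, given that `ptr_o` points at offset 0.
/// (`max_in_bounds_count` is a hypothetical name.)
fn max_in_bounds_count() -> usize {
    size_of::<A>() / size_of::<i32>()
}

fn main() {
    // 4 bytes for `a` plus 40 bytes for `b`: the "size 44" in Miri's message.
    println!("size_of::<A>() = {}", size_of::<A>());
    // Offsets 0 through 11 are accepted; 12 * 4 = 48 bytes is out of bounds.
    println!("largest accepted add = {}", max_in_bounds_count());
}
```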

khimru · January 17, 2024, 7:57am

The allocated object is what the pointer points to. You took the address of o.a, so it's the address of o.a. @jw013 took the address of o, so now the object is o.

You can always go from a large object to a smaller one, but you cannot “expand” an object.

You have to remember that all UBs, ultimately, are a compromise between the developer and the compiler^[1].

The definition used by C++ and Rust is important for optimizations: if the compiler knows that p originates from o.a and q from o.b, then it knows that an access via pointer p doesn't touch memory that is accessed via q (you can make them equal if the objects are adjacent in memory, but then access via one or the other would be UB).

It's actually an incredibly important thing, because without it something like memcpy couldn't be optimized, and if memcpy couldn't be optimized then the performance of any Rust program would become so awful it's not even worth talking about.
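A rough illustration of the kind of assumption involved. Note that this sketch swaps in safe &mut references rather than raw pointers, because they make the no-overlap guarantee easy to see; the function name write_both is hypothetical:

```rust
#[repr(C)]
struct A {
    a: i32,
    b: i32,
}

/// Writes through both references and returns `*p`.
/// Because `p` and `q` cannot overlap, the compiler is free to return the
/// constant 1 without reloading `*p` after the write through `q`.
/// (`write_both` is a hypothetical name.)
fn write_both(p: &mut i32, q: &mut i32) -> i32 {
    *p = 1;
    *q = 2;
    *p
}

fn main() {
    let mut o = A { a: 0, b: 0 };
    // Disjoint fields of the same struct may be borrowed mutably together.
    let result = write_both(&mut o.a, &mut o.b);
    println!("{result} {} {}", o.a, o.b);
}
```

If a pointer derived from o.a were allowed to reach o.b, the write through q could change what p sees, and that shortcut would no longer be valid.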

[1] But don't forget that the Rust compiler is not the only compiler involved: superscalar CPUs are hardware-implemented JITs and thus also rely on the absence of UB.

ZiCog · January 17, 2024, 8:24am

The assumption here is that .b follows .a in memory with no padding such that incrementing a pointer to .a gets you a pointer to .b.

This is not true.

The ordering does not have to be the same as the order in which the fields are specified in the declaration of the type.

From: Type layout - The Rust Reference

quinedot · January 17, 2024, 8:25am

They used #[repr(C)] for a deterministic layout.
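A quick way to check that layout, as a sketch assuming the struct from the question (offset_of_b is a hypothetical helper that compares addresses and performs no unsafe operations):

```rust
#[allow(dead_code)]
#[repr(C)]
struct A {
    a: i32,
    b: i32,
}

/// Byte offset of field `b` from the start of `o`, computed from addresses.
/// (`offset_of_b` is a hypothetical helper name.)
fn offset_of_b(o: &A) -> usize {
    let base = o as *const A as usize;
    let field = &o.b as *const i32 as usize;
    field - base
}

fn main() {
    let o = A { a: 0, b: 0 };
    // With `#[repr(C)]`, fields are laid out in declaration order, so `b`
    // follows `a` with no padding between two `i32`s.
    println!("offset of b = {}", offset_of_b(&o));
}
```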

ZiCog · January 17, 2024, 8:35am

Ah, sorry, missed that.

xmh0511 · January 17, 2024, 9:40am

Why do you say that? Doesn't your code show that the 1st through 11th adds are not UB?

jw013 · January 17, 2024, 3:41pm

Adding to a pointer and actually reading a nonzero number of bytes from it seem to be distinct questions as far as miri is concerned. What has been shown is that if you start with a pointer derived from &o.a, miri will let you add to it as long as you stay inside o, but will not let you read any bytes from outside o.a.

xmh0511 · January 18, 2024, 5:10am

Well, does that mean we are permitted to take a pointer to an object and use pointer arithmetic to make it point to a subobject of that object without causing UB? And that it is UB if we do it the other way around?

drewtato · January 18, 2024, 8:03am

For the idea of "allocated object", I don't think there's such a thing as a subobject. Memory is considered a flat container of allocated objects. In this case, o is an object of type A allocated on the stack. If this A was part of a bigger struct (directly, not through a pointer), it would also be valid to move the pointer into other members of the bigger struct.

That's only if Miri is correct, but I think it is.

ZiCog · January 18, 2024, 9:57am

This kind of code upsets me.

Yes, a and b are of the same type and follow each other in memory, and one would expect that incrementing a pointer to a would yield a pointer to b.

But it is fragile. What if somebody changes b to a different type? Or removes it? Or puts something else in between?

You are basically using a pointer to one thing to get access to a different thing.

drewtato · January 18, 2024, 8:48pm

Getting access to b is still UB. Only the add is allowed, not the dereference. Not really sure when this would be useful since wrapping_add exists, but there's probably something.

It would only break if you add more than 1 (adding 1 should always be sound on any pointer that comes from a plain reference) and you remove or shrink b. But that's pretty standard for unsafe types.

xmh0511 · January 19, 2024, 1:47am

Using a pointer whose provenance comes from a bigger object, moving it to a smaller object by pointer arithmetic, and dereferencing the resulting pointer to access the smaller object does not cause UB, right?

drewtato · January 19, 2024, 1:54am

As long as the smaller object is part of the bigger object it should be fine.
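A small sketch of that direction, assuming the #[repr(C)] struct with the array field from earlier in the thread (sum_all is a hypothetical name). The pointer is derived from the whole struct and then used to view every i32 inside it:

```rust
use std::mem::size_of;

#[repr(C)]
struct A {
    a: i32,
    b: [i32; 10],
}

/// Sums `a` and every element of `b` through a pointer derived from the
/// whole struct. (`sum_all` is a hypothetical name.)
fn sum_all(o: &A) -> i32 {
    let ptr = o as *const A as *const i32;
    let count = size_of::<A>() / size_of::<i32>();

    // SAFETY: `A` is `#[repr(C)]` and consists of 11 contiguous, initialized
    // `i32`s with no padding, and `ptr` was derived from a reference to all
    // of `o`, so every element lies inside the object it may access.
    let words = unsafe { std::slice::from_raw_parts(ptr, count) };
    words.iter().sum()
}

fn main() {
    let o = A { a: 1, b: [2; 10] };
    println!("{}", sum_all(&o));
}
```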
