Is pointer arithmetic on a pointer that does not point to an element of an array undefined behavior?

Ok, so this is the only requirement for pointer arithmetic in Rust?

The requirements are in the documentation, which was posted above. But there's nothing related to provenance.

Thanks. What does "one byte past the end of the same allocated object." mean?

Does it mean

let i: i32 = 0;
unsafe {
    let ptr_i = &i as *const i32;
    let ptr_end = ptr_i.add(1); // OK
    let one_byte_from_end = (ptr_end as *const u8).add(1); // #1 OK ??
};

ptr_end points to the end of the object i. Does it mean #1 is still ok, which is one byte past the end of the object i?

Miri doesn't think so. The byte that ptr_end is addressing is already one byte past the end, so I think that's what they meant.
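A minimal sketch of that distinction, assuming a single i32 on the stack. The helper name `one_past_end_distance` is made up for illustration. `add(1)` on a `*const i32` lands exactly on the one-past-the-end position; stepping any further is done here with `wrapping_add`, which has no in-bounds requirement but yields a pointer that must never be dereferenced.

```rust
use std::mem::size_of;

/// Hypothetical helper: distance in bytes from the start of an `i32`
/// to its one-past-the-end pointer.
fn one_past_end_distance() -> usize {
    let i: i32 = 0;
    let ptr_i = &i as *const i32;

    // SAFETY: `ptr_i` points at `i`, and `ptr_i + 1` is exactly one past the
    // end of that allocation, which `add` permits. It is never dereferenced.
    let ptr_end = unsafe { ptr_i.add(1) };

    // Using `add` again here would leave the allocation (this is line #1 in
    // the question). `wrapping_add` is a safe method and only computes an
    // address; the result must not be read or written.
    let beyond = (ptr_end as *const u8).wrapping_add(1);
    assert_ne!(beyond as usize, ptr_end as usize);

    ptr_end as usize - ptr_i as usize
}

fn main() {
    assert_eq!(one_past_end_distance(), size_of::<i32>());
}
```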

An i32 object should be diagrammed as:

---------- <--------- start
|   byte |
----------
|   byte |
----------
|   byte |
----------
|   byte |
---------- <----------- end
// byte
//--------- <------------ the offset to end is one byte

Think of your typical C pointer loop...

for (ptr = start; ptr < end; ++ptr) { /* ... */ }
start                       end
|                           |
V                           v
+---+---+---+---+---+---+---+---+---+---+---+---+
| "allocation"              | something else    |
+---+---+---+---+---+---+---+---+---+---+---+---+
  ^                           ^
  |                           |
  what you read with *start   what you would (but better not) read with
                              *end (or *ptr after the loop)

It's OK to point at (but not read from) the position immediately following the memory in question.

A Rust slice iterator works analogously.[1]


  1. last I looked; implementation detail ↩︎
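The same loop shape can be sketched in Rust with raw pointers. This is only an illustration of the start/end idea, not the actual slice iterator implementation; the function name `sum_with_pointers` is hypothetical.

```rust
/// Sums a slice by walking a raw pointer from `start` up to, but not
/// including, `end`.
fn sum_with_pointers(data: &[i32]) -> i32 {
    // `as_ptr_range` returns `start..end`, where `end` is one past the last element.
    let range = data.as_ptr_range();
    let mut ptr = range.start;
    let mut total = 0;

    while ptr != range.end {
        // SAFETY: `ptr` is not equal to `end`, so it points at a live element
        // of `data`. Advancing by one may produce `end` itself, which is
        // allowed as long as it is only compared and never read.
        unsafe {
            total += *ptr;
            ptr = ptr.add(1);
        }
    }
    total
}

fn main() {
    assert_eq!(sum_with_pointers(&[1, 2, 3, 4]), 10);
    assert_eq!(sum_with_pointers(&[]), 0);
}
```

As in the C loop, `end` is only ever used for the comparison.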

Yes, that is what they mean.
I think it stems from C++-style iterators, which work by giving you begin() and end() pointers. For example, for an array a of length 10 you get a.begin() == &a[0] and a.end() == &a[10]. You can see that a.end() points just past the end of the allocation of a, which is the "one byte past the end" position. This is useful for iterating in the following fashion:

for (auto i = a.begin(); i != a.end(); ++i) {
    ...
}

As you can see, the only thing you need a.end() for is comparing it to the running pointer i, so you never need to dereference it, which would in fact be UB.

I think the issue is that in most of the Rust documentation, pointers and references refer to ranges, which are conceptually pairs of boundaries between actual bytes.

I think it would be better to write it mathematically:

For a single allocated object with starting address p and size s, both the starting and resulting pointer must be >= p and <= p + s.

Or consistently talk about pointers addressing bytes:

Both the starting and resulting pointer must be addressing a byte within, or the byte immediately after, the same allocated object.
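The mathematical wording can be sketched as a plain address check. This is an illustration only: the function name is hypothetical, and it works on integer addresses, so it says nothing about provenance, which the real rules also depend on.

```rust
/// Hypothetical check: does address `q` satisfy `p <= q <= p + s` for an
/// allocation starting at address `p` with size `s` bytes?
fn in_bounds_or_one_past(p: usize, s: usize, q: usize) -> bool {
    // Written as `q - p <= s` so that `p + s` cannot overflow.
    q >= p && q - p <= s
}

fn main() {
    let buf = [0u8; 4];
    let p = buf.as_ptr() as usize;
    let s = buf.len();

    assert!(in_bounds_or_one_past(p, s, p)); // first byte
    assert!(in_bounds_or_one_past(p, s, p + s)); // one past the end
    assert!(!in_bounds_or_one_past(p, s, p + s + 1)); // too far
}
```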

Well, this point is similar to C++. However, I think the wording is a bit misleading, which @drewtato has clarified.

The operative question here is whether Rust has "subobject provenance". This is still undecided, but Miri implements two models ("Stacked Borrows" and "Tree Borrows"): SB has subobject provenance and TB does not. Subobject provenance means that when you construct a reference, e.g. &o.a, the provenance provided by this reference gives access only to the bytes of the subobject (i.e. the four bytes of the i32) and not to other things next to it. So accessing o.b through this reference would be UB under SB. Without subobject provenance, references to subobjects inherit the same access capabilities as their parents, so &o.a can be used to access any part of the allocation containing o.a, which in this case is the o stack allocation. In that case accessing o.b is defined behavior. (TB is the newer model, and removing this sometimes surprising source of UB was one of the reasons for this modification to the model. Miri still uses SB by default, which is why you see the described behavior. Pass -Zmiri-tree-borrows to Miri to get it to use TB rules instead.)

Note that in both cases, you can still avoid the subobject slicing behavior if you don't use references. If you use let ptr_o = std::ptr::addr_of!(o.a) instead then it will be defined behavior in both models. This is why one of the more common suggestions for unsafe code authors is to use raw pointers consistently, because mixing in references is a surefire way to accidentally make aliasing and validity assertions you weren't prepared to commit to. It's just a bit unfortunate that we lost the chance to have &raw syntax and need this macro instead.

Of SB and TB, which is more likely to be adopted as Rust's formal model? TB is more consistent with intuition, I think.

One is not a subset of the other (each allows and disallows things the other does not). If I had to guess, any adopted model will be some third circle in the Venn diagram, picking and choosing between the two (or even sometimes something distinct from either).

I also suspect any model will be adopted gradually, piece by piece, not wholesale.

Changing let ptr_o = &o.a as *const i32; to let ptr_o = std::ptr::addr_of!(o.a); still has UB; see Rust Playground.

You need to first get a raw pointer to the whole object, and then use addr_of! to get a raw pointer to the field. When you apply addr_of! directly to the field, the result is different.

#[repr(C)]
struct A {
    a: i32,
    b: i32,
}
fn main() {
    let o = A { a: 0, b: 0 };
    unsafe {
        let ptr_o = std::ptr::addr_of!(o);
        let ptr_o = std::ptr::addr_of!((*ptr_o).a);
        let ptr_b = ptr_o.add(1); // #1
        let _b = *ptr_b;
    };
}
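A testable variant of the same pattern, as a sketch: the fields get distinct values so the read through the offset pointer is observable. The function name `read_b_via_whole_object` is hypothetical, and the layout assumption (b directly after a) relies on #[repr(C)] with two i32 fields.

```rust
#[repr(C)]
struct A {
    a: i32,
    b: i32,
}

/// Hypothetical helper: reads `o.b` by offsetting a pointer to `o.a` that was
/// derived from a raw pointer to the whole struct.
fn read_b_via_whole_object(o: &A) -> i32 {
    // Raw pointer to the whole object, so it may access all of `o`.
    let ptr_o = o as *const A;
    unsafe {
        // Field projection through the raw pointer; no reference to `o.a`
        // alone is ever created.
        let ptr_a = std::ptr::addr_of!((*ptr_o).a);
        // SAFETY: with `#[repr(C)]` and two `i32` fields, `b` sits directly
        // after `a`, and `ptr_a` was derived from a pointer to all of `o`.
        let ptr_b = ptr_a.add(1);
        *ptr_b
    }
}

fn main() {
    let o = A { a: 1, b: 2 };
    assert_eq!(read_b_via_whole_object(&o), 2);
}
```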

However, this would be no different from acquiring the pointer through a reference. As @digama0 said, the purpose of using std::ptr::addr_of! to acquire the pointer from o.a is to avoid UB.

No, @alice is right, if you use addr_of!(o.a) directly then under the more-conservative models you will still only get a pointer with access to o.a. Note that the compiler treats it like a borrow of o.a for the purpose of borrow checking:

#[repr(C)]
struct A {
    a: i32,
    b: i32,
}
fn main() {
    let mut o = A { a: 0, b: 0 };
    unsafe {
        let p = &mut o.b;
        let ptr_o_a = std::ptr::addr_of!(o.a); // ok
        let ptr_o = std::ptr::addr_of!(o); // double borrow error
        p;
    }
}

I tried to understand what "allocated object" means here. std::ptr - Rust says:

For several operations, such as offset or field projections (expr.field), the notion of an “allocated object” becomes relevant. An allocated object is a contiguous region of memory. Common examples of allocated objects include stack-allocated variables (each variable is a separate allocated object), heap allocations (each allocation created by the global allocator is a separate allocated object), and static variables.

It seems the complete object is the allocated object here. In the original question, o is that allocated object, because o occupies the contiguous memory. Based on this understanding, ptr_o.add(1) does not violate the requirements imposed by add:

  • Both the starting and resulting pointer must be either in bounds or one byte past the end of the same allocated object.

So, if ptr_b is not dereferenced, Miri does not report UB. However, I couldn't find any documentation about the UB caused by *ptr_b.
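A sketch of the "compute but do not dereference" case, assuming the same #[repr(C)] struct with two i32 fields. The function name is hypothetical. Note that ptr_a.add(1) is also exactly one past the end of the field a, so the offset stays within bounds even on the narrowest reading; only the address is compared and nothing is read through ptr_b.

```rust
#[repr(C)]
struct A {
    a: i32,
    b: i32,
}

/// Hypothetical helper: checks that offsetting a pointer to `o.a` by one
/// element yields the address of `o.b`, without reading through it.
fn offset_lands_on_b(o: &A) -> bool {
    let ptr_a = &o.a as *const i32;
    // SAFETY: the result is one past the end of `o.a` and still inside the
    // allocation holding `o`. It is never dereferenced.
    let ptr_b = unsafe { ptr_a.add(1) };
    // Address comparison only.
    std::ptr::eq(ptr_b, std::ptr::addr_of!(o.b))
}

fn main() {
    let o = A { a: 0, b: 0 };
    assert!(offset_lands_on_b(&o));
}
```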

I don't think the details are decided on yet (as discussed above). Until they are, I recommend a conservative approach.

I am also confused by the description here; it is too unclear. Will further explanation be added?

More things do tend to get decided over time; here's one example. In general for questions on this level, pay attention to the things with a T-opsem label; there are some more links here.
