Ok, so this is the only requirement for pointer arithmetic in Rust?
The requirements are in the documentation, which was posted above. But there's nothing related to provenance.
Thanks. What does "one byte past the end of the same allocated object." mean?
Does it mean:
let i: i32 = 0;
unsafe {
    let ptr_i = &i as *const i32;
    let ptr_end = ptr_i.add(1); // OK
    let one_byte_from_end = (ptr_end as *const u8).add(1); // #1 OK ??
};
ptr_end points to the end of the object i. Does that mean #1, which is one byte past the end of the object i, is still OK?
Miri doesn't think so. The byte that ptr_end is addressing is already one byte past the end, so I think that's what they meant.
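For comparison, here is a sketch of a variant of the snippet above that should stay within the documented rules (assuming the current std docs for add and wrapping_add): the end pointer is computed with add(1), and if you really need an address beyond it, wrapping_add is used instead, which has no in-bounds requirement. The helper name end_offsets is made up for illustration.

```rust
/// Hypothetical helper: returns the byte offsets (from the start of a local
/// `i32`) of its "one past the end" pointer and of one byte beyond that.
fn end_offsets() -> (usize, usize) {
    let i: i32 = 0;
    let ptr_i = &i as *const i32;
    // In bounds: one `i32` (4 bytes) past the start is exactly the
    // "one past the end" position of the allocation holding `i`.
    let ptr_end = unsafe { ptr_i.add(1) };
    // One more byte would leave that range, so `add` is not allowed here.
    // `wrapping_add` has no in-bounds requirement: the result may be
    // computed and compared, but must never be dereferenced.
    let past_end = (ptr_end as *const u8).wrapping_add(1);
    (
        ptr_end as usize - ptr_i as usize,
        past_end as usize - ptr_i as usize,
    )
}

fn main() {
    let (end, past_end) = end_offsets();
    println!("end = start + {end}, past_end = start + {past_end}");
}
```

Note that "one past the end" here is 4 bytes after the start of i, not 1; the extra byte in #1 is what takes the pointer out of bounds.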
An i32 object can be diagrammed as:
---------- <--------- start
|  byte  |
----------
|  byte  |
----------
|  byte  |
----------
|  byte  |
---------- <--------- end
//  byte
//-------- <--------- the offset to end is one byte
Think of your typical C pointer loop...
for (ptr = start; ptr < end; ++ptr) { /* ... */ }
start                        end
  |                           |
  v                           v
+---+---+---+---+---+---+---+---+---+---+---+---+
|       "allocation"        |  something else   |
+---+---+---+---+---+---+---+---+---+---+---+---+
  ^                           ^
  |                           |
what you read with *start     what you would (but better not) read with
                              *end (or *ptr after the loop)
It's OK to point at (but not read from) the position immediately following the memory in question.
A Rust slice iterator works analogously.[1]
[1] Last I looked; it's an implementation detail.
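A rough Rust translation of that C loop, as a sketch only (sum_with_pointers is a made-up name; the real slice iterator is more involved, and its internals are an implementation detail): end is one element past the last one and is only ever compared, never read.

```rust
/// Sums a slice with the start/end pointer loop from the C example.
fn sum_with_pointers(xs: &[i32]) -> i32 {
    let start = xs.as_ptr();
    // SAFETY: `start + len` is at most one past the end of the slice's memory.
    let end = unsafe { start.add(xs.len()) };
    let mut sum = 0;
    let mut ptr = start;
    while ptr < end {
        // SAFETY: `start <= ptr < end`, so `ptr` points at a live element.
        sum += unsafe { *ptr };
        ptr = unsafe { ptr.add(1) };
    }
    // Here `ptr == end`; reading `*ptr` now would be UB.
    sum
}

fn main() {
    let data = [1, 2, 3, 4];
    println!("{}", sum_with_pointers(&data));
}
```

For an empty slice, add(0) on the (dangling but non-null) as_ptr() result is allowed, so the loop simply does not run.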
Yes, that is what they mean.
I think it stems from C++-style iterators, which work by giving you a begin() and an end() pointer. For example, for an array a of length 10 they give you a.begin() == &a[0] and a.end() == &a[10]. You can see that a.end() points just past the end of a's allocation: one element past the last element, which is one byte past the last byte. This is useful for iterating in the following fashion:
for (auto i = a.begin(); i != a.end(); ++i) {
    ...
}
As you can see, the only thing you need a.end() for is comparing it to the running pointer i, so you never need to dereference it, which in fact would be UB.
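Rust's standard library exposes the same half-open pair directly through the slice method as_ptr_range. A small sketch (begin_end is a hypothetical wrapper name) showing that the end pointer is one element past the last element and is not itself part of the range:

```rust
use std::ops::Range;

/// Returns the half-open `[begin, end)` pointer pair for an array, the Rust
/// counterpart of C++'s `a.begin()` / `a.end()`.
fn begin_end(a: &[i32; 10]) -> Range<*const i32> {
    a.as_ptr_range()
}

fn main() {
    let a = [0i32; 10];
    let r = begin_end(&a);
    let last = &a[9] as *const i32;
    // `end` lies one element (4 bytes for i32) past the last element.
    println!("end - last = {} bytes", r.end as usize - last as usize);
    // Half-open: the last element is inside the range, `end` is not.
    assert!(r.contains(&last));
    assert!(!r.contains(&r.end));
}
```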
I think the issue is that in most of the Rust documentation, pointers and references refer to ranges, which are conceptually pairs of boundaries between actual bytes.
I think it would be better to write it mathematically:
For a single allocated object with starting address $p$ and size $s$, both the starting and resulting pointer must be $\ge p$ and $\le p + s$.
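The bound can be written as a plain address check. This is a hypothetical helper, not a std API, and it only compares addresses: it says nothing about provenance or aliasing, which is exactly the part still being debated further down.

```rust
/// Hypothetical helper (not a std API): checks `p <= q <= p + s` for a
/// pointer `q` and an allocated object starting at address `p` that is
/// `s` bytes long. Compares addresses only; ignores provenance.
fn within_or_one_past(p: *const u8, s: usize, q: *const u8) -> bool {
    let (p, q) = (p as usize, q as usize);
    p <= q && q <= p + s
}

fn main() {
    let x: i32 = 0;
    let p = &x as *const i32 as *const u8;
    let s = std::mem::size_of::<i32>();
    // First byte, last byte, and one past the end all satisfy the bound.
    for k in [0, 3, 4] {
        assert!(within_or_one_past(p, s, p.wrapping_add(k)));
    }
    // One byte beyond "one past the end" does not.
    assert!(!within_or_one_past(p, s, p.wrapping_add(5)));
}
```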
Or consistently talk about pointers addressing bytes:
Both the starting and resulting pointer must be addressing a byte within, or the byte immediately after, the same allocated object.
Well, this point is similar to C++; however, I think the wording is a bit misleading, as @drewtato clarified.
The operative question here is whether Rust has "subobject provenance". This is still undecided, but Miri implements two models ("Stacked Borrows" and "Tree Borrows"): SB has subobject provenance and TB does not. Subobject provenance means that when you construct a reference, e.g. &o.a, the provenance provided by this reference gives access only to the bytes of the subobject (i.e. the four bytes of o.a) and not to other things next to it. So accessing o.b through this reference would be UB under SB. Without subobject provenance, references to subobjects inherit the same access capabilities as their parents, so &o.a can be used to access any part of the allocation containing o.a, which in this case is the stack allocation of o. In that case, accessing o.b is defined behavior. (TB is the newer model, and removing this sometimes surprising source of UB was one of the reasons for this change to the model. Miri still uses SB by default, which is why you see the described behavior. Pass -Zmiri-tree-borrows to Miri to make it use the TB rules instead.)
Note that in both cases, you can still avoid the subobject slicing behavior if you don't use references. If you use let ptr_o = std::ptr::addr_of!(o.a) instead, then it will be defined behavior in both models. This is why one of the more common suggestions for unsafe code authors is to use raw pointers consistently, because mixing in references is a surefire way to accidentally make aliasing and validity assertions you weren't prepared to commit to. It's just a bit unfortunate that we lost the chance to have &raw syntax and need this macro instead.
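One way to sidestep the subobject question entirely is to derive the field pointer from a pointer to the whole struct rather than from anything that names o.a. A sketch, assuming a repr(C) struct like the one used in this thread and a toolchain that has std::mem::offset_of! (stabilized in Rust 1.77); read_b_via_whole is a made-up name. The pointer here comes from a &A covering all of o, so both fields are inside its range under either model.

```rust
use std::mem::offset_of;

#[repr(C)]
struct A {
    a: i32,
    b: i32,
}

/// Reads `o.b` through a raw pointer derived from the whole struct, never
/// from a reference to the field `o.a`, so subobject narrowing cannot apply.
fn read_b_via_whole(o: &A) -> i32 {
    // This pointer's range covers all of `*o`, i.e. both fields.
    let whole = (o as *const A).cast::<u8>();
    // SAFETY: `offset_of!(A, b)` stays inside `*o`, and the field is aligned
    // for `i32` because `A` is `repr(C)` with `i32` fields.
    unsafe { *whole.add(offset_of!(A, b)).cast::<i32>() }
}

fn main() {
    let o = A { a: 1, b: 2 };
    println!("a = {}, b = {}", o.a, read_b_via_whole(&o));
}
```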
Between SB and TB, which is more likely to be adopted as Rust's formal model? I think TB is more consistent with intuition.
One is not a subset of the other (each allows and disallows things the other does not). If I had to guess, any adopted model will be some third circle in the Venn diagram, picking and choosing between the two (or even sometimes something distinct from either).
I also suspect any model will be adopted gradually, piece by piece, not wholesale.
Changing let ptr_o = &o.a as *const i32; to let ptr_o = std::ptr::addr_of!(o.a); still has UB; see the Rust Playground.
You need to first get a raw pointer to the whole object, and then use addr_of to get a raw pointer to the field. When you apply addr_of directly to the field, it is different.
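#[repr(C)]
struct A {
    a: i32,
    b: i32,
}

fn main() {
    let o = A { a: 0, b: 0 };
    unsafe {
        // Raw pointer to the whole struct: its range covers both fields.
        let ptr_o = std::ptr::addr_of!(o);
        // Project to the field through the raw pointer, keeping that range.
        let ptr_o = std::ptr::addr_of!((*ptr_o).a);
        let ptr_b = ptr_o.add(1); // #1: now points at o.b
        let _b = *ptr_b;
    };
}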
However, this will be no different from acquiring the pointer through a reference. As @digama0 said, the purpose of using std::ptr::addr_of! to acquire the pointer from o.a is to avoid UB.
No, @alice is right: if you use addr_of!(o.a) directly, then under the more conservative models you will still only get a pointer with access to o.a. Note that the compiler treats it like a borrow of o.a for the purpose of borrow checking:
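#[repr(C)]
struct A {
    a: i32,
    b: i32,
}

fn main() {
    let mut o = A { a: 0, b: 0 };
    unsafe {
        let p = &mut o.b;
        // Borrows only o.a, which is disjoint from o.b: accepted.
        let ptr_o_a = std::ptr::addr_of!(o.a); // ok
        // Borrows all of o while p (a &mut to o.b) is still live: rejected.
        let ptr_o = std::ptr::addr_of!(o); // double borrow error
        p;
    }
}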
I tried to understand what "allocated object" means here. The std::ptr documentation says:
For several operations, such as offset or field projections (expr.field), the notion of an “allocated object” becomes relevant. An allocated object is a contiguous region of memory. Common examples of allocated objects include stack-allocated variables (each variable is a separate allocated object), heap allocations (each allocation created by the global allocator is a separate allocated object), and static variables.
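To make the three kinds of allocated objects from that quote concrete, here is a sketch that computes the "one past the end" address for a stack variable, a heap allocation, and a static. The helper one_past_end is hypothetical; it only computes the address and never dereferences it.

```rust
use std::mem::size_of_val;

/// Hypothetical helper: returns the "one past the end" address of the memory
/// a reference covers. Computing it with `add` is allowed; reading it is not.
fn one_past_end<T: ?Sized>(r: &T) -> *const u8 {
    let start = (r as *const T).cast::<u8>();
    // SAFETY: adding `size_of_val(r)` lands at most one byte past the end of
    // memory that lies within a single allocated object.
    unsafe { start.add(size_of_val(r)) }
}

// A static: its own allocated object.
static TABLE: [u16; 3] = [1, 2, 3];

fn main() {
    // A stack variable: a separate allocated object.
    let local: i32 = 7;
    // A heap allocation made through the global allocator.
    let boxed: Box<[u8]> = vec![0u8; 16].into_boxed_slice();

    let sizes = [
        one_past_end(&local) as usize - (&local as *const i32 as usize),
        one_past_end(&*boxed) as usize - (boxed.as_ptr() as usize),
        one_past_end(&TABLE) as usize - (TABLE.as_ptr() as usize),
    ];
    println!("{sizes:?}"); // [4, 16, 6]
}
```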
It seems the complete object is the allocated object here. In the original question, o is that allocated object, because the contiguous memory is occupied by o. So, based on this understanding, ptr_o.add(1) does not violate the requirement imposed by add:
- Both the starting and resulting pointer must be either in bounds or one byte past the end of the same allocated object.
So, as long as the code does not dereference ptr_b, Miri does not report UB. However, I couldn't find any documentation explaining why *ptr_b is UB.
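A sketch of the case described above, assuming the same repr(C) layout as in the thread (points_at_b is a made-up name): the add(1) stays inside the allocation of o, which is all that add's documented requirement asks, and the result is only compared by address. Whether reading through it is allowed is the separate aliasing-model question.

```rust
use std::ptr::addr_of;

#[repr(C)]
struct A {
    a: i32,
    b: i32,
}

/// Computes a pointer one `i32` past `o.a` and only compares it with the
/// address of `o.b`; nothing is dereferenced through it.
fn points_at_b(o: &A) -> bool {
    let ptr_a = addr_of!(o.a);
    // Stays inside the allocation holding `*o`, which is what `add`'s
    // documented in-bounds requirement talks about.
    let ptr_b = unsafe { ptr_a.add(1) };
    // Address comparison only; reading `*ptr_b` is the step whose status
    // depends on the aliasing model (reported as UB under Stacked Borrows).
    ptr_b == addr_of!(o.b)
}

fn main() {
    let o = A { a: 0, b: 0 };
    println!("{}", points_at_b(&o));
}
```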
I don't think the details are decided on yet (as discussed above). Until they are, I recommend a conservative approach.
I am also confused by the description here; it is quite unclear. Will further explanation be added?
More things do tend to get decided over time; here's one example. In general for questions on this level, pay attention to the things with a T-opsem label; there are some more links here.