UB Questions! What exactly is an "Allocated Object"?

jasongrlicky · May 18, 2020, 4:05pm

I've been on a quest lately to reduce reliance on undefined behavior (UB) in the Rust dependencies I use. This led me to the following mysterious line in the documentation for pointer.offset():

Both the starting and resulting pointer must be either in bounds or one byte past the end of the same allocated object. Note that in Rust, every (stack-allocated) variable is considered a separate allocated object.
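To make the "in bounds or one byte past the end" wording concrete, here is a minimal sketch. The function name `elements_to_end` is made up for illustration, and the example assumes the pointer is obtained from the whole slice via `as_ptr()`.

```rust
/// Hypothetical helper: counts the elements between the start of `values`
/// and its one-past-the-end pointer.
fn elements_to_end(values: &[u32]) -> isize {
    // Pointer derived from the whole slice.
    let start = values.as_ptr();
    // SAFETY: `add(len)` lands exactly one past the end of the same
    // allocated object, which the quoted rule explicitly allows.
    let end = unsafe { start.add(values.len()) };
    // `end` may be computed and compared, but it must never be dereferenced.
    // SAFETY: both pointers are derived from the same slice.
    unsafe { end.offset_from(start) }
}

fn main() {
    let values = [10u32, 20, 30];
    assert_eq!(elements_to_end(&values), 3);
}
```

Computing `start.add(values.len() + 1)` instead would already violate the rule, even if the result were never read through.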

I haven't heard this notion of an "allocated object" in any material on Rust before, so this unleashed a torrent of questions for me immediately. For instance:

How is memory on the heap divided into "objects" - is it per call to malloc, or something else?
How does the notion of object boundaries work for elements in an array? Is each element an "object", or is the whole array?
Is it only allocations from Rust that are counted as "allocated objects", or does the rule apply to allocations from C as well?
If so, what is considered the object boundary for C structs with a flexible-array-member?
Is there somewhere I can look to tell if object boundaries are being crossed when offsetting pointers - for instance, inspecting generated MIR, LLVM IR, or memory during debugging?

The motivation for this investigation comes from this issue on the coremidi_sys crate.

In this particular case, I need to do some math on a pointer that I'm handed from a C callback, complicated by the fact that the pointed-to struct has a flexible array member (so in some cases the pointer offset will need to exceed the size of the struct that Rust knows about). My instinct is to just cast the pointer to an integer like the pointer.wrapping_offset() docs suggest and do the math there, just to be sure that UB is avoided. But I'd much rather understand if that is really necessary before lowering the abstraction level of the code.

So that turned out to be a million questions, and I don't expect anyone to answer them all. But if anyone has any insights to share or resources that could be used to learn more about this, that would be much appreciated

(also posted on Reddit)

alice · May 18, 2020, 4:16pm

When you have a raw pointer, what it can access depends on how you originally created the raw pointer, with casts and offsets not changing that region.

If created from a reference, it can access exactly the things that the reference could reach.
If you got it from the allocator, it can access that and exactly that (heap) allocation.

This is more or less enough to deduce the rest of the rules. E.g. for stack objects, you always get the raw pointer by first creating a reference, which limits the raw pointer to that specific value on the stack.

Pointers in C should be considered the same as raw pointers. For pointers created in C, consult the C specification (i.e. if it isn't UB to access that address in C with the pointer, it's also OK in Rust).

This means that if you get a pointer to a field by doing &value.field, then you got it from a reference, limiting the pointer to just that field. Conversely, if you got it by offsetting and casting a pointer to the whole value, then the pointer is still valid for the rest of the fields.
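A minimal sketch of that difference, using a hypothetical `Pair` struct (not from any real crate) and assuming `#[repr(C)]` so the field layout is known:

```rust
/// Hypothetical two-field struct; `#[repr(C)]` fixes the field order.
#[repr(C)]
struct Pair {
    a: u32,
    b: u32,
}

/// Reads `b` through a pointer that was derived from the whole struct.
fn read_b_via_struct_pointer(pair: &Pair) -> u32 {
    // Derived from a reference to all of `Pair`, then cast to its first field.
    let base = pair as *const Pair as *const u32;
    // SAFETY: with `#[repr(C)]`, `b` sits directly after `a` (no padding
    // between two `u32`s), and `base` may access the whole struct.
    unsafe { *base.add(1) }
}

// The problematic variant, shown only as a comment:
//   let a_ptr = &pair.a as *const u32; // derived from a reference to `a` only
//   unsafe { *a_ptr.add(1) }           // reaches outside what `&pair.a` covers

fn main() {
    let pair = Pair { a: 1, b: 2 };
    assert_eq!(pair.a, 1);
    assert_eq!(read_b_via_struct_pointer(&pair), 2);
}
```

Both pointers hold the same address; only the way they were created differs.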

Regarding mutation, it can mutate something if and only if the raw pointer originally came from a mutable reference or the allocator.

Note that converting a raw pointer into a reference and then converting the reference back can indeed result in a smaller region of validity.
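Applied to the flexible-array-member case from the original question, a sketch could look like the following. `Packet` and `payload` are hypothetical stand-ins, not the coremidi_sys types, and the `Vec<u8>` only simulates the buffer a C callback would hand over. The point is to stay on raw pointers the whole way and never go through `&*packet`, which would shrink the accessible region to the header.

```rust
use std::ptr;
use std::slice;

/// Hypothetical stand-in for a C struct with a flexible array member:
///   struct packet { uint8_t len; uint8_t data[]; };
#[repr(C)]
struct Packet {
    len: u8,
    data: [u8; 0],
}

/// Copies the payload that follows the header.
///
/// # Safety
/// `packet` must point to a live allocation that holds the header followed
/// by `len` payload bytes, and must be allowed to access all of it.
unsafe fn payload(packet: *const Packet) -> Vec<u8> {
    unsafe {
        // Reading a field through the raw pointer creates no reference.
        let len = (*packet).len as usize;
        // `addr_of!` also avoids an intermediate `&`, so the resulting
        // pointer can still reach the bytes behind the header.
        let data = ptr::addr_of!((*packet).data) as *const u8;
        slice::from_raw_parts(data, len).to_vec()
    }
}

fn main() {
    // Simulated C buffer: len = 3, followed by three payload bytes.
    let buffer: Vec<u8> = vec![3, 10, 20, 30];
    // The pointer is derived from the whole buffer, not from a `&Packet`.
    let packet = buffer.as_ptr() as *const Packet;
    let bytes = unsafe { payload(packet) };
    assert_eq!(bytes, vec![10u8, 20, 30]);
}
```

The fields are all `u8` here so that the byte buffer is trivially aligned for `Packet`; a real struct with wider fields would need a suitably aligned allocation.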

H2CO3 · May 18, 2020, 4:37pm

I believe this should be a "logical", rather than "physical" distinction. For example, a couple of months ago, a question about unsafe code clarified that it is undefined behavior to do the following:

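let vec: Vec<u32> = vec![1, 2, 3, 4];
// Pointer derived from a reference to a single element.
let first_ptr = &vec[0] as *const u32;
// `offset` is itself an unsafe fn, so the call needs an unsafe block too.
let last_ptr = unsafe { first_ptr.offset(3) };
// UB: reads an element that `&vec[0]` could never reach.
let last = unsafe { *last_ptr };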

That is because, even though first_ptr physically points to the beginning of the vector, it was created from a reference to a single element, and thus it's illegal to use it for accessing other elements of the vector. Yet the vector is required to allocate its buffer as a single contiguous block, with a single call to the allocator, so as a consequence, I think "allocated object" can't possibly refer to just "per call to the allocator" – that would be insufficient.
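For contrast, a sketch of the variant that should be fine under the rule described in this thread: the pointer comes from `as_ptr()` on the whole slice rather than from `&vec[0]`. The function name is made up for illustration.

```rust
/// Hypothetical helper: reads the last element through a raw pointer that
/// was derived from the whole slice.
fn last_via_slice_pointer(values: &[u32]) -> Option<u32> {
    if values.is_empty() {
        return None;
    }
    // `as_ptr` is called on the slice itself, so the pointer may access
    // every element, not just the first one.
    let first_ptr = values.as_ptr();
    // SAFETY: the slice is non-empty, so index `len - 1` is in bounds.
    Some(unsafe { *first_ptr.add(values.len() - 1) })
}

fn main() {
    let vec: Vec<u32> = vec![1, 2, 3, 4];
    assert_eq!(last_via_slice_pointer(&vec), Some(4));
}
```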

alice · May 18, 2020, 4:39pm

Yes, this is exactly the thing I was trying to highlight with

If created from a reference, it can access exactly the things that the reference could reach.

and

Note that converting a raw pointer into a reference and then converting the reference back can indeed result in a smaller region of validity.

jasongrlicky · May 18, 2020, 9:08pm

Hey @alice, thank you so much for the excellent breakdown of the rules! And thank you @H2CO3 for the clarification. This was super helpful

Alxandr · May 19, 2020, 7:32am

Small follow-up here. I've been using MaybeUninit.first_ptr_mut lately in one of my projects (on an owned Vec<MaybeUninit<u8>>), and after learning about these rules I got worried that the pointer arithmetic I was doing on that pointer to get at the nth element could be UB, but the documentation does not tell me whether or not the pointer is valid for the entire region covered by the slice. Checking the source of the function (luckily this is possible) does however reveal that a pointer is created to the entire slice and then just cast to a pointer to a specific item, so I assume this means it's OK to do things like ptr.add(5) to get at the 6th element. However, should the documentation maybe say that this is the case?

alice · May 19, 2020, 8:06am

Yeah, that is a bit ambiguous, but I think it is fine to assume that the raw pointers returned by functions like that are valid for the entire array.

Yandros · May 19, 2020, 10:35am

The key to distinguish &buf[0] as * {const|mut} _ from buf.as_...ptr() is to think about the empty slice case: the former goes out of bounds and thus cannot succeed (either panicking or UBing), whereas the latter is well defined (an allocation can span across 0 bytes and thus have an associated ptr).

Given that the documentation of MaybeUninit.first_ptr_mut() does not mention a panic or unsafety when used on an empty slice, it cannot go through a single-element reference (although technically it could subslice the slice to reduce the provenance, by special-casing non-empty slices, so the documentation still needs to be improved to guarantee that it does not do so; if this bothers you, in the meantime you can do .as_mut_ptr().cast::<T>(), which has non-ambiguous semantics).
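A minimal sketch of that `.as_mut_ptr().cast::<T>()` route on a `Vec<MaybeUninit<u8>>`. The function `fill_sixth` is a made-up name for illustration; it writes the 6th element through the raw pointer and reads it back.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: writes `value` to the 6th element via a raw pointer
/// and returns what was written, or `None` if the slice is too short.
fn fill_sixth(buffer: &mut [MaybeUninit<u8>], value: u8) -> Option<u8> {
    if buffer.len() < 6 {
        return None;
    }
    // `as_mut_ptr` on the slice gives a pointer for the whole slice (and is
    // well defined even for an empty one); `cast` only changes the pointee type.
    let base: *mut u8 = buffer.as_mut_ptr().cast::<u8>();
    // SAFETY: the length check above keeps index 5 in bounds, and the
    // element is written before it is read.
    unsafe {
        base.add(5).write(value);
        Some(base.add(5).read())
    }
}

fn main() {
    let mut buffer: Vec<MaybeUninit<u8>> = vec![MaybeUninit::uninit(); 8];
    assert_eq!(fill_sixth(&mut buffer, 42), Some(42));
}
```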

system · August 17, 2020, 10:35am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
