I've submitted this on SO as well but maybe I'll be more lucky here.
I think I've kind of understood provenance (maybe not), but all the documentation I find talks about going from/to pointers and usize. But what about [u8; N]? There's nothing in the documentation about what happens to pointers encoded to and decoded from byte arrays using std::ptr::read()/std::ptr::write().
Why I'm doing that is beyond this question, but this is somewhat similar to what the Zngur C++ interop tool does under the hood.
From what I've understood about provenance, if the C API were returning a uintptr_t, aka usize, I would need to ensure the pointer obtained from it has provenance (using std::ptr::with_exposed_provenance() or the pointer method with_addr()). But here there is no such usize.
So what I cannot understand is the following:
does the raw pointer inside the resulting MyStruct have provenance?
is it valid to dereference (assuming it was on the C side)?
if not, why?
what to do to fix the situation?
Another way to put this is: is what Zngur is doing sound?
No, because the variable data has a type for what it owns without indirection,[1] that type does not carry provenance, and read_unaligned() is not an operation that gives it provenance it does not previously have.
is it valid to dereference (assuming it was on the C side)?
No, because it does not have provenance.
what to do to fix the situation?
One of these options:
Create the pointer value using with_exposed_provenance() (or another operation that adds a suitable provenance).
When you own the data without indirection, it must be given a type that can carry provenance, or you must avoid owning the data without indirection.
Regarding usize versus byte arrays: usize has no special powers of its own; the special provenance-applying powers are in the operations that make a pointer from a usize.
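To illustrate the point about usize, here is a minimal sketch (not the code from the question) that sends a pointer through a plain [u8; N] buffer. The function name roundtrip_via_usize is made up for illustration, and the sketch assumes Rust 1.84 or later, where expose_provenance() and with_exposed_provenance() are stable. The bytes and the usize carry no provenance; it is the final with_exposed_provenance() call that picks up the previously exposed provenance.

```rust
use std::mem::size_of;
use std::ptr;

/// Hypothetical helper: round-trips a pointer through plain bytes.
fn roundtrip_via_usize(p: *const u32) -> *const u32 {
    // Exposing the provenance yields the address as an ordinary integer.
    let addr: usize = p.expose_provenance();
    // Plain `u8` bytes: nothing in here carries provenance.
    let bytes: [u8; size_of::<usize>()] = addr.to_ne_bytes();
    // Decoding the bytes gives back an integer, still without provenance.
    let addr_again = usize::from_ne_bytes(bytes);
    // This operation, not the `usize` type, is what supplies provenance.
    ptr::with_exposed_provenance(addr_again)
}
```

Calling roundtrip_via_usize(&value) on a live u32 and reading through the result should be fine under the exposed-provenance rules, because the original pointer was exposed before the round trip.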
Transmuting integers to pointers is a largely unspecified operation. It is likely not equivalent to an as cast. Doing non-zero-sized memory accesses with a pointer constructed this way is currently considered undefined behavior.
The code you’ve written essentially performs mem::transmute_copy, and I assume the same constraint applies to that as transmute. Therefore, your current code is very likely unsound; the resulting pointer isn’t guaranteed to have provenance.
I’ll assume there’s some reason that the C function can’t just return MyStruct (or that RawData can’t have a MyStruct field) even with repr(C, packed) to preserve the same alignment as what you have now.
Unless there’s some reason that wrapping [u8; _] in MaybeUninit doesn’t work, I’d recommend:
Thanks! But how can I use with_exposed_provenance() and friends if I don't have a usize to start with?
I mean, in the code above the thing is simple: I may declare a proxy struct with a usize instead of a pointer, read that, and then initialize MyStruct using with_exposed_provenance(). But in general MyStruct could have complex fields, with the pointer buried somewhere inside a field of a field, etc.
So is this code doomed? And is what Zngur does unsound as well?
Can you point me to some docs explaining the relationship of MaybeUninit and provenance? Or can you elaborate on why using MaybeUninit there would make the code sound?
Also, since this struct is going through the C ABI, is MaybeUninit<T> guaranteed to have the same ABI as T? (edit: yes, I've looked up the documentation and it says so)
Well, come to think of it, C and C++ also have pointer provenance (albeit not implemented in the same way as Rust, and poorly documented in their standard “it’s your fault if you mess up” fashion), so you might need to be careful on that side as well for the code to be sound.
As for why MaybeUninit helps, see the “Validity” section (if you haven’t already read it, given your edit): MaybeUninit in std::mem - Rust.
You can convert your data to usize, e.g. using usize::from_ne_bytes(). But this is not really appropriate for your use case with a whole struct. I am only discussing usize because I want to make the point that usize does not have any special powers. The only thing that makes usize particularly important is that the exposed provenance operations take or produce usize (though the implicit ones in as casts are also willing to use isize or truncate to smaller integer types). But for your application, it is better to preserve the provenance as-is rather than exposing it, and for that you should take @robofinch’s advice and use [MaybeUninit<u8>; N].
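The [MaybeUninit<u8>; N] advice above could look roughly like the following sketch. The field layout of MyStruct and RawData is an assumption (the thread only shows that MyStruct has a ptr field), and foreign_stand_in is a hypothetical Rust replacement for the C function so that the example is self-contained.

```rust
use std::mem::{size_of, transmute, MaybeUninit};
use std::ptr;

/// Assumed layout; only the `ptr` field is known from the thread.
#[repr(C)]
#[derive(Clone, Copy)]
struct MyStruct {
    ptr: *const i32,
    len: usize,
}

const N: usize = size_of::<MyStruct>();

/// Byte buffer whose element type is allowed to carry provenance.
#[repr(C)]
struct RawData {
    bytes: [MaybeUninit<u8>; N],
}

/// Hypothetical stand-in for the C function that fills the buffer.
fn foreign_stand_in(target: &'static i32) -> RawData {
    let s = MyStruct { ptr: target, len: 1 };
    // Same size on both sides; MaybeUninit<u8> keeps provenance per byte.
    unsafe { transmute::<MyStruct, RawData>(s) }
}

/// Decodes the buffer; the pointer bytes keep the provenance they had.
fn decode(data: RawData) -> MyStruct {
    // read_unaligned because a byte array only guarantees alignment 1.
    unsafe { ptr::read_unaligned(data.bytes.as_ptr().cast::<MyStruct>()) }
}
```

With a plain [u8; N] buffer, the same decode would copy the address bytes but not their provenance, which is the problem discussed earlier in the thread.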
Thanks, I've looked at MaybeUninit docs, and I'm not sure I understand exactly what is going on.
From the docs:
Moving or copying a value of type MaybeUninit<T> (i.e., performing a “typed copy”) will exactly preserve the contents, including the provenance, of all non-padding bytes of type T in the value’s representation.
I thought pointers have provenance, not "bytes in the value's representation".
The docs also say that the following code is sound:
fn identity<T>(t: T) -> T {
    unsafe {
        let u: MaybeUninit<[u8; size_of::<T>()]> = transmute(t);
        transmute(u) // OK.
    }
}
This looks like what I'm doing, except that the first transmute is done on the C side.
So for my code, if I do:
pub fn does_this_work() -> MyStruct {
    unsafe {
        let data = foreign();
        std::mem::transmute(data)
    }
}
will does_this_work().ptr have provenance and be valid to be dereferenced?
The order of the MaybeUninit and array doesn’t matter. (The order can certainly matter for some types, but array elements have no padding between them.)
The means by which pointers have provenance is that the byte representation of a pointer value contains both the pointer’s address and the pointer’s provenance. In order to support this, the Rust abstract machine allows any individual byte to have provenance (since Rust’s memory is untyped; types are sort of like lenses through which you look at and operate on memory). For instance, you can disassemble a 64-bit pointer into 8 MaybeUninit<u8> values, send them somewhere else in individual pieces, and reassemble the bytes (in the correct order) back into a pointer with the original provenance.
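As a rough sketch of the disassemble-and-reassemble idea above (the function names split, join and roundtrip_bytewise are made up for illustration): the pointer is turned into individual MaybeUninit<u8> values, each byte is copied on its own, and the bytes are put back together in the same order.

```rust
use std::mem::{size_of, transmute, MaybeUninit};

const PTR_SIZE: usize = size_of::<*const i32>();

/// Takes a pointer apart into bytes that may carry provenance.
fn split(p: *const i32) -> [MaybeUninit<u8>; PTR_SIZE] {
    unsafe { transmute::<*const i32, [MaybeUninit<u8>; PTR_SIZE]>(p) }
}

/// Puts the bytes back together into a pointer.
fn join(bytes: [MaybeUninit<u8>; PTR_SIZE]) -> *const i32 {
    unsafe { transmute::<[MaybeUninit<u8>; PTR_SIZE], *const i32>(bytes) }
}

/// Copies the pointer one byte at a time, keeping the original order.
fn roundtrip_bytewise(p: *const i32) -> *const i32 {
    let pieces = split(p);
    let mut reassembled = [MaybeUninit::<u8>::uninit(); PTR_SIZE];
    for (dst, src) in reassembled.iter_mut().zip(pieces.iter()) {
        // A typed copy at MaybeUninit<u8> preserves the byte as-is.
        *dst = *src;
    }
    join(reassembled)
}
```

Swapping MaybeUninit<u8> for u8 here would keep the address intact but, as I understand the current rules, would not keep the provenance.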
You can think of each byte of memory being something like
Typed copies may or may not copy over the Option<Provenance> value, depending on the type, and some typed copies may even write Uninit to some destination bytes. (For instance, (u8, u16) has one padding byte, which a typed copy will not copy over; instead, the padding byte of the destination (u8, u16) value is made Uninit irrespective of the source padding byte.)
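The per-byte model described above can be written down as a toy sketch. This is only a mental model, not how rustc or Miri represent memory; Provenance, AbstractByte and the two copy functions are illustrative names, and returning None to stand for undefined behavior is a simplification of mine.

```rust
/// Hypothetical stand-in for "which allocation this byte's pointer belongs to".
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Provenance(u32);

/// One byte of abstract-machine memory.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AbstractByte {
    /// Uninitialized, e.g. padding after a typed copy.
    Uninit,
    /// An initialized value, optionally tagged with provenance.
    Init(u8, Option<Provenance>),
}

/// Typed copy at type MaybeUninit<u8>: the byte is preserved exactly.
fn typed_copy_maybe_uninit_u8(src: AbstractByte) -> AbstractByte {
    src
}

/// Typed copy at type u8: integers carry no provenance, so it is dropped.
/// Returns None where real Rust would have undefined behavior.
fn typed_copy_u8(src: AbstractByte) -> Option<AbstractByte> {
    match src {
        AbstractByte::Init(value, _) => Some(AbstractByte::Init(value, None)),
        AbstractByte::Uninit => None,
    }
}
```

In this picture, a [u8; N] buffer is N copies through typed_copy_u8, while a [MaybeUninit<u8>; N] buffer is N copies through typed_copy_maybe_uninit_u8.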
Assuming that you’re using MaybeUninit, then yes. (Also, see @RalfJung’s recent blog post about defining the semantics of FFI.)
Where is all this stuff really documented? Documentation about provenance in the standard library docs is not remotely this detailed, unless I've missed something.
I’ve pieced together information from so many places that it’s hard to keep track. std docs, Miri docs, Unsafe Code Guidelines WG, rustc compiler dev guide (even though I’ve never contributed to or even just locally built the compiler), some internal rustc_middle docs, a variety of GitHub issues and PRs…
I believe this glossary is one of the most valuable resources.
As I understand it, Ralf is the expert. So it isn’t exactly official, but his writings are the best clue as to what any final documentation will look like.
MaybeUninit<[u8; N]> and [MaybeUninit<u8>; N] have an identical set of possible values and identical representations, in much the same way as [[T; 3]; 5] and [[T; 5]; 3] do; in practice, you use whichever one is more convenient to manipulate for your application.
but pointers are types too, and they have values, thus they have representations!
I think the confusion might be caused by the fact that provenances are NOT reified on most hardware and purely exist in the rust abstract machine, so one might get the impression that provenances aren't real, unlike, say, the metadata.
but, provenances are totally real! and they have values too! a rust pointer can be thought of as having a representation like this:
struct Pointer<T: ?Sized> {
    /// no surprise for this
    address: usize,
    /// although it almost always gets erased in the backend, provenance is NOT a ZST!
    /// because it has (runtime) information of the rust semantic model!
    /// at minimum, it contains the range of the "allocation" associated with this pointer
    provenance: SomeOpaqueType,
    /// is this pointer "fat" or "thin"? may or may not be a ZST
    metadata: <T as Pointee>::Metadata,
}
when a roundtrip operation (e.g. transmutation to and from MaybeUninit) preserves the value of any type, this includes any pointer type, so of course it preserves the provenance and metadata as well as the address, no matter what the "bytes in the value's representation" are for the pointer type.