Say I'm writing an OS or something where I have tight control of the memory layout, and I put frequently used data in a special heap near the bottom of the address space. I shouldn't need a full-width pointer/reference to refer to this data. I also shouldn't need to use indexing; I should genuinely be able to use a smaller-width value for direct loads and stores.
The obvious way to implement this is a custom pointer struct that holds an integer of the desired width (created by truncating a regular pointer), with a Deref that casts the integer back to a regular pointer. But how do I do this with proper provenance? The docs say you can use with_addr to copy provenance from some other existing pointer, but it's unclear what that pointer would be in this case. The whole point of pointer compression is not keeping other data around. We could keep a global pointer to the beginning of the special heap just for the sake of provenance copying, but presumably each allocation out of the special heap creates a provenance with bounds specific to that allocation, not the whole heap? Otherwise the spatial limits of provenance seem meaningless (every pointer would de facto be allowed to point at every allocation), but maybe I'm missing something here.
The issue is that provenance is not quite a defined language feature yet. It's mostly related to compiler optimization, and to drawing attention to the fact that on some platforms pointers carry extra information that cannot be included in their numerical representation.
As for your case, remember that provenance is allocation-specific: a pointer created by an allocation carries the provenance of that allocation. A pointer with a given provenance may only be dereferenced, or converted to a Rust reference, if the memory being accessed lies entirely within that allocation. The pointer itself may range from the start of the allocation to one past its end.
In other words, if you do not want to violate provenance and possibly incur miscompilations or incompatibilities with specific platforms, you must ensure that a pointer you use never points (even partially) outside of the allocation associated with its provenance.
In your case, if you reserve a special region for these compressed (smart) pointers, then the base pointer must be created from an allocation that covers the whole region these smart pointers could ever point to. The important thing is that you need to keep both the compiler and the platform happy. That is, you must make the compiler believe that you allocated that whole region of memory, and you must not violate any platform-specific OS ABIs. As long as you do that, the base pointer can even be a const computed at compile time. Or you can generate it at runtime and use some tricks to allocate without any actual backing memory until you need it. But you must allocate the whole region to create the base pointer.
Mind you, I think doing this will defeat defense mechanisms such as pointer tagging on ARM, but I am a bit outside of my experience when it comes to these features.
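To make the base-pointer idea concrete, here is a minimal sketch, not a definitive implementation. The names Region and Compressed are hypothetical, and it assumes a toolchain where the strict provenance methods addr and with_addr are stable (Rust 1.84 or later). The compressed value is a 32-bit byte offset, and every decompressed pointer inherits the provenance of the one base pointer that covers the whole region.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

/// Hypothetical compressed pointer: a 32-bit byte offset into one region.
pub struct Compressed<T> {
    offset: u32,
    _marker: PhantomData<*mut T>,
}

// Manual impls so that `Compressed<T>` is `Copy` even when `T` is not.
impl<T> Clone for Compressed<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T> Copy for Compressed<T> {}

/// Hypothetical region descriptor. `base` must carry provenance for all
/// `len` bytes, i.e. the whole region is one allocation as far as Rust knows.
pub struct Region {
    base: *mut u8,
    len: usize,
}

impl Region {
    /// Safety: `base` must be valid for reads and writes of `len` bytes
    /// for as long as the region and its compressed pointers are used.
    pub unsafe fn new(base: *mut u8, len: usize) -> Self {
        Region { base, len }
    }

    /// Keep only the offset; the pointer's own provenance is dropped here.
    /// Returns `None` if the `T` would not lie entirely inside the region.
    pub fn compress<T>(&self, p: *mut T) -> Option<Compressed<T>> {
        let off = p.addr().checked_sub(self.base.addr())?;
        if off.checked_add(size_of::<T>())? > self.len {
            return None;
        }
        Some(Compressed {
            offset: u32::try_from(off).ok()?,
            _marker: PhantomData,
        })
    }

    /// Rebuild a full pointer: address from the offset, provenance from `base`.
    pub fn decompress<T>(&self, c: Compressed<T>) -> *mut T {
        self.base
            .with_addr(self.base.addr() + c.offset as usize)
            .cast::<T>()
    }
}
```

In a kernel where the region sits at a fixed low address, the stored value could presumably be the truncated address itself rather than an offset; the offset form is used here only so the sketch also works in an ordinary 64-bit user-space process.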
I think on CHERI-like systems in machine mode you should have either access to a master pointer which covers the whole address space or a way to forge pointer metadata.
What exactly counts as an allocation, or creates one, for the purposes of Rust (on real existing hardware; you can't actually buy CHERI hardware, so it doesn't seem very interesting)?
Obviously the global allocator in Rust is one example, as presumably are malloc and free in user space programs. And then there are statics. I have read that the global allocator API is even given magic attributes when it comes to LLVM, something that you don't get with non-global allocators.
But what if I'm implementing the global allocator in Rust? How do I create a new allocation at a specific address and length? Assume I'm writing the root allocator for the entire OS (so you can't just say that I'm getting it from the OS).
What if my API doesn't even fit the standard global allocator API? Kernels often have page allocators that return entire pages or huge pages. How do I tell Rust about the size of such an allocation?
The strict provenance API is what I would assume handles this, but it seems underdeveloped: it appears to be made only for user space, on the assumption that someone else provides the allocations. There isn't anything there for building up allocations from nothing.
Another related issue is: what counts as an allocation and what doesn't? My understanding is that MMIO register addresses probably don't. From what I remember, opsem considers them to be outside the AM. But what about a hardware-allocated buffer in the address space, like the VGA buffer on old PCs? It would still seem useful to me if Rust and LLVM could know the size of that buffer.
My understanding (from watching what T-opsem talks about, not studying it all myself in detail) is that the answer is: you don’t. It magically happens in between
The point of the allocation concept, as I understand it, is to justify the assumptions that:
Memory returned by alloc::alloc::alloc() is exclusively usable by the caller and cannot validly be accessed by anything else, even though the allocator might return an address that overlaps with another previously freed allocation.
Threads’ stacks similarly are exclusively available to those threads and won't overlap with the heap.
The important thing is that these are all cases where memory is given to Rust code by “the Rust runtime”. Allocations are how the semantics handles this base case that does not involve manipulations of pointers already owned by the Rust code that is requesting an allocation. They don’t have anything to say about how the runtime gets those pointers, and so a GlobalAlloc implementation is free to do whatever to produce the pointers it returns to the runtime (as long as it doesn't itself violate the exclusivity).
Then you don’t need to worry about the AM concept of separate allocations. It’s perfectly normal to take some memory that is one allocation from the Rust perspective, and subdivide it into sub-allocations that abstract Rust doesn't particularly track. This is done all the time in ordinary programs that aren't operating systems. Allocations and provenance are an abstract description of the program used to justify how it should be optimized and executed, not something you have to rigorously keep as small as possible in all the places you possibly could.
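As a rough illustration of that kind of subdivision, here is a sketch of a bump allocator that hands out pieces of a single region. Bump is a hypothetical name and nothing here is specific to kernels. Every pointer it returns is derived from the one base pointer, so from the abstract machine's point of view all the pieces belong to the same allocation.

```rust
/// Hypothetical bump allocator over one Rust-level allocation.
pub struct Bump {
    base: *mut u8,
    len: usize,
    used: usize,
}

impl Bump {
    /// Safety: `base` must be valid for reads and writes of `len` bytes
    /// for as long as the allocator and the pointers it returns are used.
    pub unsafe fn new(base: *mut u8, len: usize) -> Self {
        Bump { base, len, used: 0 }
    }

    /// Carve `size` bytes with the given power-of-two alignment out of the
    /// region, or return `None` if the region is exhausted.
    pub fn alloc(&mut self, size: usize, align: usize) -> Option<*mut u8> {
        assert!(align.is_power_of_two());
        // Round the next free address up to the requested alignment.
        let start = self.base.addr().checked_add(self.used)?;
        let aligned = start.checked_add(align - 1)? & !(align - 1);
        let offset = aligned - self.base.addr();
        let end = offset.checked_add(size)?;
        if end > self.len {
            return None;
        }
        self.used = end;
        // Derived from `base`, so the result keeps the region's provenance.
        Some(self.base.wrapping_add(offset))
    }
}
```

Rust does not track the individual pieces here; keeping them from overlapping is the allocator's own job, which is the exclusivity requirement mentioned above.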
There are probably some useful operations in this area that don’t exist, but not having them to use won’t typically make your program incorrect. And if that doesn't turn out to be the case, you should probably be talking to T-opsem / T-libs-api about how to add something that fills the gap.
I think this stance sweeps some important questions under the rug. The provenance-based optimizations still apply to the allocator's code, so it's important to clearly answer questions like "what is the provenance of a pointer returned by mmap?". You could make mmap part of "the Rust runtime", but that's hardly a satisfying position. What about other potential syscalls or shared library calls returning pointers to "allocations" made by them?
In the case of CHERI the answer is clear: the kernel/library is responsible for crafting a valid pointer and the hardware tracks its "provenance". But it's not clear what happens with provenance tracking during compilation. How does the compiler know that the pointer we got from mmap is valid for len bytes?
The same question applies to the machine mode in which we manually split the address space into different parts.
We have an escape hatch in the form of with_exposed_provenance, but I am not sure if it's "the" solution for the problem or just an unfortunately necessary hack.
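For reference, a minimal sketch of that escape hatch, assuming a toolchain where expose_provenance and std::ptr::with_exposed_provenance_mut are stable (Rust 1.84 or later). ExposedHandle is a hypothetical name. The handle stores nothing but an address, and the pointer is rebuilt from whatever provenance was previously exposed for that address.

```rust
use std::ptr;

/// Hypothetical handle that stores only an address, no provenance.
#[derive(Clone, Copy)]
pub struct ExposedHandle(usize);

impl ExposedHandle {
    /// Record the address and mark the pointer's provenance as exposed.
    pub fn new<T>(p: *mut T) -> Self {
        ExposedHandle(p.expose_provenance())
    }

    /// Rebuild a pointer from the bare address. It is only sound to
    /// dereference the result if a suitable provenance was exposed earlier
    /// and the memory is still live.
    pub fn get<T>(self) -> *mut T {
        ptr::with_exposed_provenance_mut::<T>(self.0)
    }
}
```

This avoids keeping a base pointer around, at the cost of giving up the precision of strict provenance: the compiler has to be conservative about every allocation whose provenance has been exposed.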
It doesn't. Rather, the compiler optimizes based on the assumption that the pointer returned by mmap is valid for len bytes if len bytes are accessed. If this assumption is incorrect, you have UB and anything can happen. For all extern functions, the compiler will optimize based on the assumption that there is some combination of Rust Abstract Machine operations performed by the extern function that would make the entire program have defined behavior.
This seems to preclude allocations with holes in them? While unusual, Linux absolutely lets you set that up (or at least have varying permissions within an mmap).
But how does it know that mmap is even an allocation and not just an arbitrary pointer into some existing allocation?
It could also be an issue in a kernel/embedded context where memory can have holes.
In practice (not what the model says) compilers use more local reasoning: they observe your accesses and propagate what those imply, or they rely on explicit annotations in the IR.
If you read from a pointer, the compiler is allowed to reorder that read around things that aren't optimization barriers. If you also do pointer arithmetic, it now knows that not just a single location but a whole range must be valid, and must also have been valid in the past, back to the last optimization barrier. These assumptions enable hoisting, merging loads, reordering around function calls and a bunch of other optimizations.
If you turn a pointer into a reference, then accesses get further annotations making additional promises, like the pointer being non-null. And if you pass that reference as a function argument, an annotation gets added that promises liveness for the duration of the whole function body.
But inlining expands the scope where optimizers can apply their local reasoning. Sometimes the whole lifetime of an allocation gets inlined into a single function and then the compiler can reason about it end-to-end, including things like eliminating allocations and doing memcpy-forwarding.
Though mmap isn't considered an allocating function, so at least the allocation-elision optimization doesn't apply to it.
And this feels like a big hole in the provenance model. I would expect to have an unsafe intrinsic which tells the compiler the valid range for an externally obtained pointer, and that the resulting allocation does not overlap with any other currently existing allocation. On CHERI it would check the pointer metadata, while on other arches it would be a no-op outside of compile time.
For the global alloc method this intrinsic would be applied automatically, but for manual mmap calls we would need to apply it ourselves. I think such an intrinsic could even open the door to software emulation of CHERI capabilities, something similar to what is done by Miri, but without the annoying restrictions on external calls.