Happy to see pointer provenance finally stabilized today. However, I couldn't find any description of its runtime overhead in the std docs.
When we manipulate raw pointers in Rust, it is usually because we care about performance and need to control low-level behavior, sometimes down to cache-line granularity. As a result, I think it is necessary to document the runtime overhead of pointer provenance, to help us understand what's going on under the hood.
Though it is stated in the std docs:
Note that the full definition of provenance in Rust is not decided yet
We may not need a precise definition; some descriptive statements would be just fine.
There are no runtime checks related to provenance for the simple reason that provenance does not exist at runtime, except on experimental platforms like CHERI.
Provenance is only a compile-time assumption, and it doesn't do anything at run time (similar to lifetimes).
It's used to improve runtime performance, because it allows the optimizer to assume you won't create terrible hacks with pointers, and this makes it possible to simplify the compiled code.
The classic example of a terrible hack that nobody wants allowed:
let a = 1;
let b = 2;
unsafe {
    // UB: write one element before `b`, hoping to hit `a`
    std::ptr::addr_of!(b).offset(-1).cast_mut().write(7);
}
// a == 7
If the variables were next to each other in memory, and every pointer into allocated memory were okay to use, then an offset from b could be used to set the value of a! For an optimizing compiler that's awful: it couldn't reasonably assume that pointers derived from b modify only b, and it couldn't optimize variables away or keep them in registers without breaking such hacks.
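Conversely, here's a small sketch (my own, just to illustrate the point) of the optimization that provenance buys: because a pointer derived from a may only touch a, the compiler is free to keep b in a register or fold it to a constant, no matter what the unsafe code does with that pointer.

fn sketch() -> i32 {
    let mut a = 1;
    let b = 2;
    let pa = std::ptr::addr_of_mut!(a);
    unsafe { pa.write(7) };  // this write may only touch `a`, never `b`
    a + b                    // so the compiler can treat `b` as the constant 2
}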
Since on most common platforms provenance is erased at runtime, would it be better to document this somewhere (like the std documentation or the Rust Reference)?
Hi, I've put some work into making the std::ptr docs more accessible.
I'm curious what your programming background is, why you're interested in provenance, what you mean by "using" provenance, and what in the current documentation made you think it might have runtime overhead.
I have been writing Rust for over 5 years, and I am familiar with C, C++, assembly and other low-level programming languages. I also know a little about functional programming languages like Haskell.
I am a PhD student majoring in software security, and provenance sounds like a great idea, which I would like to advertise to my colleagues.
By "using" provenance, I mean, hmmmm, just using it Previously in my codebase there are some integer <-> pointer casts, and I think it is better to use provenance API to rewrite those.
In the current documentation:
The exact structure of provenance is not yet specified, but the permissions defined by a pointer's provenance have a spatial component, a temporal component, and a mutability component
It is natural for me to assume there are certain "checks" when using the provenance APIs, since the documentation never says whether provenance is a compile-time concept or something checked at runtime.
Moreover, for a careless reader like me, the documentation is somewhat "misleading". It uses a tagged-pointer code snippet as an example of how to use the pointer provenance APIs, but on my first reading I thought provenance was implemented with tagged pointers (tagged pointers are, after all, a well-known way to implement this kind of tracking).
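For comparison, here is a minimal toy version of that kind of tagged-pointer code (my own sketch, not the docs' snippet; it assumes an allocation aligned enough that the low three address bits are free): the tagging is an application built on top of the provenance APIs, not how provenance itself is implemented.

fn main() {
    let mut x = 0u64;                        // 8-byte aligned on typical targets, so the low 3 bits are free
    let p: *mut u64 = &mut x;

    let tagged = p.map_addr(|a| a | 0b1);    // stash a tag bit in the address
    let tag = tagged.addr() & 0b111;         // read the tag back out
    let untagged = tagged.map_addr(|a| a & !0b111);

    assert_eq!(tag, 1);
    unsafe { *untagged = 7 };                // provenance was carried along the whole time
    assert_eq!(x, 7);
}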
I think there’s a bit of a misconception lurking here. Provenance isn’t something you use. Provenance is a set of assumptions the compiler (and hence “the optimizer”) makes, and you are obligated not to break those assumptions in unsafe code, no matter what. This was established in RFC 3559 “Rust Has Provenance”.
The “strict provenance API”, which you can now use on stable, is a set of functions that make it easier to tell the compiler precisely what you are doing, and hence:
1. make it easier to avoid breaking the provenance assumptions, because each operation documents its assumptions,
2. potentially allow the compiler to make more assumptions (and hence do more optimization), because you were more specific than just some_int as some_ptr_type (see the sketch below), and
3. potentially allow you to do weird pointer bit-fiddling tricks without making your code incompatible with CHERI-like architectures (but you can, instead, merely not use those tricks).
Using the strict provenance API does not make your code more robust against bugs, except insofar as it helps you think more carefully about how you are unsafely manipulating pointers. It does not mean opting in to provenance, because all code is subject to pointer provenance.
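To make point 2 a bit more concrete, here is a minimal sketch (my own example; it assumes the pointer genuinely has to round-trip through an integer, e.g. across an FFI boundary) of the more explicit spelling that replaces a bare some_int as some_ptr_type cast:

use std::ptr;

fn main() {
    let mut x = 5u32;
    let p: *mut u32 = &mut x;

    // Explicitly mark that this pointer's provenance is "exposed", i.e. it may
    // later be reconstructed from a plain integer.
    let addr: usize = p.expose_provenance();

    // ... the usize can now cross an FFI boundary, live in a register, etc. ...

    // Rebuild a pointer from the integer, picking up some previously exposed
    // provenance (the compiler has to be conservative about which one).
    let q: *mut u32 = ptr::with_exposed_provenance_mut(addr);

    unsafe { *q = 9 };
    assert_eq!(x, 9);
}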
How familiar are you with C/C++? Because it really sounds as if you don't know that C/C++ compilers use pointer provenance for optimizations.
The big problem: while the mandate to add provenance to the language standard was given 20 years ago… the actual rules still don't exist.
And that, in turn, means that some 100% correct, absolutely conforming C/C++ programs can be, and sometimes are, miscompiled by existing C/C++ compilers.
And that, in turn, means that the only way to know whether your program will work or not… is to contact the compiler developers and ask whether they can guarantee it.
You can find dozens of proposals to the ISO C and ISO C++ committees about changes to the standard that would, finally, align it with what compilers actually expect and support.
But, well… after 20 years (and counting!) without actual, final rules… some people have started losing patience.
Rust's “strict provenance” rules, in contrast, are “guaranteed to work” rules: when and if a “final” set of rules materializes (and, remember, C/C++ people have been talking about these rules for more than 20 years with no end in sight), your program won't be miscompiled as long as it follows them.
The final rules may grant some more permissions than “strict provenance” gives you… and that may help with some corner cases that are impossible to handle in a program that sticks to the “strict provenance” rules… but at least now you have something that gives you a real chance to write a program that won't be miscompiled and will keep working for the foreseeable future.
P.S. Maybe the documentation should mention that somewhere? I've seen a lot of people who have no idea that C/C++ have provenance even though it isn't mentioned anywhere in the standard, and who are blissfully unaware that writing a program that is 100% correct according to the C or C++ standard doesn't, by itself, guarantee that it will compile and run correctly! Without this critical fact, the whole story with “strict provenance” becomes easy to misunderstand.
It's worth noting that while normally compiled programs will not have any checks for provenance, that does not mean it is checked statically either.
Instead, the only way to check provenance is dynamically, through special execution environments such as Miri and CHERI. Within these environments, every pointer access is checked, since every pointer has provenance.
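To make that concrete, here is a small sketch (my own example) of code that a normal build will typically compile and run without complaint, but that cargo miri run rejects, because the pointer's provenance only covers the three-byte allocation it was derived from:

fn main() {
    let mut a = [1u8, 2, 3];
    let p = a.as_mut_ptr();
    unsafe {
        let past_end = p.add(3); // one past the end: legal to create
        past_end.write(4);       // out-of-bounds write: UB, and Miri flags it
    }
    println!("{:?}", a);
}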
Additionally, following the rules of strict provenance allows optimizers to make more assumptions about what your code will and won't do.
This is a rather unusual situation: Rust is quite ahead of the curve here, implementing software support for things that are largely not implemented in hardware.
I agree with the spirit of your message, but just to double-check my understanding: there is actually nothing that guarantees a is stored in memory just before b, and in any case the write is undefined behavior. Therefore there's actually nothing preventing an optimizing compiler from keeping a in a register and/or replacing its uses with the constant 1. Interestingly, Rust currently does produce the value 7 for a in debug mode, but 1 in release mode.
It's undefined behavior, because provenance rules make it so.
The hacky address b - 1 may be numerically equal to the address of a, but accessing that address through a pointer derived from b is illegal, while accessing it through a pointer derived from a is fine. This is provenance.
Rust also inherited C's rule that pointers to one element past the end of an object (allocation, array) are legal to exist, but not legal to read or write. This is another paradox of provenance:
let mut b = [0i32; 1];
let mut a = 0i32;
let b_ptr = b.as_mut_ptr().wrapping_add(1); // one past the end of `b`: legal to create
let a_ptr = std::ptr::addr_of_mut!(a);
// a_ptr and b_ptr are the same memory address (in naive implementations;
// nothing actually guarantees this stack layout, so the assert may fail)
assert!(std::ptr::eq(a_ptr, b_ptr));
// a_ptr is legal to read/write
// b_ptr is not legal to read/write, even when the addresses are equal
Yeah – and that's exactly where C/C++ compilers couldn't come to an agreement with the standard.
It was always legal to have different pointers that point to the same piece of memory (that has nothing to do with provenance; recall that the linear address of a far pointer in MS-DOS compilers was computed as x * 16 + y, so there were thousands of valid segment:offset combinations for each valid address), but the opposite was, traditionally, not allowed by the C/C++ standard: if pointers compare equal, then they have to behave identically… which is precisely what optimizing compilers couldn't guarantee.
In an attempt to half-add pointer provenance to the language, recent versions added a special rule that makes the result of comparing a pointer-past-the-end-of-an-array with a pointer-to-another-object unspecified… but they left the definition of memcmp untouched. And that, of course, is also something optimizing compilers like to optimize away. So that's where we are stuck for now: C and C++ compilers “violently believe” in provenance, yet the standard says nothing about provenance at all, valid programs (according to the C and C++ standards) that may be miscompiled if they violate the “unwritten rules of provenance” still exist, and attempts to bring provenance into the standard are still ongoing.
There's a fun issue in LLVM. It assumes that pointers to different variables are always different, so when it knows the provenance, ptr_eq(&a, &b) compiles to false without any runtime check. But the optimizer can also avoid redundant copies, unify variables with the same values, or recycle stack space after variables go out of scope, so at run time the addresses of two "different" variables can end up equal.
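A hedged illustration of that second half (my own toy example): the local x in each call is a different variable with its own provenance, yet the two calls can reuse the same stack slot, so the raw addresses may well compare equal at run time.

fn addr_of_local() -> usize {
    let x = 0u8;
    std::ptr::addr_of!(x).addr() // hand back only the address, not the pointer
}

fn main() {
    let a1 = addr_of_local();
    let a2 = addr_of_local();
    // Often prints `true` in practice (same stack slot reused), but nothing
    // guarantees the result either way.
    println!("{}", a1 == a2);
}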