Running the function below doesn't produce any unexpected behavior, but Miri reports UB. Where is the UB, and what is the correct way to write this? Thanks.
fn set_null<T: ?Sized>(ptr: *mut T) -> *mut T {
    unsafe { ptr.byte_sub(ptr as *const () as usize) }
}
error: Undefined Behavior: out-of-bounds pointer arithmetic: expected a pointer to the end of 147684 bytes of memory, but got alloc761 which is at the beginning of the allocation
 --> src/main.rs:7:14
  |
7 |     unsafe { ptr.byte_sub(ptr as *const () as usize) }
  |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ out-of-bounds pointer arithmetic: expected a pointer to the end of 147684 bytes of memory, but got alloc761 which is at the beginning of the allocation
  |
  = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
  = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
help: alloc761 was allocated here:
 --> src/main.rs:2:9
  |
2 |     let mut a = 10;
  |         ^^^^^
  = note: BACKTRACE (of the first span):
  = note: inside `set_null::<i32>` at src/main.rs:7:14: 7:53
note: inside `main`
 --> src/main.rs:3:5
  |
3 |     set_null(&mut a);
  |     ^^^^^^^^^^^^^^^^
note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace
luca3s
November 9, 2024, 7:16am
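The docs say that the resulting pointer has to stay in bounds of the allocated object, which yours doesn't: subtracting the pointer's whole address moves it to address 0, far outside the allocation of `a`. The docs also suggest using `wrapping_sub` (for byte offsets, `wrapping_byte_sub`), as that doesn't have those requirements.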
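A minimal sketch of that suggestion, assuming the goal is only to produce a null pointer that keeps the original pointer's metadata and is never dereferenced. It swaps `byte_sub` for `wrapping_byte_sub`, the byte-wise wrapping variant, which is a safe method and has no in-bounds requirement. The `main` function is just an illustration.

```rust
/// Returns a pointer with address 0 that keeps the metadata of `ptr`
/// (for example the length of a slice pointer).
fn set_null<T: ?Sized>(ptr: *mut T) -> *mut T {
    // `wrapping_byte_sub` may leave the bounds of the original allocation.
    // That is fine as long as the result is never dereferenced.
    ptr.wrapping_byte_sub(ptr as *const () as usize)
}

fn main() {
    // Thin pointer: the result is simply null.
    let mut a = 10;
    let thin: *mut i32 = &mut a;
    assert!(set_null(thin).is_null());

    // Wide pointer: the address becomes 0, the slice length is preserved.
    let mut arr = [1u8, 2, 3];
    let wide: *mut [u8] = &mut arr[..];
    let nulled = set_null(wide);
    assert!(nulled.is_null());
    assert_eq!(nulled.len(), 3);
}
```

Because no `unsafe` block is needed any more, the whole function becomes safe code; only dereferencing the result would be a problem.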
Pointers in Rust are not just memory addresses; they also carry additional information called "provenance".
I suggest reading the following material for more details.
opened 03:32AM - 23 Mar 22 UTC
closed 01:05PM - 22 Oct 24 UTC
T-libs-api
C-tracking-issue
A-strict-provenance
Feature gate: `#![feature(strict_provenance)]`
[read the docs](https://doc.rust-lang.org/nightly/std/ptr/index.html#strict-provenance)
[get the stable polyfill](https://crates.io/crates/sptr)
[subtasks](https://github.com/rust-lang/rust/labels/A-strict-provenance)
This is a tracking issue for the `strict_provenance` feature. This is a standard library feature that governs the following APIs:
* [`pointer::addr`](https://doc.rust-lang.org/nightly/core/primitive.pointer.html#method.addr)
* [`pointer::with_addr`](https://doc.rust-lang.org/nightly/core/primitive.pointer.html#method.with_addr)
* [`pointer::map_addr`](https://doc.rust-lang.org/nightly/core/primitive.pointer.html#method.map_addr)
* [`core::ptr::invalid`](https://doc.rust-lang.org/nightly/core/ptr/fn.invalid.html)
* [`core::ptr::invalid_mut`](https://doc.rust-lang.org/nightly/core/ptr/fn.invalid.html)
> **IMPORTANT:** This is purely a set of library APIs to make your code more clear/reliable, so that we can better understand what Rust code is *actually* trying to do and what it *actually* needs help with. It is overwhelmingly framed as *a memory model* because we are doing a bit of Roleplay here. We are roleplaying that this is a real memory model and seeing what code doesn't conform to it already. Then we are seeing how trivial it is to make that code "conform".
>
> This cannot and will not "break your code" because the lang and compiler teams are wholly uninvolved with this. Your code cannot be "run under strict provenance" because there isn't a compiler flag for "enabling" it. Although it would be nice to have a lint to make it easier to quickly migrate code that wants to play along.
This is an unofficial experiment to see How Bad it would be if Rust had extremely strict pointer provenance rules that require you to *always* dynamically preserve provenance information. Which is to say if you ever want to treat something as a Real Pointer that can be Offset and Dereferenced, **there must be an unbroken chain of custody from that pointer to the original allocation you are trying to access _using only pointer->pointer operations_**. If at any point you turn a pointer into an integer, that integer cannot be turned back into a pointer. This includes `usize as ptr`, `transmute`, type punning with raw pointer reads/writes, whatever. Just assume the memory "knows" it contains a pointer and that writing to it as a non-pointer makes it forget (because this is quite literally true on CHERI and miri, which are *immediate* beneficiaries of doing this).
A secondary goal of this project is to try to disambiguate the many meanings of `ptr as usize`, in the hopes that it might make it plausible/tolerable to **allow `usize` to be redefined to be an *address*-sized integer instead of a *pointer*-sized integer**. This would allow for Rust to more natively support platforms where `sizeof(size_t) < sizeof(intptr_t)`, and effectively redefine `usize` from `intptr_t` to `size_t`/`ptrdiff_t`/`ptraddr_t` (it would still generally conflate *those* concepts, absent a motivation to do otherwise). To the best of my knowledge this would not have a practical effect on any currently supported platforms, and just allow for more platforms to be supported (certainly true for our tier 1 platforms).
A tertiary goal of this project is to more clearly answer the question "hey **what's the deal with Rust on architectures that are pretty harvard-y like AVR and WASM** (platforms which treat function pointers and data pointers non-uniformly)". There is... *weirdness* in the language because it's difficult to talk about "some" function pointer generically/opaquely and that encourages you to turn them into data pointers and then maybe that does Wrong Things.
The mission statement of this experiment is: **assume it will and _must_ work, try to make code conform to it, smash face-first into really nasty problems that need special consideration, and try to actually figure out how to handle those situations.** We *want* the evil shit you do with pointers to work [but the current situation leads to incredibly broken results](https://www.ralfj.de/blog/2020/12/14/provenance.html), so *something* has to give.
### Public API
This design is roughly based on the article [Rust's Unsafe Pointer Types Need An Overhaul](https://gankra.github.io/blah/fix-rust-pointers/#distinguish-pointers-and-addresses), which is itself based on the APIs that CHERI exposes for dynamically maintaining provenance information even under Fun Bit Tricks.
The core piece that makes this at all plausible is `pointer::with_addr(self, usize) -> Self` which dynamically re-establishes the provenance chain of custody. Everything else introduced is sugar or alternatives to `as` casts that better express intent.
More APIs may be introduced as we explore the feature space.
```rust
// core::ptr
pub fn invalid<T>(addr: usize) -> *const T;
pub fn invalid_mut<T>(addr: usize) -> *mut T;
// core::pointer
pub fn addr(self) -> usize;
pub fn with_addr(self, addr: usize) -> Self;
pub fn map_addr(self, f: impl FnOnce(usize) -> usize) -> Self;
```
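As a rough sketch of how the pointer methods listed above might be used, the example below tags the low bits of a pointer and builds a null pointer from an existing one, without ever casting an integer back to a pointer. It assumes a toolchain where `addr`, `with_addr`, and `map_addr` are available on stable (Rust 1.84 or later, as far as I know). The names `TAG_MASK`, `tag`, `untag`, and `null_like` are hypothetical helpers invented for this illustration, not part of the standard library.

```rust
// Hypothetical mask: a `u32` is 4-byte aligned, so its two low address bits are free.
const TAG_MASK: usize = 0b11;

/// Stores `tag` in the low bits of the address while keeping the provenance of `ptr`.
fn tag<T>(ptr: *mut T, tag: usize) -> *mut T {
    ptr.map_addr(|addr| (addr & !TAG_MASK) | (tag & TAG_MASK))
}

/// Splits a tagged pointer into the usable pointer and its tag.
fn untag<T>(ptr: *mut T) -> (*mut T, usize) {
    (ptr.map_addr(|addr| addr & !TAG_MASK), ptr.addr() & TAG_MASK)
}

/// Same metadata as `ptr`, but with address 0. Never dereference the result.
fn null_like<T: ?Sized>(ptr: *mut T) -> *mut T {
    ptr.with_addr(0)
}

fn main() {
    let mut value: u32 = 7;
    let raw: *mut u32 = &mut value;

    let tagged = tag(raw, 0b10);
    let (clean, t) = untag(tagged);
    assert_eq!(t, 0b10);
    assert_eq!(clean, raw);

    // `clean` was derived from `raw` by pointer-to-pointer operations only,
    // so it is still allowed to access `value`.
    unsafe { *clean += 1 };
    assert_eq!(value, 8);

    assert!(null_like(raw).is_null());
}
```

The address is read as an integer with `addr()`, but every pointer is produced from another pointer, which is the "unbroken chain of custody" the issue describes.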
### Steps / History
- [x] Implementation: #95241
- [ ] Final comment period (FCP)
- [ ] Stabilization PR
### Unresolved Questions
- How Bad Is This?
- How Good Is This?
- What's Problematic (And Should Work)?
- [ ] Hardcoded MMIO address stuff
- We should define a platform-specific way to do this, possibly requiring that you only use `volatile` access
- [ ] Opaque Function Pointers - architectures like AVR and WASM treat function pointers specially; they're not normal pointers.
- We should really define a `#[repr(transparent)] OpaqueFnPtr(fn() -> ())` type in std, need a way to talk about e.g. dlopen.
- [ ] libc interop for bad APIs that pun integers and pointers
- Use a union to make the pun explicit?
- [ ] passing shared pointers over IPC?
- At worst you can rederive from your SHMEM?
- [ ] downcasting to subclasses?
- Would be nice if you could create a reference *without* shrinking its provenance to allow for ergonomic references to a baseclass that can be (unsafely) cast to a reference to a subclass.
- [ ] memcpy operations conceptually say "all this memory is just u8's" which would trash provenance
- it's pretty standard to carve out exceptions for memcpy, but it would be good to know if this can be done more rigorously with something like [llvm's proposed byte type](https://gist.github.com/georgemitenkov/3def898b8845c2cc161bd216cbbdb81f)
- [ ] AtomicPtr - AtomicPtr has a very limited API, so lots of people use AtomicUsize to do the equivalent of wrapping_add
- Morally this is fine, unclear if the right compiler intrinsics exist to express this without "dropping" provenance.
- What's Problematic (And Might Be Impossible)?
- [ ] High-bit Tagging - rustc::ty does this because it makes common addressing modes Free Untagging Realestate
- Technically this is "fine" but CHERI might get upset about it, needs investigation.
- [ ] Pointer Compression - V8 and JVM like compressing pointers, involving massive truncations.
- Can a Sufficiently Smart Union handle this?
- [ ] Unrestricted XOR-list - XORing pointers to make an even more jacked up linked list
- You must allocate all your nodes in a Vec/Arena to be able to reconstitute ptrs. At that point, use indices.
- APIs We Want To Add/Change?
- A lot of uses of .addr() are for alignment checks, `.is_aligned()`, `.is_aligned_to(usize)`?
- An API to make ZST alloc forging explicit, `exists_zst(usize)`?
- `.addr()` should arguably work on a DST, if you use `.addr()` you are ostensibly saying "I know this doesn't roundtrip"
- Explicit conveniences for low-bit tagging? `.with_tag(TAG)`?
- `expose_addr`/`from_exposed_addr` are slightly unfortunate names since it's not the *address* that gets exposed, it's the *provenance*. What would be better names? Please discuss [on Zulip](https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/Strict.20provenance.20naming.20bikeshed).
- It is somewhat unfortunate that `addr` is the short and easy name for the operation that programmers likely expect less. (Many will expect `expose_addr` semantics.) Maybe it should have a different name. But which name?
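To illustrate the alignment-check use of `.addr()` mentioned in the list above, here is a small sketch comparing a manual check against the `is_aligned` method. It assumes a recent stable toolchain in which both `addr` and `is_aligned` are available; `is_aligned_manually` is a hypothetical helper written for this example.

```rust
use std::mem::align_of;

/// Hypothetical manual check: only the address is inspected, so `addr()` suffices
/// and no provenance needs to be exposed.
fn is_aligned_manually<T>(ptr: *const T) -> bool {
    ptr.addr() % align_of::<T>() == 0
}

fn main() {
    let words = [0u32; 2];
    let base: *const u32 = words.as_ptr();
    assert!(is_aligned_manually(base));
    assert!(base.is_aligned());

    // One byte past an aligned `u32` address can never be 4-byte aligned.
    // The skewed pointer is only inspected, never dereferenced.
    let skewed = base.wrapping_byte_add(1);
    assert!(!is_aligned_manually(skewed));
    assert!(!skewed.is_aligned());
}
```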
Thanks, I'll read them later.