`slice::from_raw_parts` returns a different address in const context

I originally ran into this problem when using NonNull::dangling(), but simplified the example. Can anyone explain why X.as_ptr() doesn't return 1?

use std::slice::from_raw_parts;

const X: &[u8] = unsafe { from_raw_parts(1 as *const _, 0) };

fn main() {
    assert_eq!(
        X.as_ptr(),
        unsafe { from_raw_parts(1 as *const _, 0) }.as_ptr()
    );
}


Errors:

   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 0.60s
     Running `target/debug/playground`
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `0x562f0a1ac000`,
 right: `0x1`', src/main.rs:6:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


I think the problem here is that pointers in constant evaluation don't work the same as pointers in real code: the "fake" addresses are always translated to "real" addresses in the program memory, including the fake 0x1 pointer. Writing static X: &[u8] seems to fix this, since static variables can be address-sensitive.
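A minimal sketch of the `static` variant mentioned above, assuming a toolchain where `slice::from_raw_parts` can be called during constant evaluation. The helper name `static_addr` is made up for illustration. Since a `static` is one fixed memory location, the integer address written into it is expected to come back unchanged:

```rust
use std::slice::from_raw_parts;

// Same initializer as the `const` in the question, but stored in a `static`.
// A zero-length slice needs no backing allocation, only a non-null, aligned pointer.
static X: &[u8] = unsafe { from_raw_parts(1 as *const u8, 0) };

// Hypothetical helper: reports the address held in the static's data pointer.
fn static_addr() -> usize {
    X.as_ptr() as usize
}

fn main() {
    assert!(X.is_empty());
    println!("address stored in the static: {:#x}", static_addr());
}
```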

I don’t quite understand why this should happen here though. There is nothing in memory that would need to be re-located from the fake compile-time addresses to static memory.

Three potentially interesting observations:

  • The fact that it’s a reference and not a pointer being put into the const seems relevant. Putting either the *const [u8] from ptr::slice_from_raw_parts(1 as *const _, 0), or the *const u8 from unsafe { from_raw_parts(1 as *const _, 0).as_ptr() } into the const makes it stay 1.
  • It can’t be that the compiler thinks there actually is something at the address 1 as *const u8; e.g. using length 1 instead of 0 correctly gives a compilation error about compile-time UB.
  • Using () instead of u8 makes the problem go away, too.
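The first and third observations can be sketched as follows. The constant names `RAW_SLICE`, `RAW_THIN` and `UNITS` are only illustrative, and the sketch assumes `ptr::slice_from_raw_parts` and `slice::from_raw_parts` are usable in const context on the toolchain at hand:

```rust
use std::ptr::slice_from_raw_parts;
use std::slice::from_raw_parts;

// A raw slice pointer in a const: no reference is ever created.
const RAW_SLICE: *const [u8] = slice_from_raw_parts(1 as *const u8, 0);

// A thin raw pointer, extracted from the reference during constant evaluation.
const RAW_THIN: *const u8 = unsafe { from_raw_parts(1 as *const u8, 0).as_ptr() };

// A reference to an empty slice whose element type is zero-sized.
const UNITS: &[()] = unsafe { from_raw_parts(1 as *const (), 0) };

fn main() {
    // Casting the fat pointer to a thin one drops the length and keeps the address.
    println!("raw slice: {:#x}", RAW_SLICE as *const u8 as usize);
    println!("raw thin:  {:#x}", RAW_THIN as usize);
    println!("units:     {:#x}", UNITS.as_ptr() as usize);
}
```

Only the `&[u8]` case from the original question is missing here, since that is the one whose address changes.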

Also, I can imagine proper use-cases where this behavior could be problematic. Addresses of pointers are not completely irrelevant in Rust, after all. E.g. with string interning (or interning in general), you can efficiently implement equality-checks of interned values by pointer-comparison; and these values could be &T references. For string interning in particular, I can imagine the use-case where you special-case the empty string to be interned at address 1, and then it’s IMO a reasonable idea to be wanting to allow this interned empty string to be usable even in const contexts, e.g. for initializing a data structure. Of course, there are feasible alternative approaches for this case, e.g. use the address of a static …: [u8; 0].
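A rough sketch of that alternative, with a hypothetical `Interned` handle type and `same` method that are not taken from any real interning crate. The empty value is anchored to the address of a zero-length `static` array instead of the literal address 1:

```rust
// Backing storage for the interned empty value; a static has one fixed address.
static EMPTY: [u8; 0] = [];

/// Hypothetical interned-bytes handle; equality is decided by pointer identity.
#[derive(Clone, Copy, Debug)]
struct Interned(&'static [u8]);

impl Interned {
    /// Returns the interned empty value.
    fn empty() -> Interned {
        Interned(&EMPTY)
    }

    /// Compares by address and length instead of comparing contents.
    fn same(self, other: Interned) -> bool {
        std::ptr::eq(self.0.as_ptr(), other.0.as_ptr()) && self.0.len() == other.0.len()
    }
}

// A static initializer may take the address of another static.
static DEFAULT_NAME: Interned = Interned(&EMPTY);

fn main() {
    assert!(DEFAULT_NAME.same(Interned::empty()));
    println!("{:?}", DEFAULT_NAME);
}
```

The handle is initialized in a `static` here; whether a `const` may refer to a `static` depends on the compiler version, so that part is left out of the sketch.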

Hmm, it looks like this is a particular optimization that the compiler uses to treat &str and &[u8] as immediate values.

So the &[u8] is treated as a by-value empty slice, which gets converted into a pointer into (presumably) the area where all the strings end up.


(Even in case this turns out to be intended behavior…) It sounds like someone should open an issue for further discussion.

I’ve searched a little and couldn’t find any existing issue so far.

Issue submitted with some additional working examples: rust-lang/rust issue #105536, "`slice::from_raw_parts` returns a different address in const context for u8".


There's nothing to discuss here. It's a bug in the compiler. This code shouldn't be accepted in the first place.

Rust 1.64 stopped compiling similar code in case of pointers, but references weren't handled back then.

Of course, UB in unsafe code is the programmer's responsibility, but even so, this bug is easy to diagnose and prevent.

Although all that code plays so very close to the border of what's allowed and what's not allowed that it's hard to say which operations should work and which shouldn't work.

Integers are not supposed to be convertible in const context while it's not clear what should happen with Nullptr, e.g.


Honestly, I don't see too close a similarity here. Isn't transmuting references to usize even technically UB at run-time, since you're supposed to use proper as casting for such conversion? OTOH, the code example in this thread is not UB at all! (Dereferencing a pointer created from a usize or from NonNull::dangling() for zero-sized data types such as &[u8; 0], or also for zero-length slices, is allowed for any non-zero and aligned address.) Furthermore, we're not trying to access any compile-time-only address of a data location the compiler chose for us, we merely want to get back out the address "1" that we put in in the first place.

That said, even though I can't see why it shouldn't compile, the solution “it shouldn't compile” still seems less weird than the current behavior, as that appears to fairly clearly violate the “consts behave as if their definition were inlined at the use-site” principle.


This code is UB, and the only problem here is that it compiles.

References to non-UnsafeCell types must at all times point to a valid allocation. The mere existence of such a reference is itself supposed to be proof that the pointer is valid. Therefore, creating a reference from a dangling pointer is UB.

Raw pointers are allowed to have garbage values if they're not dereferenced. References are not, even if they're not dereferenced.

At the risk of repeating myself: you don’t need a valid allocation for a zero sized type.

See this section of the std::ptr docs (bold emphasis, mine):

Safety

Many functions in this module take raw pointers as arguments and read from or write to them. For this to be safe, these pointers must be valid. Whether a pointer is valid depends on the operation it is used for (read or write), and the extent of the memory that is accessed (i.e., how many bytes are read/written). Most functions use *mut T and *const T to access only a single value, in which case the documentation omits the size and implicitly assumes it to be size_of::<T>() bytes.

The precise rules for validity are not determined yet. The guarantees that are provided at this point are very minimal:

  • A null pointer is never valid, not even for accesses of size zero.
  • For a pointer to be valid, it is necessary, but not always sufficient, that the pointer be dereferenceable: the memory range of the given size starting at the pointer must all be within the bounds of a single allocated object. Note that in Rust, every (stack-allocated) variable is considered a separate allocated object.
  • Even for operations of size zero, the pointer must not be pointing to deallocated memory, i.e., deallocation makes pointers invalid even for zero-sized operations. However, casting any non-zero integer literal to a pointer is valid for zero-sized accesses, even if some memory happens to exist at that address and gets deallocated. This corresponds to writing your own allocator: allocating zero-sized objects is not very hard. The canonical way to obtain a pointer that is valid for zero-sized accesses is NonNull::dangling.
  • All accesses performed by functions in this module are non-atomic in the sense of atomic operations used to synchronize between threads. This means it is undefined behavior to perform two concurrent accesses to the same location from different threads unless both accesses only read from memory. Notice that this explicitly includes read_volatile and write_volatile: Volatile accesses cannot be used for inter-thread synchronization.
  • The result of casting a reference to a pointer is valid for as long as the underlying object is live and no reference (just raw pointers) is used to access the same memory.

These axioms, along with careful use of offset for pointer arithmetic, are enough to correctly implement many useful things in unsafe code. Stronger guarantees will be provided eventually, as the aliasing rules are being determined. For more information, see the book as well as the section in the reference devoted to undefined behavior.
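As a small runtime illustration of the zero-sized-access rule quoted above, here is a sketch using `NonNull::dangling`. The function name `dangling_roundtrip` is invented for this example. No memory is read, because the slice has length zero:

```rust
use std::mem::align_of;
use std::ptr::NonNull;
use std::slice::from_raw_parts;

// Hypothetical helper: builds an empty slice from a dangling pointer at run time
// and returns (address put in, address read back, slice length).
fn dangling_roundtrip() -> (usize, usize, usize) {
    let p: NonNull<u32> = NonNull::dangling();
    // SAFETY: the pointer is non-null and aligned for u32, and a length of zero
    // means no memory is accessed through it.
    let s: &[u32] = unsafe { from_raw_parts(p.as_ptr(), 0) };
    (p.as_ptr() as usize, s.as_ptr() as usize, s.len())
}

fn main() {
    let (put_in, read_back, len) = dangling_roundtrip();
    assert_eq!(put_in, read_back);
    assert_eq!(put_in % align_of::<u32>(), 0);
    println!("address {:#x}, length {}", read_back, len);
}
```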

And if the zero-sizedness of the empty slice isn’t obvious enough, dereferencing a *const [u8; 0] instead can give the same result:

const X: &[u8] = unsafe { (&*(1 as *const [u8; 0])).as_slice() };



Yes, but casting any integer to a pointer or reference shouldn't be allowed in const context.

So the question is which rule has higher priority: the one that allows casting integers to references for zero-sized types, or the one that forbids casting integers to pointers in const context.

I think casting an integer to a reference should still be forbidden, but since UB can be detected in const context, the compiler should either compile the code and have it work, or reject it outright.

P.S. There's a funny philosophical dilemma here: Rust managed to kick UB out of the safe subset of the language but still has it in const expressions, while C++ went the other way and made it impossible to compile constant expressions with UB. Why can't Rust do what C++ did? Lack of a specification, or something more fundamental?

That's not entirely true. std::ptr::invalid is, and should be, allowed; it's attempting to read or write through such a pointer that shouldn't be allowed.

I'm really not convinced this is UB, since the length of the slice is zero. One can get a slice created just like this when using Vec::new().as_slice(), for example: that didn't allocate, so just uses a dangling (but correctly aligned!) pointer with a zero length.
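To make the `Vec::new()` comparison concrete, here is a short sketch. The helper name `empty_vec_ptr_info` is invented for this example, and it only checks the properties mentioned above without relying on any particular address:

```rust
use std::mem::align_of;

// Hypothetical helper: inspects the slice of a freshly created, unallocated Vec.
// Returns (pointer is non-null, pointer is aligned for u64, slice length).
fn empty_vec_ptr_info() -> (bool, bool, usize) {
    let v: Vec<u64> = Vec::new(); // does not allocate
    let s: &[u64] = v.as_slice();
    let addr = s.as_ptr() as usize;
    (addr != 0, addr % align_of::<u64>() == 0, s.len())
}

fn main() {
    let (non_null, aligned, len) = empty_vec_ptr_info();
    assert!(non_null && aligned);
    println!("non-null: {}, aligned: {}, len: {}", non_null, aligned, len);
}
```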


I think this is more just hitting the general rule that the values of pointers to unrelated things are allowed to compare basically however. It's like how function pointers may or may not have the same value depending on optimization levels.

There is probably a bug here, because from_raw_parts::<u8>(nonzero_usize as *const u8, 0).as_ptr().addr() really ought to give back that original nonzero_usize value.

But in general, I'd say that the best thing to do here is stop writing code that looks at relationships between unrelated pointers.

Something like

let a: &'static [u8] = &[1];
let b: &'static [i8] = &[1];
assert_eq!(a.as_ptr(), b.as_ptr().cast());

isn't UB, but that assert might pass and might fail. The runtime semantics don't promise what relative ordering they'll have.

Or to do something that doesn't optimize today,

let a: &'static [u8] = &[1, 2];
let b: &'static [u8] = &[1];
assert_ne!(a.as_ptr(), b.as_ptr());

That passes today, but different compiler options or a future release might well learn to merge the storage for them, and thus have the assert start failing.
