fn main() {
    let buffer: [u8; 1] = [b'a'];
    let empty_slice: &[u8] = &buffer[0..0];
    let empty_slice2: &[u8] = &buffer[1..1];
    println!("same ptr: {}", empty_slice.as_ptr() == empty_slice2.as_ptr());
}
Can the Rust compiler optimize this such that it prints true? I.e., can the compiler assume that the pointer parts of all empty slices are equivalent and can be replaced with each other? Put another way: one cannot assume that the pointer part of an empty slice is preserved, can one?
then it's entirely possible for x[..] and y[..0] to have the same pointer or for x[..] and y[1..] to have the same pointer despite being "different" locals, because you have no guarantee about addresses of locals.
Similarly,
let x: Box<[i32]> = Box::new([3; 0]);
let y: Box<[i32]> = Box::new([7; 0]);
are likely to have the same addresses because "different" zero-sized allocations might have the same address. (They might even be "inside" other allocations!)
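For anyone who wants to check this on their own toolchain, here is a minimal sketch. The helper name empty_boxes is made up for illustration, and the printed answer is deliberately not asserted, since nothing guarantees it either way.

```rust
// Hypothetical helper: builds the two zero-length boxed slices from above.
fn empty_boxes() -> (Box<[i32]>, Box<[i32]>) {
    let x: Box<[i32]> = Box::new([3; 0]);
    let y: Box<[i32]> = Box::new([7; 0]);
    (x, y)
}

fn main() {
    let (x, y) = empty_boxes();
    // The result may differ between toolchains and targets; neither
    // answer should be relied upon.
    println!("same address: {}", std::ptr::eq(x.as_ptr(), y.as_ptr()));
}
```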
Note that many things won't guarantee particular pointers when they give out ZSTs, though. For example, https://doc.rust-lang.org/std/option/enum.Option.html#method.as_slice might give a dangling empty slice when used on None (though IIRC for optimization reasons it typically doesn't, but there's no guarantee).
The issue came up in code that needed to temporarily mark some elements of a Vec<&str> with a couple of placeholders that would later be overwritten with the proper values. Using Vec<StrOrPlaceholder> does not work, as that requires copying the result into a Vec<&str>, and the code is performance-critical.
Then I realized that I could use empty slices with different pointer parts as placeholders and use as_ptr() to distinguish them. But that only works if the compiler does not substitute one for another.
Compiler optimizations shouldn't be a concern, since they aren't allowed to change the semantically observable behavior of the program. It's just a matter of whether the std methods you use preserve the pointer addresses.
Also, what addresses would you choose for the placeholder values? It'd be bad if you misinterpreted a genuine empty string as a placeholder, right? There's (theoretically) only one niche in the address field which could never be the address of a str, namely null, even if there are effectively others in practice.
Instead, if you're willing to use unsafe, you could store a (*mut [u8], PhantomData<&'a str>) which contains either an actual &'a str or a placeholder. It is impossible for a Rust &str to have a length greater than isize::MAX, meaning that the most-significant bit of the length field is always zero for strings; you could encode placeholders by setting the length of the *mut [u8] to e.g. isize::MAX as usize + 1, isize::MAX as usize + 2, and so on. Then, when reading that field, you'd check whether it's a placeholder or an actual string, and use slice::from_raw_parts and then str::from_utf8_unchecked in the latter case.
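A rough sketch of that length-tagging idea, under a few assumptions: it stores a NonNull<[u8]> rather than a bare *mut [u8] (the encoding is the same; NonNull just offers a len() accessor that has been stable for longer), and the names StrOrPlaceholder, Decoded, PLACEHOLDER_BASE, from_text, placeholder and decode are all invented for illustration.

```rust
use std::marker::PhantomData;
use std::ptr::NonNull;

/// Lengths at or above this value can never belong to a real `&str`.
const PLACEHOLDER_BASE: usize = isize::MAX as usize + 1;

/// Hypothetical type: either a borrowed string or a numbered placeholder,
/// packed into a single fat pointer.
#[derive(Clone, Copy)]
struct StrOrPlaceholder<'a> {
    raw: NonNull<[u8]>,
    _borrow: PhantomData<&'a str>,
}

#[derive(Debug, PartialEq)]
enum Decoded<'a> {
    Str(&'a str),
    Placeholder(usize),
}

impl<'a> StrOrPlaceholder<'a> {
    fn from_text(s: &'a str) -> Self {
        // Keeps the pointer and length of the original string unchanged.
        Self { raw: NonNull::from(s.as_bytes()), _borrow: PhantomData }
    }

    fn placeholder(id: usize) -> Self {
        assert!(id <= isize::MAX as usize, "placeholder id out of range");
        // The data pointer is never dereferenced for placeholders,
        // so a dangling one is enough.
        let raw = NonNull::slice_from_raw_parts(
            NonNull::<u8>::dangling(),
            PLACEHOLDER_BASE + id,
        );
        Self { raw, _borrow: PhantomData }
    }

    fn decode(self) -> Decoded<'a> {
        let len = self.raw.len();
        if len >= PLACEHOLDER_BASE {
            return Decoded::Placeholder(len - PLACEHOLDER_BASE);
        }
        // SAFETY: a length below PLACEHOLDER_BASE is only ever produced by
        // `from_text`, so pointer and length describe a valid UTF-8 string
        // that is borrowed for 'a.
        unsafe {
            let bytes = std::slice::from_raw_parts(self.raw.cast::<u8>().as_ptr(), len);
            Decoded::Str(std::str::from_utf8_unchecked(bytes))
        }
    }
}

fn main() {
    let mut slots = vec![
        StrOrPlaceholder::placeholder(0),
        StrOrPlaceholder::from_text(""),
        StrOrPlaceholder::placeholder(1),
    ];
    // Overwrite a placeholder once the real value is known.
    slots[0] = StrOrPlaceholder::from_text("filled");
    for slot in &slots {
        println!("{:?}", slot.decode());
    }
}
```

With this layout a genuine empty string can never be mistaken for a placeholder, because the distinction lives in the length field rather than in the address.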
The semantics were not clear enough for me to decide what the code means. As @scottmcm pointed out, empty slices can legally be used to get an offset into the underlying slice, so the compiler is not allowed to treat them as the same and cannot use them interchangeably.
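To make the offset argument concrete, here is a small sketch. The function offset_in is a hypothetical helper, not a std API, and it assumes the caller already knows that sub was sliced out of parent.

```rust
/// Hypothetical helper: byte offset of `sub` inside `parent`, or `None`
/// if `sub` does not lie within `parent`'s address range.
fn offset_in(parent: &[u8], sub: &[u8]) -> Option<usize> {
    let parent_start = parent.as_ptr() as usize;
    let sub_start = sub.as_ptr() as usize;
    let offset = sub_start.checked_sub(parent_start)?;
    // An empty subslice may legitimately sit one past the last element.
    if offset.checked_add(sub.len())? <= parent.len() {
        Some(offset)
    } else {
        None
    }
}

fn main() {
    let buffer: [u8; 1] = [b'a'];
    let head = &buffer[0..0];
    let tail = &buffer[1..1];
    println!("head at {:?}, tail at {:?}", offset_in(&buffer, head), offset_in(&buffer, tail));
}
```

If the compiler were free to swap head for tail, this kind of code would silently report the wrong position.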
In my case the empty string is allowed and is distinct from the placeholders. Since the code has no control over the pointer part of a legal empty string, it can use 3 empty slices with different pointers to mark the 2 placeholders and the empty string. All empty strings will be normalized into the latter to make sure that the placeholder pointers are always different from the slice pointer of the empty string.
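Roughly, the scheme looks like the sketch below. All names (MARKERS, placeholder_a, placeholder_b, canonical_empty, normalize, classify, Slot) are invented for this post, and it assumes every string goes through normalize before being stored next to the placeholders.

```rust
// One static backing string; the three empty slices taken at offsets
// 0, 1 and 2 have three distinct addresses.
static MARKERS: &str = "###";

fn placeholder_a() -> &'static str {
    &MARKERS[0..0]
}

fn placeholder_b() -> &'static str {
    &MARKERS[1..1]
}

fn canonical_empty() -> &'static str {
    &MARKERS[2..2]
}

/// Replaces any empty string with the canonical one, so that its pointer
/// can never coincide with a placeholder pointer.
fn normalize(s: &str) -> &str {
    if s.is_empty() { canonical_empty() } else { s }
}

#[derive(Debug, PartialEq)]
enum Slot {
    PlaceholderA,
    PlaceholderB,
    Text,
}

/// Only meaningful for values that are placeholders or went through `normalize`.
fn classify(s: &str) -> Slot {
    if !s.is_empty() {
        Slot::Text
    } else if std::ptr::eq(s.as_ptr(), placeholder_a().as_ptr()) {
        Slot::PlaceholderA
    } else if std::ptr::eq(s.as_ptr(), placeholder_b().as_ptr()) {
        Slot::PlaceholderB
    } else {
        Slot::Text
    }
}

fn main() {
    let owned = String::new();
    let items: Vec<&str> = vec![placeholder_a(), normalize(&owned), normalize("abc"), placeholder_b()];
    for item in &items {
        println!("{:?}", classify(item));
    }
}
```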
There's a method that relies on the pointer value:
The standard library has a bunch of shortcuts and special cases for zero-sized objects, so you can't rely on pointers to ZSTs (since zero-sized elements of a slice can't even have their own addresses by design), but for regular non-zero-sized types the implementation is very straightforward.
Ironically, the compiler would easily optimize out your comparison if you compared "whole" slices instead of pointers, since the 0 length is easily trackable and short-circuits the rest of the comparison.
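A tiny sketch of the difference between the two comparisons, reusing the buffer from the original question. The function name compare is made up for illustration.

```rust
/// Returns (contents equal, pointers equal) for the two empty slices
/// taken at the start and at the end of `buffer`.
fn compare(buffer: &[u8; 1]) -> (bool, bool) {
    let head: &[u8] = &buffer[0..0];
    let tail: &[u8] = &buffer[1..1];
    // `==` on slices compares lengths and elements, never addresses.
    let same_contents = head == tail;
    // Comparing `as_ptr()` values looks only at the addresses.
    let same_ptr = std::ptr::eq(head.as_ptr(), tail.as_ptr());
    (same_contents, same_ptr)
}

fn main() {
    let buffer: [u8; 1] = [b'a'];
    let (same_contents, same_ptr) = compare(&buffer);
    println!("same contents: {same_contents}, same ptr: {same_ptr}");
}
```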