Is there any documentation specifying that the maximum size of T cannot be larger than isize::MAX in general? Or could code like this never be sound? The function std::slice::from_ref is in fact implemented exactly like this, but does not document why it's safe.
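For reference, a minimal sketch of the kind of function I mean (the name `make_slice` is my own; std's real version is `std::slice::from_ref`). It is only sound if no Rust value can be larger than isize::MAX bytes, so that the one-element slice never exceeds the limit `from_raw_parts` requires:

```rust
use std::slice;

// Hypothetical `from_ref`-style function: build a one-element slice
// from a reference. Sound only because `T`'s size is bounded by
// isize::MAX, which is exactly the question being asked here.
fn make_slice<T>(item: &T) -> &[T] {
    unsafe { slice::from_raw_parts(item, 1) }
}

fn main() {
    let x = 42u32;
    let s = make_slice(&x);
    assert_eq!(s, &[42]);
}
```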
Out of curiosity, I tried to see whether it was possible to compile code with a larger size:
fn main() {
    let _x = [0u8; isize::MAX as usize + 1];
}
This fails with an error (here on armv6l):
error: the type `[u8; 2147483648]` is too big for the current architecture
However, relying on the current implementation of the compiler, which apparently only "generally tries to ensure" that this is safe, isn't as good as being able to rely on documented behavior.
Thanks for the reply. I see the problem with that.
Does the same reasoning extend to composed types as well? For example
#[repr(C)]
struct X {
    start: u8,
    rest: [u8; isize::MAX as usize],
}
currently fails to compile, even though there is no single array/slice with more than isize::MAX elements.
Is that because the offset from x to x.rest[isize::MAX as usize - 1] would be too large?
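To make the arithmetic behind that question concrete, a small sketch (using u128 so the sums don't overflow on a 64-bit host):

```rust
fn main() {
    // Total size of the hypothetical repr(C) struct: 1 byte for `start`
    // plus isize::MAX bytes for `rest`.
    let total = 1u128 + isize::MAX as u128;
    // The offset from the start of the struct to the *last* element of
    // `rest` is 1 + (isize::MAX - 1) = isize::MAX, which still fits in
    // an isize...
    assert_eq!(1 + (isize::MAX as u128 - 1), isize::MAX as u128);
    // ...but the total object size already exceeds isize::MAX.
    assert!(total > isize::MAX as u128);
}
```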
And what about potential future non-LLVM backends? Could they lift this restriction, making code like the make_slice function unsound?
isize::MAX, which is slightly greater than 9.2E18, is larger than the maximum memory that can exist in any foreseeable computer. Therefore no object or aggregation of objects of this size can actually exist in hardware, and thus no program needs to be able to deal with it.
The open-source RISC-V architecture has a variant, RV128, with 128-bit virtual addresses. The RV128 documentation states that there is no expectation that any real hardware will ever require more than a 64-bit physical memory address. Also note that on RV128, usize = 128.
It's possible to posit a program that has a larger object existing in virtual memory, but it's hard to believe that such a program would not be horrendously inefficient compared to one that took the required file access to the paging store into account.
How would that interact with UB? I’ve heard that UB can affect code at a distance, even if it isn’t run. For example, if for all defined behaviour a function returned true, then a condition check on its return value could be optimised out. Then, if in practice it invoked undefined behaviour, then the missing check could in theory do anything. Would it be possible to set up a case like that here, without having to ever actually allocate the exceptionally large type?
isize::MAX, which is slightly greater than 9.2E18
This is true for 64-bit architectures.
According to rust-lang/rfcs#1748, usize/isize can be as small as 16 bits. In that case such a situation seems much more likely.
This case is also explicitly mentioned in the documentation of pointer::offset.
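As a concrete illustration of the rule that pointer::offset documents: the offset must stay within a single allocation, and the byte offset must fit in an isize. A minimal sketch:

```rust
fn main() {
    let arr = [10i32, 20, 30, 40];
    let p = arr.as_ptr();
    // Safety: index 2 is within the same 4-element allocation, and the
    // byte offset (2 * size_of::<i32>() = 8 bytes) fits in an isize,
    // as `offset` requires.
    let third = unsafe { *p.offset(2) };
    assert_eq!(third, 30);
}
```

On a 16-bit target the "fits in an isize" requirement bites much sooner, which is exactly the concern above.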
It probably won't occur very often in practice, but I think "probably" is a dangerous word, when it comes to UB (and program correctness).
That assumption will hold. As noted, llvm (and by extension Rust) requires allocations to be smaller than isize::MAX. In particular, each stack value is also treated as an 'allocation' on its own. In other words, you can only safely obtain a &T in the first place if T is smaller than isize::MAX. The compiler currently requires the types of all values to be small enough to ensure this property.
In the example above, struct X is allowed to be defined but you can not use it. In particular, the compiler rejects the code if you try to create a value of the type.
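A quick sketch of the property being described: any type whose values you can actually construct fits in isize::MAX bytes, which you can spot-check with mem::size_of:

```rust
use std::mem::size_of;

fn main() {
    // Any type we can construct a value of obeys the size limit, so a
    // &T always points at an allocation of at most isize::MAX bytes.
    assert!(size_of::<[u8; 4096]>() <= isize::MAX as usize);
    assert!(size_of::<(u64, [u32; 100])>() <= isize::MAX as usize);
}
```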
UB, as used by @RalfJung and the Unsafe Code Group, is a situation where the mandatory input requirements of the compiler backend optimizer and code generator—usually LLVM—have been violated. When that happens, LLVM (or its replacement) is no longer required to produce code that you consider correct.
Any UB in the program releases the compiler from its requirement to generate correct code. Since LLVM considers the program in its entirety, any UB can lead to misbehavior anywhere else in the program. Even if code with UB appears to work when tested, there is no guarantee that the code will still work correctly after any change in 1) the program, 2) any included dependencies, or 3) a compiler update.
In summary, UB (Undefined Behavior) means that the behavior of the compiled program is undefined; it is not required to meet the apparent intent of your (erroneous) code.
Note that it's a little aggressive to say that "UB causes unbounded behavior even if never executed".
Consider the following:
if false {
    unsafe { hint::unreachable_unchecked() }
}
This is never UB, because the requirement is that unreachable_unchecked is actually unreachable.
In general, truly unreachable code can cause arbitrary behavior (even UB) so long as it is actually unreachable.
UB has to be "executed" in order to cause issues. The insidious part, though, is that UB time travels; if your program is going to exhibit UB at any point, behavior for the entire runtime is subject to being arbitrarily wrong.
You are correct that computer architectures with smaller usize::MAX can realize hardware with the maximum addressable amount of memory. Rust supports the TI MSP430, which has a 16-bit word size and a 24-bit address size. It's possible to buy that much memory (16 MB) in a single package. Nevertheless, LLVM does not support a structure larger than half the address space, which for the TI MSP430 is 8 MB.
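The arithmetic behind those numbers, as a quick sketch:

```rust
fn main() {
    // 24-bit physical address space: 2^24 bytes = 16 MiB addressable.
    let address_space: u64 = 1 << 24;
    assert_eq!(address_space, 16 * 1024 * 1024);
    // Half the address space, the object-size ceiling mentioned above:
    assert_eq!(address_space / 2, 8 * 1024 * 1024);
}
```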
Each of those compilers will impose the same limit on maximum object size, because they each use LLVM. There are also C and C++ compilers that use LLVM, and thus have the same constraint.
It is quite possible that code-flow analysis early in the optimization process eliminates dead code that would otherwise trigger UB at a later stage of optimization, were it to remain in the optimizer's internal SSA representation of the program. That is my perhaps-incorrect personal explanation of your example. But how many people deliberately write dead code, other than as code stubs during development, when they usually aren't invoking any compiler optimization?
I know for sure we do it sometimes when we're forced to. After all, that's what unreachable_unchecked/debug_unreachable! is for.
I definitely have some code that does result.unwrap_or_else(|_| debug_unreachable!()) because that was the only way to optimize out a panicking branch I had a reasonable proof was never taken.
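A sketch of that pattern, using std's unreachable_unchecked in place of the debug_unreachable! macro (the helper name `parse_known_good` is made up for illustration):

```rust
use std::hint::unreachable_unchecked;

// Hypothetical helper: the caller promises the input is a valid u32
// literal, so the Err branch is provably dead and we tell the compiler
// so, letting it delete the panicking branch entirely.
fn parse_known_good(s: &str) -> u32 {
    // Safety: if the caller's guarantee is violated, this is UB.
    s.parse().unwrap_or_else(|_| unsafe { unreachable_unchecked() })
}

fn main() {
    assert_eq!(parse_known_good("42"), 42);
}
```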
The point of that "call for UB" function is specifically to tell the compiler it can optimize out a branch (because it's UB).
Note that "UB has to be executed to wreck your program" is distinct from "invalid values don't have to be used to wreck your program".
The recent example Ralf shared was
fn foo(b: bool, n: usize) -> usize {
    let mut a = 0;
    for _ in 0..n {
        a += if b { 2 } else { 1 };
    }
    a
}
(roughly). Naïvely, there's "no UB executed" if you call foo(transmute(3), 0), but that can still cause issues, because the compiler can hoist the if out of the loop.
But in actuality, you still "executed UB," even if n is 0, because the UB happened when you transmuted a 3 into a bool.
--
When you get down to the LLVM IR level, things get thornier. poison is a "delayed UB" value: if you use it, you get UB. But if you hoist the if out of the loop, you're introducing a use of the poisoned bool?
I'll be frank here: I don't know how that is resolved. But at the Rust level, I don't need to, as there is no "UB to use but ok if you don't touch it" concept (for values). You get UB at the Rust level as soon as you read an invalid value into a typed value.
I agree with @CAD97 here. Miri does not execute dead code; dead code does not affect whether a program execution has UB. "Dead code cannot affect program behavior" is even stated as a key principle in my blog post; whether or not a program has UB is certainly part of its "behavior".
Ok, so from what I've gathered here, this is a case of Rust not having a specification, but rather relying on the current implementation of the compiler as the specification.
So currently it is as guaranteed as it can be without a formal specification of Rust.
Yes, if usize on that platform is 16 bits; no, if usize is greater than 16 bits, as it is on most microcontrollers and SoCs with a 16-bit data path. What real hardware are you concerned about? AFAIK Rust doesn't support 8051-class devices.