Does `&str` reliably have length <= `isize::MAX`?

For the purpose of avoiding UB in unsafe code, can we rely completely on the byte length of a `&str` being <= `isize::MAX as usize`?

The docs for the `offset` method on pointers state that "Allocated objects can never be larger than isize::MAX bytes". The docs for `Layout::from_size_align` mention the same constraint.
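One way to see the same limit from the allocation side is to ask `Layout` directly. This is a minimal sketch, assuming a reasonably recent toolchain (where `Layout` enforces the `isize::MAX` bound); the helper `byte_layout_fits` is hypothetical and exists only for illustration:

```rust
use std::alloc::Layout;

/// Hypothetical helper: does a byte-aligned layout of `size` bytes exist?
fn byte_layout_fits(size: usize) -> bool {
    Layout::from_size_align(size, 1).is_ok()
}

fn main() {
    let max = isize::MAX as usize;

    // Exactly isize::MAX bytes is accepted...
    assert!(byte_layout_fits(max));
    // ...but one byte more is rejected: no allocated object may exceed isize::MAX bytes.
    assert!(!byte_layout_fits(max + 1));

    // `Layout::array` applies the same limit to the total byte size of `[T; n]`.
    assert!(Layout::array::<u8>(max + 1).is_err());
    assert!(Layout::array::<u16>(max / 2 + 1).is_err());
}
```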

A `&str` is essentially a slice, which must point into an allocated object. Therefore, I think the length of a `&str` is guaranteed not to exceed `isize::MAX` bytes.
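If that reasoning holds, the classic place it pays off is converting a length to `isize` for pointer arithmetic: the `as isize` cast below can never wrap. A minimal sketch (the function name `str_end_ptr` is hypothetical; `s.as_ptr().add(len)` is the more common spelling, `offset` is used here only to match the docs quoted above):

```rust
/// Returns a pointer one past the last byte of `s`.
///
/// The `len as isize` cast cannot wrap, because a `&str` points into a single
/// allocated object and allocated objects never exceed `isize::MAX` bytes.
fn str_end_ptr(s: &str) -> *const u8 {
    let len = s.len();
    debug_assert!(len <= isize::MAX as usize);
    // SAFETY: the start pointer and the result are within (or one past the end of)
    // the same allocated object, and `len` fits in `isize`.
    unsafe { s.as_ptr().offset(len as isize) }
}

fn main() {
    let s = "héllo";
    // Compare against the standard library's own one-past-the-end pointer.
    assert_eq!(str_end_ptr(s), s.as_bytes().as_ptr_range().end);
    assert!(isize::try_from(s.len()).is_ok());
}
```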

However, other answers on this forum (e.g. Does string literal have a maximum size? - #12 by notriddle) suggest that this constraint may depend on LLVM, and is therefore only guaranteed for now. If that's the case, I'm not sure the guarantee is strong enough to rely on in unsafe code.

Additionally, I notice that the compiler does not seem to capitalize on this constraint (if it exists) to remove unreachable code in a function like this:

```rust
fn str_concat_len(s1: &str, s2: &str) -> usize {
    s1.len().checked_add(s2.len()).unwrap_or_else(|| unreachable!())
}
```

That also makes me wonder whether I'm being naive in assuming this guarantee exists.
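Assuming the guarantee holds, the two-slice case can be written so the optimizer is told the overflow branch is dead: `Option::unwrap_unchecked` maps `None` to `unreachable_unchecked`, which should let the check be dropped (whether it actually is depends on the compiler version). Note that the argument only works for two slices: three copies of the same maximal `&str` could exceed `usize::MAX`. A minimal sketch:

```rust
/// Total byte length of two string slices.
///
/// Each `&str` is at most `isize::MAX` bytes, so the sum is at most
/// `2 * isize::MAX == usize::MAX - 1` and the addition can never overflow.
fn str_concat_len(s1: &str, s2: &str) -> usize {
    // SAFETY: see the doc comment; `checked_add` therefore always returns `Some`.
    unsafe { s1.len().checked_add(s2.len()).unwrap_unchecked() }
}

fn main() {
    // The arithmetic behind the safety argument.
    assert_eq!((isize::MAX as usize) * 2, usize::MAX - 1);
    assert_eq!(str_concat_len("foo", "barbaz"), 9);
}
```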


In case anyone is wondering, the practical application is building a `String` by concatenating multiple `&str`s with as few branches and bounds checks as possible (or, more accurately, an arena-allocated bumpalo `String`).
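For the general N-slice case, a minimal sketch of the "reserve once, then push" pattern is below. It uses std's `String` rather than bumpalo's (an assumption made to keep the example dependency-free), and `concat_all` is a hypothetical name; std's `parts.concat()` does much the same thing. Since more than two slices can in principle sum past `usize::MAX`, the total is computed with checked arithmetic:

```rust
/// Concatenates `parts` into one `String` with a single up-front allocation.
///
/// With more than two slices the total length *can* overflow in principle
/// (the same large `&str` may appear several times), so the sum is checked.
fn concat_all(parts: &[&str]) -> String {
    let total = parts
        .iter()
        .try_fold(0usize, |acc, s| acc.checked_add(s.len()))
        .expect("total length overflows usize");
    let mut out = String::with_capacity(total);
    for s in parts {
        // No reallocation happens here: the full capacity was reserved above.
        out.push_str(s);
    }
    out
}

fn main() {
    let s = concat_all(&["foo", "", "bär"]);
    assert_eq!(s, "foobär");
    assert!(s.capacity() >= s.len());
}
```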

It's absolutely guaranteed. No allocation can ever be larger than isize::MAX bytes (https://doc.rust-lang.org/std/ptr/index.html#allocated-object), and a `&str` always points into a single allocation. It's also stated on the APIs that create slices, such as https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html#safety.
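In practice this means any unsafe code that builds a `&str` from raw parts inherits the bound as part of its safety contract. A minimal sketch with a hypothetical wrapper `str_from_raw_parts` (not a std API) that forwards the `slice::from_raw_parts` requirements to its caller:

```rust
/// Rebuilds a `&str` from its raw parts (hypothetical helper).
///
/// # Safety
/// Callers must uphold the contract of `std::slice::from_raw_parts` (including
/// `len <= isize::MAX` for byte slices) and ensure the bytes are valid UTF-8.
unsafe fn str_from_raw_parts<'a>(ptr: *const u8, len: usize) -> &'a str {
    // Cheap sanity check of the size bound, in debug builds only.
    debug_assert!(len <= isize::MAX as usize);
    // SAFETY: forwarded to the caller (see `# Safety` above).
    unsafe { std::str::from_utf8_unchecked(std::slice::from_raw_parts(ptr, len)) }
}

fn main() {
    let owned = String::from("hello, world");
    // SAFETY: pointer and length come from a live `String`, so they describe valid
    // UTF-8 within a single allocation of at most isize::MAX bytes.
    let s = unsafe { str_from_raw_parts(owned.as_ptr(), owned.len()) };
    assert_eq!(s, "hello, world");
}
```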

It's even a language-level UB rule rather than just a library one, though I can't find the reference for that right now. Making it a language rule means we'll be able to attach LLVM range information to the parameters of every function that takes a reference to a slice, which is what LLVM needs to notice that the addition you're doing can't overflow.

(A `*const str`, however, can have a length greater than isize::MAX, because raw pointers are allowed to carry silly metadata values. That's why `std::mem::align_of_val_raw` is unsafe.)
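To make the raw-pointer caveat concrete: creating a raw slice pointer is safe and its length is never checked, so a `*const str` with length `usize::MAX` is easy to produce. This sketch assumes a toolchain where `<*const [T]>::len` is stable (Rust 1.79 or later); `silly_raw_str` is a hypothetical name, and the pointer is never dereferenced:

```rust
use std::ptr;

/// Builds a `*const str` whose length metadata is far larger than isize::MAX.
fn silly_raw_str() -> *const str {
    // Creating raw slice pointers is safe; the length is not validated.
    let raw: *const [u8] = ptr::slice_from_raw_parts(ptr::null(), usize::MAX);
    // Casting `*const [u8]` to `*const str` keeps the same length metadata.
    raw as *const str
}

fn main() {
    let p = silly_raw_str();
    // Reading the metadata back through a raw pointer is fine...
    assert_eq!((p as *const [u8]).len(), usize::MAX);
    // ...but `unsafe { &*p }` would be UB: a `&str` must describe at most
    // isize::MAX bytes (and this one is also null).
}
```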


Thank you very much for clarifying that, and for the very swift reply.

Apologies if the question smelled like I hadn't bothered to read the docs. I had, but I'm very conservative with unsafe code, and I'm aware there may be subtleties beyond my understanding. So I wanted to make completely sure my reading was correct.


Always a good thing!

And appropriate here: it was an active area of discussion in the past year. See, for example, rust-lang/unsafe-code-guidelines issue #510, "Decide on validity for metadata of wide pointer/reference with slice tail".

But cases like the one you bring up are a big part of why it's wanted as a guarantee. For example, in a binary search we want LLVM to know that `(i + j) / 2` doesn't overflow.

(Well, a binary search over slices of non-ZSTs. But there's no need to bother binary searching over ZSTs, so that's fine.)
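A minimal sketch of that reasoning, using a hypothetical `search` function (in real code you would use std's `slice::binary_search`): for a non-ZST element type, `hi <= len <= isize::MAX`, so `lo + hi` stays below `usize::MAX`. The last lines show why ZSTs are excluded: a `&[()]` can legitimately have length `usize::MAX`, because only its byte size (zero) is bounded by `isize::MAX`.

```rust
use std::cmp::Ordering;
use std::ptr::NonNull;

/// Binary search over a sorted slice of a non-zero-sized type.
///
/// `lo` and `hi` never exceed `xs.len()`, and for `size_of::<T>() >= 1` that
/// length is at most isize::MAX, so `lo + hi <= 2 * isize::MAX < usize::MAX`.
fn search<T: Ord>(xs: &[T], target: &T) -> Option<usize> {
    assert!(std::mem::size_of::<T>() > 0, "ZSTs are not supported");
    let (mut lo, mut hi) = (0, xs.len());
    while lo < hi {
        let mid = (lo + hi) / 2; // cannot overflow, per the doc comment
        match xs[mid].cmp(target) {
            Ordering::Less => lo = mid + 1,
            Ordering::Greater => hi = mid,
            Ordering::Equal => return Some(mid),
        }
    }
    None
}

fn main() {
    assert_eq!(search(&[1, 3, 5, 7, 9], &7), Some(3));
    assert_eq!(search(&[1, 3, 5, 7, 9], &4), None);

    // SAFETY: a dangling, aligned pointer is valid for zero-sized reads, and the
    // total byte size (0) is within isize::MAX.
    let zsts: &[()] =
        unsafe { std::slice::from_raw_parts(NonNull::<()>::dangling().as_ptr(), usize::MAX) };
    assert_eq!(zsts.len(), usize::MAX);
}
```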
