Maximum slice length - Is this sound?

LegionMammal978 · March 18, 2023, 3:33am

As it happens, I found a near-instance of this a while back in semver v1.0.14. It could accept an Identifier of length up to isize::MAX as usize bytes, and create an allocation of up to isize::MAX as usize + 5 bytes. This was sound at the time, since the only operations it used spanning the whole allocation were alloc(), dealloc(), and copy_nonoverlapping(); it neither created pointer offsets nor slices longer than isize::MAX as usize. But this wasn't a real instance, since you'd need objects of size isize::MAX as usize and isize::MAX as usize + 5 bytes to exist in the address space simultaneously to construct the Identifier, which is impossible. And then it became unsound when Layout's preconditions were modified.
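
To illustrate the modified precondition: on current stable toolchains (an assumption about the compiler in use), `Layout::from_size_align` rejects a size that exceeds `isize::MAX` once rounded up to the alignment. A minimal sketch, where `layout_ok` is a hypothetical helper and not something taken from semver:

```rust
use std::alloc::Layout;

/// Hypothetical helper: true if a byte buffer of `size` bytes
/// (alignment 1) can be described by a valid `Layout`.
fn layout_ok(size: usize) -> bool {
    Layout::from_size_align(size, 1).is_ok()
}

fn main() {
    let max = isize::MAX as usize;
    // Exactly `isize::MAX` bytes is still accepted.
    println!("isize::MAX bytes:     {}", layout_ok(max));
    // Five bytes more, as in the semver case, is rejected.
    println!("isize::MAX + 5 bytes: {}", layout_ok(max + 5));
}
```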

jbe · March 18, 2023, 8:05am

I think I understand now how/why the definition of "allocation" may matter. If we say that it's forbidden that an "allocation" is greater than isize::MAX bytes^[1] or, at least, forbidden that memory of such an allocation gets dereferenced in Rust, then it's clear that this line is unsound:

let c: &[u8] = unsafe { &*b };

But I believe such a rule would be problematic. If we use any non-Rust library, how can we assure that none of its API functions return a pointer to such an object? Consider some library on a 16-bit microcontroller which allocates 40k contiguous bytes (in C), and some (various) other functions return pointers to that memory, referring to small chunks within that big allocation.

Would that mean that an allocation has sub-"allocations" that are safe to use? (This doesn't really make sense, I think.) Or would it be forbidden to dereference pointers returned by such a library, even if the objects the library refers to are smaller than isize::MAX (but within the huge 40k allocation)? We could not know what the library internally does, and I don't think many C APIs will give a guarantee like: "This function is guaranteed to never return pointers that point into an allocation larger than the maximum value of ssize_t." It would also not really matter in C, because what matters is the size of the "smaller" objects, not where or how the underlying allocation is managed.

So this brings me to the following:

I feel like "allocation" can be defined in two ways:

1. as heap allocation, parts of the stack frame, etc.
2. as some (more abstract?) concept which is specific to Rust (and does not apply to allocations in C, for example)

I believe that the latter definition is the intended one. But if that's the case, it can't make the line

let c: &[u8] = unsafe { &*b };

unsound because the "allocation" behind b is not a "Rust allocation".

I believe that a way out would be to document that a wide pointer is simply not valid if its length multiplied by size_of::<T>() exceeds isize::MAX. It could still be used in other ways than dereferencing, e.g. passed to FFI (just like NULL pointers, dangling pointers, or unaligned pointers if some C API allows passing such pointers).
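
The proposed rule can be written down as a small predicate. This is only a sketch of the arithmetic; `slice_len_fits` is a hypothetical name, not an existing `std` function:

```rust
use std::mem::size_of;

/// Hypothetical check: does a slice of `len` elements of `T`
/// occupy at most `isize::MAX` bytes?
fn slice_len_fits<T>(len: usize) -> bool {
    match len.checked_mul(size_of::<T>()) {
        // The multiplication did not overflow `usize`; compare with the limit.
        Some(bytes) => bytes <= isize::MAX as usize,
        // Overflowing `usize` certainly exceeds `isize::MAX`.
        None => false,
    }
}

fn main() {
    let limit = isize::MAX as usize;
    println!("{}", slice_len_fits::<u8>(limit)); // at the limit
    println!("{}", slice_len_fits::<u8>(limit + 1)); // one byte over
    println!("{}", slice_len_fits::<()>(usize::MAX)); // zero-sized elements
}
```

Under this predicate, a slice of zero-sized elements passes for any length, because its total size is always zero bytes.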

However, I don't fully understand, or have a good overview of, the terminology and the underlying concepts/assumptions/plans. I hope this post didn't cause more confusion because I might have understood some concepts or terminology in a wrong way.

[1] which isn't really possible to demand for non-Rust parts of a program, I believe

jbe · March 18, 2023, 8:46am

I would like to approach this once more with the above example, which I slightly refined and documented further:

#![feature(ptr_metadata)]
#![allow(unused)]

fn foo<T>(x: &[T]) -> &[T] {
    // SAFETY: `x.len() * size_of::<T>() <= isize::MAX`
    // because the (non-normative) Rust reference says
    // that no object is larger than `isize::MAX`
    unsafe {
        std::slice::from_raw_parts(x.as_ptr(), x.len())
    }
}

/// Allocates `size` bytes and returns a pointer to the allocated memory
/// or a NULL pointer if allocation failed
fn allocate_huge(size: usize) -> *const std::ffi::c_void {
    // SAFETY: there are no requirements for the arguments passed to `calloc`
    unsafe {
        libc::calloc(size, 1)
    }
}

fn main() {
    // No `unsafe` block from here…
    let a: *const () = allocate_huge(isize::MAX as usize + 1).cast();
    assert!(!a.is_null()); // we require allocation was successful
    let b: *const [u8] = std::ptr::from_raw_parts(a, isize::MAX as usize + 1);
    // …to here. Due to the length requirement of `std::slice::from_raw_parts`
    // which gets called in function `foo`, one of the following must be true:
    //
    // * The `unsafe` block in `foo` is unsound.
    // * The `unsafe` block in `allocate_huge` is unsound.
    // * `libc::calloc` always returns a NULL pointer if `size > isize::MAX`.
    // * The unstable `std::ptr::from_raw_parts` is unsound.
    // * Something else in `std` is unsound.
    // * The `unsafe` block below is unsound.
    let c: &[u8] = unsafe { &*b };
    let d = foo(c);
}


I think it may help to answer which of the following is true:

1. The unsafe block in foo is unsound.
2. The unsafe block in allocate_huge is unsound.
3. libc::calloc always returns a NULL pointer if size > isize::MAX.
4. The unstable std::ptr::from_raw_parts is unsound.
5. Something else in std is unsound.
6. The unsafe block in main is unsound.

Unless I made an error in my reasoning, one of these six statements must be true, as the program can exhibit UB (due to what's documented in std::slice::from_raw_parts).

I believe that it's 6, i.e. the unsafe block in main must be unsound. If that's the case, it may help to try to find the exact rule which makes this unsound. (edit: preferably one that is normative)

Though maybe it could also be number 2: Perhaps invoking calloc with a size > isize::MAX is unsound as it creates an "allocation" that's too big? But if that was the case, how can we know that some C library doesn't do this behind the scenes?

quinedot · March 18, 2023, 9:16am

Where did the wide raw pointer come from? Slice pointers aren't FFI safe. So you got the ingredients (address and size) separately, and it's up to you to check that size.

Or at a higher level, I think "no references (& vs. raw pointers) to massive objects" is sufficient. If a constraint of references is "the thing behind them has size <= isize::MAX", then it's up to you to check the size before doing that operation. (It's already up to you to check for NULL and alignment, for example.)

It wouldn't need to be "no allocations > isize::MAX" technically; it's just on your head to use unsafe functions like offset correctly with such massive allocations/objects, and to otherwise never create any non-unsafe-requiring handle to them (such as a slice), as those may assume the size limit.

I don't feel like tracking it down again (sorry), but somewhere in the graph of linked issues from what I've posted before you can find general descriptions of how one deals with these: you calculate addresses or offsets in multiple steps, each of which fits within an isize or isize-equivalent. You have to do this in C too (practically, due to compiler non-support, if not as per the standard). ^[1] There was other exotic stuff mentioned too, like at one point compilers were supposed to be able to handle 33-bit offsets or some such, but no one ever did. (Sorry for the vagueness, it wasn't the focus of my skimming.)
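
As a rough sketch of the "multiple steps" idea, the helper below advances a pointer by a byte count that may exceed `isize::MAX`, using steps that each fit in an `isize`. The name `advance_in_steps` is hypothetical, and the sketch uses the safe `wrapping_add` so that it only computes an address and never dereferences it. Whether the resulting pointer may be used is a separate question.

```rust
/// Hypothetical helper: advance `base` by `total` bytes in steps
/// that are each no larger than `isize::MAX`.
fn advance_in_steps(base: *const u8, total: usize) -> *const u8 {
    const STEP: usize = isize::MAX as usize;
    let mut p = base;
    let mut remaining = total;
    while remaining > 0 {
        // Each individual step fits within an `isize`.
        let step = remaining.min(STEP);
        // `wrapping_add` is safe: it performs address arithmetic only.
        p = p.wrapping_add(step);
        remaining -= step;
    }
    p
}

fn main() {
    let byte = 0u8;
    let base: *const u8 = &byte;
    let total = isize::MAX as usize + 5;
    let p = advance_in_steps(base, total);
    // The pointer is printed, never dereferenced.
    println!("{:p} -> {:p}", base, p);
}
```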

Which is to say, it's fascinating (and horrifying) stuff. But from a practical standpoint extremely niche, and even when standards committees said "you should support this case", no one really ever did. So if you're in the situation of needing such support, you're already hiking through Mordor (but at least accustomed to doing so, probably). ^[2]

Making raw wide pointers adhere to the size limit is a breaking change because you can construct them in safe code today, just like you can cast an unaligned integer to a thin raw pointer (like @scottmcm pointed out (and see also below)).

If you meant references (&[u8]), that does seem to be the case and what from_raw_parts is trying to document.

I wrote all that before your comment 43. I would say the unsafe in main is unsound, you have to check the null-ness, alignment, and also size.
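
A minimal sketch of what "check the null-ness, alignment, and also size" could look like when the address and length arrive separately (for example from FFI). The helper `checked_slice` is hypothetical; it still has to be an unsafe function, because the liveness and initialization of the memory cannot be checked at run time.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical helper: build a `&[T]` from separately obtained parts,
/// rejecting null, misaligned, or oversized inputs.
///
/// # Safety
/// If the checks pass, `ptr` must point to `len` initialized `T`s that stay
/// live and are not mutated for the lifetime `'a`.
unsafe fn checked_slice<'a, T>(ptr: *const T, len: usize) -> Option<&'a [T]> {
    // Null and alignment checks.
    if ptr.is_null() || (ptr as usize) % align_of::<T>() != 0 {
        return None;
    }
    // Size check: the total byte length must not exceed `isize::MAX`.
    let bytes = len.checked_mul(size_of::<T>())?;
    if bytes > isize::MAX as usize {
        return None;
    }
    // SAFETY: the checkable preconditions were verified above; the rest is
    // the caller's obligation as documented.
    Some(unsafe { std::slice::from_raw_parts(ptr, len) })
}

fn main() {
    let data = [1u32, 2, 3];
    // SAFETY: `data` is live and holds exactly `data.len()` initialized values.
    let ok = unsafe { checked_slice(data.as_ptr(), data.len()) };
    println!("{:?}", ok);
    // An oversized length is rejected before any reference is created.
    let too_big = unsafe { checked_slice(data.as_ptr(), isize::MAX as usize) };
    println!("{:?}", too_big);
}
```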

You don't need nightly to get such a raw slice pointer by the way, this is safe today and the metadata (slice length) remains usize::MAX.

    let slice = &[(); usize::MAX];
    let ptr = slice as *const [()] as *const [u8];

Or tackling them all,

1. foo is fine: your call can't be invalid unless UB already happened, so it's not foo's fault.
2. allocate_huge is fine: Rust can't dictate what foreign languages do.
3. Or in more detail...
   1. If we're talking about the actual C function, it's outside Rust's purview.
   2. If we're talking about the Rust libc crate, it should pass on pointers to massively sized structs if the C function can create them, IMO. That is, I think there's no reason to restrict it to respect the size boundary if the C function doesn't. The main argument for doing so would be "it's unreasonable for utilizers of libc::calloc to make the check", similar to the argument for the Layout change. But if you're calling this function you're already bypassing Rust's allocation framework, so...
   3. Even if you buy that libc::calloc shouldn't return massively sized raw pointers, it wouldn't be unsound to do so, as per the next bullet point.
4. It's fine to construct such raw pointers, as per the already-stable example above.
5. I don't think so, but this point feels too vague anyway; did you have something specific in mind?
6. I think it's this one that's unsound.

[1] Technically, at least. I don't know what the odds are of "getting away with" doing it the wrong way (given how common it is to do technically-UB things in that world and rely on the compiler or compiler flags to save you).
[2] I think the same is true of practically everything else when you're on a 16-bit system, really. Even nominally pointer-size-portable libraries and programs will fall apart when faced with tight enough constraints, unless specifically designed for such.

jbe · March 18, 2023, 9:25am

With "validity" I didn't mean that constructing an invalid pointer is UB. Compare std::ptr::null which is safe, sound, yet creates a pointer that is not valid:

A null pointer is never valid, not even for accesses of size zero.

Maybe the problem here is that there are two definitions of "valid". null() is a "valid" value of type *const T, but not a "valid" pointer (see quote above: "a null pointer is never valid").
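
A small sketch of that distinction: the null pointer is created in safe code, and only the conversion to a reference needs `unsafe`. The wrapper function `null_converts_to_none` is a hypothetical name used for illustration.

```rust
/// Hypothetical demonstration: a null pointer is a legal value of type
/// `*const u8`, but it never yields a reference.
fn null_converts_to_none() -> bool {
    // Constructing the pointer needs no `unsafe`.
    let p: *const u8 = std::ptr::null();
    // SAFETY: `as_ref` returns `None` for a null pointer and does not
    // dereference it.
    let r: Option<&u8> = unsafe { p.as_ref() };
    p.is_null() && r.is_none()
}

fn main() {
    println!("{}", null_converts_to_none());
}
```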

quinedot · March 18, 2023, 9:34am

Oh I see, yeah, terminology problem.

Yes, I'm saying you have to check the size before going from a *const [T] to a &[T], etc.; a size that exceeds the limit is invalid (to convert to a reference).

(But the *const [T] is well-formed / not UB to exist in the first place.)

jbe · March 18, 2023, 10:06am

Yes, that's what I meant too. So my proposal was that this should be added explicitly to the "validity" rules for pointers (not values).

quinedot · March 18, 2023, 10:10am

About to sign off, but I'd have to read that section closely and ponder what exactly it meant. It might be too strict, ala @LegionMammal978's last comment but more constrained. Why shouldn't you be able to read through a raw pointer to something massive, if the read itself doesn't wrap or require offsets, etc? While inherently unsafe and niche, it'd be a shame if the only resort without leaving Rust was asm.
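
For illustration, a read through a wide raw pointer that never creates a `&[u8]` could look like the sketch below. `read_first` is a hypothetical helper and is exercised here only on a small array. It does not settle whether such a read would be allowed when the pointee is larger than `isize::MAX`.

```rust
/// Hypothetical helper: read the first byte behind a wide raw pointer
/// without ever forming a slice reference.
///
/// # Safety
/// `wide` must be null, or point to at least one readable, initialized byte.
unsafe fn read_first(wide: *const [u8]) -> Option<u8> {
    // Drop the length metadata; only the address is needed for a one-byte read.
    let thin = wide as *const u8;
    if thin.is_null() {
        return None;
    }
    // SAFETY: the caller guarantees one readable byte at `thin`.
    Some(unsafe { thin.read() })
}

fn main() {
    let data = [7u8, 8, 9];
    let wide: *const [u8] = &data[..];
    // SAFETY: `data` is live and non-empty.
    println!("{:?}", unsafe { read_first(wide) });
}
```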

LegionMammal978 · March 18, 2023, 3:48pm

Alas, this is not always the case, even though it ought to be for a general-purpose malloc(). I have a list of 32-bit platform malloc()s on that same issue: Emscripten, wasm32-unknown-unknown, WASI, uClibc-ng, NetBSD, OpenBSD, and possibly macOS do not enforce that the size is within PTRDIFF_MAX (the C version of isize::MAX).

scottmcm · March 18, 2023, 5:25pm

Note that part of the confusion might be that there's two meanings of "valid" here. In the sense of https://www.ralfj.de/blog/2018/08/22/two-kinds-of-invariants.html, ptr::null() constructs a pointer that obeys both the validity and safety invariants, just one that's not sound to dereference.

