`Vec::with_capacity` does not panic as the documentation says

Hi, I'm having a little trouble reading the Vec documentation.

Vec::with_capacity

The documentation of Vec::with_capacity says:

Panics if the new capacity exceeds isize::MAX bytes.

I don't know what exactly "capacity" refers to here:

let _ = Vec::<i32>::with_capacity(usize::MAX);
// thread 'main' panicked at 'capacity overflow'

let _ = Vec::<()>::with_capacity(usize::MAX);
// ok

Obviously, this capacity does not refer to the parameter n of Vec::with_capacity(n), because the latter example does not panic. So I'm guessing that capacity refers to the memory size (i.e. size_of::<T>() * len), but the following example still does not panic:

let result = std::panic::catch_unwind(|| {
    let _ = Vec::<u8>::with_capacity(isize::MAX as usize + 42);
    // memory allocation of 9223372036854775849 bytes failed
    // I'm not very confident, but this doesn't seem to be a panic.
});

assert!(result.is_err());

I read the source code of Vec::with_capacity, but I didn't find any check related to this.

slice::binary_search_by

Here is the source code of slice::binary_search_by.

pub fn binary_search_by<'a, F>(&'a self, mut f: F) -> Result<usize, usize>
where
    F: FnMut(&'a T) -> Ordering,
{
    let mut size = self.len();
    let mut left = 0;
    let mut right = size;
    while left < right {
        let mid = left + size / 2;
        //        ^^^^^^^^^^^^^^^ Here it was `(left + right) / 2`
        //        But because of ZSTs, it was changed to avoid overflow.
        /* ... */
    }
    /* ... */
}
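
As a rough illustration of the overflow that comment refers to, here is a minimal sketch comparing the two midpoint formulas. The helper names `naive_mid` and `safe_mid` are made up for this example and are not part of the standard library.

```rust
/// Midpoint as it used to be computed: `(left + right) / 2`.
/// Returns `None` when `left + right` overflows `usize`.
fn naive_mid(left: usize, right: usize) -> Option<usize> {
    left.checked_add(right).map(|sum| sum / 2)
}

/// Midpoint in the style of the snippet above: `left + size / 2`,
/// where `size = right - left`. Cannot overflow when `left <= right`.
fn safe_mid(left: usize, right: usize) -> usize {
    let size = right - left;
    left + size / 2
}

fn main() {
    // Indices this large are only reachable for zero-sized element types,
    // since any other slice would need more than `isize::MAX` bytes.
    let left = usize::MAX - 1;
    let right = usize::MAX;

    assert_eq!(naive_mid(left, right), None); // the sum overflows
    assert_eq!(safe_mid(left, right), usize::MAX - 1);

    // For small indices the two formulas agree.
    assert_eq!(naive_mid(2, 8), Some(5));
    assert_eq!(safe_mid(2, 8), 5);
}
```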

pointer::offset

Some documentation from std::pointer::offset:

The compiler and standard library generally tries to ensure allocations never reach a size where an offset is a concern. For instance, Vec and Box ensure they never allocate more than isize::MAX bytes, so vec.as_ptr().add(vec.len()) is always safe.

Most platforms fundamentally can’t even construct such an allocation. For instance, no known 64-bit platform can ever serve a request for 2^63 bytes due to page-table limitations or splitting the address space. However, some 32-bit and 16-bit platforms may successfully serve a request for more than isize::MAX bytes with things like Physical Address Extension. As such, memory acquired directly from allocators or memory mapped files may be too large to handle with this function.
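
To make the `vec.as_ptr().add(vec.len())` remark concrete, here is a small sketch that forms the one-past-the-end pointer of a slice and measures its distance from the start. The function name `distance_to_end` is an assumption for illustration only.

```rust
/// Number of elements between the start of the slice and its
/// one-past-the-end pointer.
fn distance_to_end(s: &[u32]) -> isize {
    let start = s.as_ptr();
    // SAFETY: `end` is at most one past the last element of the same
    // allocation as `start`, and the byte offset fits in an `isize`
    // because the slice's total size never exceeds `isize::MAX` bytes.
    unsafe {
        let end = start.add(s.len());
        end.offset_from(start)
    }
}

fn main() {
    let v = vec![1u32, 2, 3];
    assert_eq!(distance_to_end(&v), 3);
}
```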

Question

  • Vec::with_capacity says Vec::len() should be no larger than isize::MAX, but doesn't check it.
  • slice::binary_search_by (and this PR) indicates that Vec::len() should be no larger than isize::MAX, except for ZSTs.
  • pointer::offset says Vec::len() * size_of::<T>() should be no larger than isize::MAX, but Vec::with_capacity doesn't check it.

So here is my question:

  • Does the standard library specify the maximum length of a collection (e.g. Vec::len()), and if so, what is it?
  • As mentioned above, in the documentation for Vec::with_capacity, the use of the word "capacity" is somewhat ambiguous. Do we need to fix it?
  • Do we need to add assert!(capacity <= isize::MAX as usize) for Vec::with_capacity and other collection types?

It is.

It refers to exactly what the documentation says: the buffer capacity in bytes. So, vec.capacity() * size_of::<T>().
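
A minimal sketch of that byte-size reading, applied to the two element types from the question. The helper `requested_bytes` is hypothetical and only mirrors the arithmetic; it is not how the standard library is structured internally.

```rust
use std::mem::size_of;

/// Bytes a buffer of `n` elements of `T` would need, or `None` if the
/// multiplication overflows `usize`. (Hypothetical helper for illustration.)
fn requested_bytes<T>(n: usize) -> Option<usize> {
    size_of::<T>().checked_mul(n)
}

fn main() {
    // 4 bytes * usize::MAX overflows: this is the "capacity overflow" case.
    assert_eq!(requested_bytes::<i32>(usize::MAX), None);

    // A zero-sized type needs 0 bytes regardless of `n`, so no limit is hit.
    assert_eq!(requested_bytes::<()>(usize::MAX), Some(0));
    let v = Vec::<()>::with_capacity(usize::MAX);
    assert_eq!(v.capacity(), usize::MAX);

    // An ordinary request: 10 elements of 4 bytes each.
    assert_eq!(requested_bytes::<i32>(10), Some(40));
}
```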

No. If the collection doesn't panic, then it will have enough memory to manage the specified number of elements safely. There's no need for redundant assertions.


I don't know much about Rust's panic, but:

  • Why is this panic not handled by std::panic::catch_unwind? The program crashes directly.
  • Why doesn't it print out the call stack, just like other panics?
let _ = Vec::<u8>::with_capacity(isize::MAX as usize + 42);
//     Finished dev [unoptimized + debuginfo] target(s) in 2.57s
//      Running `target/debug/rustplayground`
// memory allocation of 9223372036854775849 bytes failed


panic!();
//     Finished dev [unoptimized + debuginfo] target(s) in 0.27s
//      Running `target/debug/rustplayground`
// thread 'main' panicked at 'explicit panic', src/main.rs:16:5
// stack backtrace:
//    0: rust_begin_unwind
//              at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:517:5
//    1: core::panicking::panic_fmt
//              at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/panicking.rs:100:14
//    2: core::panicking::panic
//              at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/panicking.rs:50:5
//    3: rustplayground::main
//              at ./src/main.rs:16:5
//    4: core::ops::function::FnOnce::call_once
//              at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/ops/function.rs:227:5
// note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
// The terminal process "cargo 'run', '--package', 'rustplayground', '--bin', 'rustplayground'" terminated with exit code: 101.

It might directly abort the program; that's not the point of the documentation saying "panic". The point of the documentation saying it "panics" is "don't do this, because it won't work". Whether it panics in a way that can be caught doesn't matter much, because panics aren't designed to be caught anyway. Errors that are meant to be handled are represented using Result, and you should usually work with results rather than try to work around panics.

Fallible allocations with a proper, Result-respecting interface are still unstable, but are being worked on.
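
For readers on a toolchain where `Vec::try_reserve` is available as a stable method (an assumption; it was not yet stable when this was written), a Result-based version of the oversized request might look like the sketch below. The wrapper `try_with_capacity` is a made-up helper, not a standard library function.

```rust
use std::collections::TryReserveError;

/// Try to reserve room for `n` bytes, reporting failure as a `Result`
/// instead of panicking or aborting. (Hypothetical helper.)
fn try_with_capacity(n: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut v = Vec::new();
    v.try_reserve(n)?;
    Ok(v)
}

fn main() {
    // The oversized request from the question is reported as an error value.
    match try_with_capacity(isize::MAX as usize + 42) {
        Ok(v) => println!("reserved {} bytes", v.capacity()),
        Err(e) => println!("reservation failed: {}", e),
    }

    // A small request succeeds as usual.
    let v = try_with_capacity(1024).expect("a small reservation should succeed");
    assert!(v.capacity() >= 1024);
}
```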


I agree that the documentation is incorrect here. It clearly doesn't always panic when the length is greater than isize::MAX.


It looks like it might be related to this RFC: 2116-alloc-me-maybe - The Rust RFC Book
and this issue: Tracking issue for oom=panic (RFC 2116) · Issue #43596 · rust-lang/rust · GitHub

It seems that specifically out of memory errors cause the program to directly abort.
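
By contrast, the 'capacity overflow' case from the first post is an ordinary panic, so `catch_unwind` can observe it. The sketch below assumes the default `panic = "unwind"` setting, and the helper name `with_capacity_panics` is invented for illustration. It deliberately avoids the `u8` request from the question, which may abort the process rather than unwind.

```rust
use std::panic;

/// Returns `true` if building a `Vec<T>` with capacity `n` panicked.
/// An allocation failure that aborts the process never reaches this point.
fn with_capacity_panics<T>(n: usize) -> bool {
    panic::catch_unwind(|| {
        let _ = Vec::<T>::with_capacity(n);
    })
    .is_err()
}

fn main() {
    // size_of::<i32>() * usize::MAX overflows, which is reported as a panic.
    assert!(with_capacity_panics::<i32>(usize::MAX));

    // Zero-sized elements need no memory, so nothing panics.
    assert!(!with_capacity_panics::<()>(usize::MAX));

    // A small, ordinary request does not panic either.
    assert!(!with_capacity_panics::<u8>(16));
}
```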


It only actually panics on 16- and 32-bit systems. On 64-bit it lets the allocation attempt go through and just assumes such a large allocation will fail. There's no explicit panic for > isize::MAX.

Here are the relevant pieces of code. There's a checked_mul in Layout::array to catch the computed size overflowing usize. And the check in question is in alloc_guard:

// We need to guarantee the following:
// * We don't ever allocate `> isize::MAX` byte-size objects.
// * We don't overflow `usize::MAX` and actually allocate too little.
//
// On 64-bit we just need to check for overflow since trying to allocate
// `> isize::MAX` bytes will surely fail. On 32-bit and 16-bit we need to add
// an extra guard for this in case we're running on a platform which can use
// all 4GB in user-space, e.g., PAE or x32.
...
if usize::BITS < 64 && alloc_size > isize::MAX as usize

I'll admit I don't understand why the usize::BITS < 64 guard is there. Why not always perform the alloc_size > isize::MAX as usize check?
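
To see the two checks side by side, here is a simplified sketch. `alloc_guard_sketch` is a stand-in written for this example, with an invented name and error type; the real function is private to the standard library and may differ in detail.

```rust
use std::alloc::Layout;

/// Simplified stand-in for the quoted `alloc_guard` check (hypothetical
/// name and error type, for illustration only).
fn alloc_guard_sketch(alloc_size: usize) -> Result<(), &'static str> {
    // On 64-bit targets `usize::BITS < 64` is a constant `false`, so the
    // size comparison is never evaluated there.
    if usize::BITS < 64 && alloc_size > isize::MAX as usize {
        Err("capacity overflow")
    } else {
        Ok(())
    }
}

fn main() {
    // `Layout::array` catches the multiplication overflowing `usize`.
    assert!(Layout::array::<i32>(usize::MAX).is_err());
    assert_eq!(Layout::array::<u8>(16).unwrap().size(), 16);

    // The guard itself only rejects oversized requests on narrower targets.
    let too_big = isize::MAX as usize + 1;
    assert_eq!(alloc_guard_sketch(too_big).is_err(), usize::BITS < 64);
    assert!(alloc_guard_sketch(1024).is_ok());
}
```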



I agree here: either the check in alloc_guard needs to be run on 64 bit targets, or the documentation needs to be changed to indicate that a panic or an oom may occur.


It's probably there, and placed first, because it is statically false on 64-bit and thus statically short-circuits the test, makes alloc_guard a constant when inlined elsewhere, etc.


The documentation used to say there would only be a panic on 32-bit systems. That note was removed in Add liballoc doc panic detail according to RawVec by pickfire · Pull Request #73391 · rust-lang/rust · GitHub with the rationale here:

@Mark_Simulacrum wrote:

This is a bit misleading -- and the (documentation) commentary in RawVec could be updated as well. As far as I can tell, even though the code special cases platforms that are <64-bit, that's just done as a performance optimization: we expect that you cannot allocate isize::MAX sized objects on a 64-bit (or larger, I guess) platform, because they would simply be "too big." That seems like a reasonable assumption to make.

However, calling out 32-bit here seems confusing: I would prefer to instead simply say that the new capacity cannot exceed isize::MAX bytes, which I think must always be correct.

Meanwhile the special casing for 64-bit goes back to 2015 when RawVec was introduced. That's as far back as I could trace it.

