Should the stdlib docs mention lazy allocation behaviors?

Hey everyone,

I ran into something today that made me question whether the stdlib documentation around memory allocation is complete enough. Consider this code:

use std::time::Duration;

fn main() {
    let mut vec = vec![0u64; 1_000_000_000]; // Lazy allocation on Linux
    for i in 0..vec.len() {
        vec.push(i as u64); // Forces actual memory allocation for new elements
    }
    std::thread::sleep(Duration::from_secs(10)); // Just wait a bit to monitor memory usage
    vec.sort_unstable(); // Can trigger OOM despite "does not allocate"
}

The docs for sort_unstable clearly state that it "does not allocate". Technically true - the method itself never calls the allocator. But on Linux, large zeroed allocations are demand-paged: the kernel doesn't back pages with physical memory until they are first written. So sort_unstable can still cause an out-of-memory kill if the vec has large zero-filled regions whose pages have never been touched.

Here's what happens: The initial vec![0u64; 1_000_000_000] creates a vector filled with zeros (via vec::from_elem(0u64, 1_000_000_000), which ends up as a zeroed allocation), but malloc's lazy allocation behavior means no physical memory is actually committed for pages that have only ever contained zeros. When sort_unstable runs and starts rearranging elements, those writes force the lazily allocated pages to become "real" allocated memory, potentially causing the process to hit memory limits.
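The effect is easy to observe directly. Here's a small sketch (Linux-only, and assuming a glibc-style allocator that services large zeroed allocations via mmap; rss_kib is a helper name I made up that reads VmRSS from /proc/self/status) that allocates a zero-filled buffer and measures resident memory before and after touching every page:

```rust
fn rss_kib() -> u64 {
    // Parse the "VmRSS:" line from /proc/self/status (Linux-specific).
    std::fs::read_to_string("/proc/self/status")
        .unwrap()
        .lines()
        .find(|l| l.starts_with("VmRSS:"))
        .and_then(|l| l.split_whitespace().nth(1))
        .and_then(|v| v.parse().ok())
        .unwrap()
}

fn main() {
    // 256 MiB of zeros: the zeroed allocation is backed by fresh
    // anonymous pages, which the kernel has not committed yet.
    let mut v = vec![0u8; 256 * 1024 * 1024];
    let before = rss_kib();
    // One write per 4 KiB page forces the kernel to commit each page.
    for i in (0..v.len()).step_by(4096) {
        v[i] = 1;
    }
    std::hint::black_box(&v); // keep the writes from being optimized away
    let after = rss_kib();
    println!("RSS grew by roughly {} KiB", after.saturating_sub(before));
}
```

If my understanding of demand paging is right, the reported growth should be close to the full 256 MiB, even though no allocator call happens anywhere inside the loop.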

I understand this is more of an OS/libc behavior than a Rust issue, but should we consider adding a note about this in the stdlib docs? New Rust developers might expect "does not allocate" to mean "cannot cause OOM", which isn't always true when lazy allocation is involved.

The same issue could affect other methods that perform intensive memory access on collections with large zero-filled regions. It feels like there's a gap between the technical accuracy of "does not allocate" and the practical reality of memory pressure.

What do you think? Is this worth documenting, or should users just be expected to understand malloc's lazy allocation behavior?

This is mostly how I feel about it -- Rust doesn't have a say. I could see it being mentioned in passing somewhere, though I'm not sure where. alloc maybe, or some language documentation separate from the library docs.

I could also see it not being mentioned. The Rust docs can't cover every topic related to running a program, though opinions can differ on where the boundary should lie. (This issue, kill -9, loss of power, ...)

As a point of comparison: Rust does have a say in optimizations that remove allocations -- that is, calls to the allocator. It's mentioned here.


When sort_unstable runs and starts rearranging elements

But if the zero-filled prefix is already in the right place, does sort_unstable even touch those entries?

I don’t think this should particularly be documented in sort_unstable(), because the described behavior can happen anywhere, not just in sort_unstable(). It’s just an accident of the chosen sizes that it happened in sort_unstable() rather than in push(). It could even happen in a primitive assignment operation, if you assign on top of existing zero elements instead of calling push().
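To make that distinction concrete, here's a sketch using a counting allocator wrapper (the Counting type and ALLOC_CALLS counter are names I made up): it shows that neither plain assignments into an existing zeroed vec nor sort_unstable ever call the allocator, even though both kinds of writes can fault lazy pages into physical memory.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Wrap the system allocator and count every allocation call.
struct Counting;
static ALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        System.alloc(layout)
    }
    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 {
        ALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        System.alloc_zeroed(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static A: Counting = Counting;

fn main() {
    let mut v = vec![0u64; 10_000_000]; // one zeroed allocation
    let before = ALLOC_CALLS.load(Ordering::Relaxed);
    for i in 0..v.len() {
        v[i] = i as u64; // plain assignments: page faults, but no allocator calls
    }
    v.sort_unstable(); // in-place: no allocator calls either
    let after = ALLOC_CALLS.load(Ordering::Relaxed);
    println!("allocator calls during writes + sort: {}", after - before);
    // prints "allocator calls during writes + sort: 0"
}
```

So "does not allocate" is accurate at the allocator level; the memory pressure comes entirely from the kernel committing pages on first write.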


I totally agree that this is not specific to sort_unstable(). Any write to a lazily allocated page results in a "real" allocation, even something as simple as writing zeros to it. In this case sort_unstable() just happened to touch all the pages at once. And I am also totally fine with it if there is no good place to document this in the stdlib itself.
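The zeros point is worth checking, because it's counterintuitive: storing a zero over a page that already reads as zero still counts as the page's first write, so the kernel commits it anyway. A Linux-only sketch (again using a made-up rss_kib helper that reads VmRSS from /proc/self/status):

```rust
fn rss_kib() -> u64 {
    // Parse the "VmRSS:" line from /proc/self/status (Linux-specific).
    std::fs::read_to_string("/proc/self/status")
        .unwrap()
        .lines()
        .find(|l| l.starts_with("VmRSS:"))
        .and_then(|l| l.split_whitespace().nth(1))
        .and_then(|v| v.parse().ok())
        .unwrap()
}

fn main() {
    let mut v = vec![0u8; 128 * 1024 * 1024]; // 128 MiB of zeros, lazily backed
    let before = rss_kib();
    for i in (0..v.len()).step_by(4096) {
        v[i] = 0; // the stored value is unchanged, but the write still faults the page in
    }
    std::hint::black_box(&v); // keep the stores from being optimized away
    let after = rss_kib();
    println!("RSS grew by roughly {} KiB after writing zeros",
             after.saturating_sub(before));
}
```

If this is right, RSS should grow by roughly the full 128 MiB despite the buffer's contents never changing.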

This was also my initial thought. But the quicksort implementation still swaps elements around; those swaps count as writes, and thus result in real allocation.

I posted this mostly to document it in case others stumble upon it, and to discuss whether such documentation is needed. Thinking about it, maybe it would fit well in the Allocating chapter of the Rustonomicon?

