Handling memory exhaustion – State of the art?

Continuing the discussion from:

I'm curious what the current take on that issue is. I would assume allocations potentially happen in so many places (e.g. Vec::push) that it's impractical to always have to deal with allocation errors.

Is it considered a design flaw to panic on memory exhaustion?

Some quotes from the "Panicking: Should you avoid it?" thread:

I guess running out of memory doesn't need to put a program into an "unknown state" and could be handled (with some effort).

As the effort of making all memory allocating functions return an Option or Result would be huge, I guess it's reasonable that Rust has been designed to just panic?

Yet I can imagine some cases where an action is appropriate on memory exhaustion, e.g. if your program uses garbage collection for some parts (e.g. in a virtual Lua machine that executes scripts). In that case, an out-of-memory condition could/should trigger emergency garbage collection.

A major reason behind Rust aborting on allocation failure by default[1] is that on a hosted target (one with a multitasking OS), your program probably isn't going to live to see an OOM condition. Allocations which are too large to ever succeed will fail, but most modern OSes use delayed page allocation techniques[2] and memory overcommit. If your program actually uses too much memory, it doesn't result in reasonably sized allocations failing; it results in your program getting killed by the OS OOM killer.


  1. There's an unstable configuration option to use a standard panic which can unwind. Without the use of that flag, handle_alloc_error will abort the program.

  2. TL;DR version: actual allocation work may not happen until you actually write to the allocated pointer.

Yeah, but there are also other application areas, such as embedded environments with less RAM and no memory overcommitment. Doesn't Rust target these platforms as well?

Regarding your footnote:

I assume this is what #51245 is about? So I could run a Vec::push within catch_unwind and then invoke an emergency garbage collection?

Not that I need it at the moment, but I'm curious if there is a way to perform emergency garbage collection and allow the allocation to be attempted one more time before a panic is caused (without wrapping every Vec::push, String::from, etc. in catch_unwind calls).

It's a very subtle question. Some relevant discussions:

The TL;DR is that the problem is bloody hard. There are multiple approaches, but none that is entirely satisfactory.

Skimming through some of your links, I noticed Vec::try_reserve. But I see now how this doesn't help if a Vec is extended as part of a non-fallible trait (hence your comment on a fallible Clone, I guess).
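
To make the Vec::try_reserve point concrete, here is a minimal sketch of a fallible push built on it. The helper name try_push is hypothetical (it is not part of std); try_reserve and TryReserveError are the stable std APIs. As noted above, this only helps where you control the call site, not inside infallible trait methods such as Clone::clone.

```rust
use std::collections::TryReserveError;

/// Hypothetical helper: pushes `value`, reporting allocation failure
/// as an error instead of aborting the process.
fn try_push<T>(vec: &mut Vec<T>, value: T) -> Result<(), TryReserveError> {
    // Ask for room for one more element; this is the only fallible step.
    vec.try_reserve(1)?;
    // Capacity is now sufficient, so this push does not need to allocate.
    vec.push(value);
    Ok(())
}

fn main() {
    let mut v = Vec::new();
    try_push(&mut v, 10).expect("a small allocation should succeed");

    // A request that can never be satisfied is reported as an error.
    let mut big: Vec<u8> = Vec::new();
    let result = big.try_reserve(usize::MAX);
    println!("v = {:?}, huge reservation failed: {}", v, result.is_err());
}
```

The caller decides what to do with the Err case, for example freeing caches and retrying, or rejecting the current request.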

Another thing I'm curious about: When writing unsafe code, do I need to expect Vec::push (or Clone::clone) to panic with unwind? I understand OOM is currently aborting, but can/should I rely on that, or would my code be considered unsound then?

It was never designed to be used there but people have found out that it can work.

This, essentially, means that these people have to drive that design.

Maybe kernel guys would offer ideas, maybe embedded, but currently we have a serious issue: lots of people from these areas don't believe in soundness, so the designs they offer don't fit into Rust's worldview, and people who do believe in soundness don't work in embedded, so it's hard for them to imagine what can be done there and how.

Can you explain what that phrase even means? Rust doesn't have GC, so how can it perform “emergency garbage collection”?

If I understand correctly, people are currently trying to handle that use case and, in particular, std may probably work, but I doubt many crates would survive in such an environment.

Precisely. That would require a lot of research, and given the fact that most embedded developers also consider an OOM condition unrecoverable (they would rather reset the device than try to recover from OOM… that's if they use memory allocation at all), the pool of people who have practical experience of dealing with it is extremely slim.

And as we saw time and again: purely theoretical solutions often end up woefully inadequate when people try to put them to actual practical use.

A Rust program (e.g. a webserver) may have a subsystem (e.g. a scripting engine) or use a library that uses garbage collection.

Here is an example:

pub mod m {
    pub struct Foo {
        vec: Vec<i32>,
    }
    impl Foo {
        pub fn new() -> Self {
            Foo { vec: Vec::new() }
        }
        pub fn modify(&mut self) {
            self.vec.push(10);
            self.vec.push(20);
        }
        pub fn sum(&self) -> i32 {
            // SAFETY: self.vec will never contain exactly one element
            // (or could that happen because of `vec.push` panicking
            // with unwind in the `modify` method?)
            unsafe {
                match self.vec.get(0) {
                    Some(x) => x + self.vec.get_unchecked(1),
                    None => 0,
                }
            }
        }
    }
}

fn main() {
    let mut foo = m::Foo::new();
    foo.modify();
    println!("{}", foo.sum());
}

Output:

30

Is module m to be considered sound?

Your comment is unnecessarily dismissive of other people. Plenty of designs were suggested, and the problems they have are not soundness issues. The issues are ergonomics, enforcing correctness, and dealing with dependencies. The simple try_reserve & friends-based API works fine in plenty of cases and is sound. But you can still easily cause an OOM when calling trait methods, such as Clone or Extend, which are used all over the place. Worse, the libraries on crates.io are unlikely to follow the same rigorous approach, and without stable support in std there is little chance they'll try. Rust loses a lot of its value proposition if you can't use its ecosystem and have to rewrite the world.

People have suggested entirely removing the infallible methods from collections, via some feature-like mechanism. It helps with correctness, but would also break all libraries. It's also very unergonomic. Rust is built on the assumption that most of your methods are infallible, but with fallible allocations almost all of them will be fallible.

It's also not true that fallible allocations are a concern only for embedded and kernel devs. Anyone who uses an arena allocator has to care about the same issues, since that arena can get exhausted. There are plenty of standard userspace applications which need to restrict memory usage in their arenas.
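
As a rough illustration of arena exhaustion, here is a minimal sketch of a fixed-size bump arena whose allocation method is fallible. The Arena type and its try_alloc method are hypothetical names made up for this example, not an existing crate's API, and real arenas hand out typed, aligned memory rather than byte slices.

```rust
/// Hypothetical fixed-size bump arena that hands out byte slices.
struct Arena {
    buf: Vec<u8>,
    used: usize,
}

impl Arena {
    /// Allocates the whole backing buffer up front.
    fn with_capacity(cap: usize) -> Self {
        Arena { buf: vec![0; cap], used: 0 }
    }

    /// Returns `None` when the arena is exhausted instead of panicking.
    fn try_alloc(&mut self, len: usize) -> Option<&mut [u8]> {
        // checked_add guards against overflow of the bump pointer.
        let end = self.used.checked_add(len)?;
        if end > self.buf.len() {
            return None;
        }
        let start = self.used;
        self.used = end;
        Some(&mut self.buf[start..end])
    }
}

fn main() {
    let mut arena = Arena::with_capacity(16);
    println!("first 10 bytes: {}", arena.try_alloc(10).is_some());
    println!("next 10 bytes: {}", arena.try_alloc(10).is_some());
}
```

Every caller of try_alloc has to handle the None case, which is the same ergonomic cost that fallible allocation in general would impose.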

This example also looks quite artificial to me. Who wrote such code and why? What was it supposed to achieve?

C++ follows the strategy “this is C++, so since you can, somebody certainly will”. Rust usually starts with “practical”, “real-world” examples, and then the discussion goes from there.

It's in Rust's DNA: the core idea of Rust is to handle practical problems with an affine type system, which definitely cannot be used to solve 100% of real-world tasks (that's why there is an unsafe “escape hatch” in the first place!).

A lot of unsafe code makes assumptions about the state of values in order to be sound. Whether Vec::push (or Clone::clone or a lot of other methods) may panic with unwind particularly matters when judging the soundness of such unsafe code. This is especially an issue because I believe most APIs will not document whether a function or method performs allocation.

The example is a minimum toy-example to explain the problem in simple terms.

P.S.: Regardless of whether this example is artificial or not, I'm interested in knowing whether such code is sound (because it helps me deduce whether other, real-life code is sound).

Aren't soundness and enforcing correctness two names for the exact same thing?

Ah. I see the difference but isn't this just an admission of the fact that we have no idea how to handle that thing in a sound way?

Calling try_reserve in a couple of places is not a solution if you continue to use Clone and Extend.

The current solution is the separation of the no_std world and the std world. This works by making allocations S.E.P. (somebody else's problem).

Maybe there should be another “fallible alloc” world, but it's not clear whether creating such a thing is a good idea or not.

We need the opinions of people who are trying to apply these things to real tasks for real problems in real apps; otherwise it's easy to invent some kind of solution which just makes everyone's life complicated without solving real-world problems.

Maybe these guys can present the solutions they have used and what has and hasn't worked for them?

The big problem I observe is that a lot of the discussion is carried by people who have zero real-world experience; they just see a nice problem which they can discuss to death. And they do. Discuss it to death, I mean.

The problem is very well-known. Scoped threads were delayed for years because of a similar issue.

And, in general, std tries to assume that allocation may fail. But what happens in other crates… I don't know if anyone has statistics.

Good, but does that mean module m is sound or not?

It's not sound, but that doesn't mean people wouldn't write such code. You don't need a permit to write Rust, you know, and there is no law that would impose stiff penalties on people who write unsound code.
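
For comparison, here is one possible way to restructure module m so that it no longer depends on whether an allocation failure aborts or unwinds. This is a sketch under two assumptions: that reserving capacity up front is acceptable for the use case, and that checked access in sum is cheap enough. It relies on the documented Vec guarantee that push does not reallocate when the capacity is already sufficient.

```rust
pub mod m {
    pub struct Foo {
        vec: Vec<i32>,
    }
    impl Foo {
        pub fn new() -> Self {
            Foo { vec: Vec::new() }
        }
        pub fn modify(&mut self) {
            // Reserve room for both elements first. If this allocation fails
            // (abort today, possibly unwind in some configuration), nothing
            // has been pushed yet, so the length stays even.
            self.vec.reserve(2);
            // With the capacity in place, neither push needs to allocate.
            self.vec.push(10);
            self.vec.push(20);
        }
        pub fn sum(&self) -> i32 {
            // Checked access: no unsafe block, so there is no invariant
            // that a partially completed `modify` could break.
            match (self.vec.get(0), self.vec.get(1)) {
                (Some(a), Some(b)) => a + b,
                _ => 0,
            }
        }
    }
}

fn main() {
    let mut foo = m::Foo::new();
    foo.modify();
    println!("{}", foo.sum());
}
```

Either change alone would address the concern raised in the SAFETY comment; doing both keeps the length invariant and removes the need for it.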

If this is a pervasive concern, the way to do so would likely be to set a #[global_allocator] which has a hook for trying to free up memory before retrying the call to the wrapped allocator. Then this behavior needn't infect every instance of allocation with the need to do so, nor change the API of allocation at all.
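
A minimal sketch of such a wrapper, assuming a single retry is enough. The names Reclaiming, emergency_reclaim and RECLAIM_RUNS are hypothetical; GlobalAlloc, Layout and System are the real std::alloc items. In a real program the hook would ask the scripting engine's garbage collector (or some cache) to release memory, and it must not allocate itself because it runs inside the allocator.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how often the hook ran (for illustration only).
static RECLAIM_RUNS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical hook: a real program would trigger its emergency
/// garbage collection here. It must not allocate.
fn emergency_reclaim() {
    RECLAIM_RUNS.fetch_add(1, Ordering::Relaxed);
}

/// Wraps any allocator and retries a failed allocation once
/// after running the hook.
struct Reclaiming<A>(A);

unsafe impl<A: GlobalAlloc> GlobalAlloc for Reclaiming<A> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { self.0.alloc(layout) };
        if !ptr.is_null() {
            return ptr;
        }
        // First attempt failed: try to free memory, then retry once.
        emergency_reclaim();
        unsafe { self.0.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { self.0.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: Reclaiming<System> = Reclaiming(System);

fn main() {
    // Ordinary allocations go through the wrapper transparently.
    let v: Vec<u32> = (0..1000).collect();

    // An absurdly large request fails even after the hook has run.
    let layout = Layout::from_size_align(isize::MAX as usize, 1).unwrap();
    let ptr = unsafe { GLOBAL.alloc(layout) };
    println!(
        "len = {}, huge allocation is null: {}, hook runs: {}",
        v.len(),
        ptr.is_null(),
        RECLAIM_RUNS.load(Ordering::Relaxed)
    );
}
```

Because the default realloc and alloc_zeroed implementations of GlobalAlloc call alloc, they get the retry behavior as well. If the retry also fails, the wrapper returns null and the usual handle_alloc_error path applies.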

Ah, so the solution would be to use a different allocator. Thanks, that makes sense.

I would come to the same conclusion, but is it documented somewhere that every function/method may panic with unwind in the future (unless explicitly documented otherwise)?

The docs for std::vec::Vec::push, for example, state that:

Panics

Panics if the new capacity exceeds isize::MAX bytes.

It doesn't mention that it may panic (with unwind) on memory allocation; and as pointed out earlier, Rust currently aborts on OOM.

So my question is: Is it documented somewhere that this behavior is planned to change and/or that developers must expect this to change in the future? And if yes, where is it documented?

I'm not asking because I have a particular problem right now, but I'm asking to better understand the guarantees/requirements for writing unsafe code and where to find authoritative/normative information on it.

No, it's not. Soundness in Rust means a very specific thing: arbitrary use of the API can't violate memory safety and type safety. It says nothing about arbitrary logical errors; this is not Idris.

99% of current Rust code doesn't know or care about fallible allocations and just aborts on OOM. Are you claiming that all that code should be deemed unsound? Are you calling code unsound if it ever panics?

You can avoid calling Clone or Extend. You can ban them via a project-wide lint, or even make a #![no_core] project which entirely removes them. That's not the problem. The problem is that Rust is designed around those traits and assumptions, so all your dependencies will use them, and all the ergonomics of the language go out the window without them.

If you want correctness and ergonomics and avoiding the ecosystem split, then indeed we don't know how to design the APIs. It's quite likely that it would require some new powerful language features, or may even be impossible.