Why does stable Rust support fallible memory allocation for collections (`Vec`, `VecDeque`) but not single objects (`Box`, `Arc`)?

Hello everyone!

Recently, I've noticed that the fallible memory allocation API is unstable for Box, Rc, and Arc, but is stable for Vec, VecDeque, etc. I wonder what the reason for this decision is?
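For concreteness, a minimal sketch of the asymmetry: `Vec::try_reserve_exact` compiles on stable, while the single-object counterpart (`Box::try_new`) still requires nightly and the `allocator_api` feature, so it only appears in a comment here.

```rust
use std::collections::TryReserveError;

// Stable since Rust 1.57: ask for capacity and get an error value back
// instead of aborting the process on allocation failure.
fn read_into_buffer(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    buf.try_reserve_exact(len)?;
    buf.resize(len, 0);
    Ok(buf)
}

fn main() {
    assert!(read_into_buffer(1024).is_ok());
    // An absurd request fails with an error value rather than killing us.
    assert!(read_into_buffer(usize::MAX).is_err());

    // The single-object counterpart is nightly-only:
    // let b = Box::try_new(0u8)?; // requires #![feature(allocator_api)]
}
```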

There are different tracking issues for the different methods, so it has probably just come about organically. There are still many unstable Vec methods, for example (try_with_capacity, push_within_capacity, ...). The ones around creation (which is always the case for non-growables like Box) are probably tied up with the main allocator issue/working group/RFC.

Some decision-makers are strongly opposed to going down a slippery slope of "duplicating" all methods into fallible and infallible versions.

Support for custom allocators will also need another batch of _in methods, potentially quadrupling all constructor functions.

At the same time, Rust doesn't have any language feature that could avoid the duplicated methods. Many proposals for variations of optional arguments didn't lead anywhere (apart from lots of bikeshedding about syntax).
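To illustrate how the dimensions multiply, here is a toy Box-like type. The names mirror the nightly `allocator_api` methods, but the type itself is made up for this sketch; only the fallible/infallible pair is written out, with the allocator `_in` dimension left as a comment.

```rust
use std::alloc::{alloc, dealloc, Layout};

#[derive(Debug)]
struct AllocError;

// Illustration only: a toy owning pointer, not a real std type.
struct MyBox<T>(*mut T);

impl<T> MyBox<T> {
    // Infallible constructor: aborts/panics on failure.
    fn new(v: T) -> Self {
        Self::try_new(v).expect("allocation failed")
    }

    // Fallible twin of the same constructor.
    fn try_new(v: T) -> Result<Self, AllocError> {
        let layout = Layout::new::<T>();
        // SAFETY: the types used below have non-zero size.
        let p = unsafe { alloc(layout) } as *mut T;
        if p.is_null() {
            return Err(AllocError);
        }
        unsafe { p.write(v) };
        Ok(MyBox(p))
    }

    // With custom allocators, each of the above would also need an
    // `_in(value, alloc)` variant (new_in, try_new_in), and the same
    // again for uninit/zeroed/pinned constructors.
}

impl<T> Drop for MyBox<T> {
    fn drop(&mut self) {
        unsafe {
            self.0.drop_in_place();
            dealloc(self.0 as *mut u8, Layout::new::<T>());
        }
    }
}

fn main() {
    let b = MyBox::new(42i32);
    assert_eq!(unsafe { *b.0 }, 42);
}
```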

8 Likes

I don't know what the real reason is, but here's why I think the decision is correct:

  1. What usually happens if your program uses too much memory is: the operating system notices it's swapping, gets angry, looks around for the misbehaving program, finds one that's 12GB big (that's you), and terminates it. There is no chance to treat this as an error within the program. It's too late; you're dead.

    By contrast, when you try to resize a Vec to a hundred million terabytes, the operating system can look at the amount of available memory and swap and just say no. Since you didn't get your memory, the system never actually gets into a low-memory state, so the OOM killer never goes hunting.

  2. If there's only 640KB in the whole box, you're probably not using alloc anyway.

  3. Allocations with Box, Arc, and Rc are typically small allocations. There is no reason to think any particular small allocation will fail, and if you're using alloc you've got small allocations all over your program, way too many to count. It's no use handling alloc failure in just one place, and handling it in all 10 million places is impossible.

It depends on your context. In an overcommit-and-OOM context, it probably doesn't matter. But not everyone targets that context.

So the question on that subtopic probably comes down to, do you think std should support more than that particular context.

(And some of the methods, like push_within_capacity, have applications beyond "can't allocate".)
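One such application, sketched with a stable equivalent since `push_within_capacity` itself is still unstable: pushing only when no reallocation can happen, which matters when a reallocation would invalidate outstanding indices or when the caller must not allocate.

```rust
// Stable sketch of the unstable Vec::push_within_capacity:
// push only if it cannot trigger a reallocation, otherwise
// hand the value back to the caller.
fn push_within_capacity<T>(v: &mut Vec<T>, value: T) -> Result<(), T> {
    if v.len() < v.capacity() {
        v.push(value); // guaranteed not to reallocate here
        Ok(())
    } else {
        Err(value)
    }
}

fn main() {
    let mut v = Vec::with_capacity(2);
    assert_eq!(push_within_capacity(&mut v, 1), Ok(()));
    assert_eq!(push_within_capacity(&mut v, 2), Ok(()));
    assert_eq!(push_within_capacity(&mut v, 3), Err(3)); // full, no growth
    assert_eq!(v, [1, 2]);
}
```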

3 Likes

The number of dimensions is higher than that.

  • the smart pointer type itself
  • default vs. allocators
  • by-value init, uninit, zeroed, pinning (and potentially in-place initialization)
  • slices (specifying the length)
  • fallible/infallible
6 Likes

And if duplication is the only way, it is worth it. Perhaps you could add a way to group the methods in rustdoc to make it easier to get an overview. But I have real use cases for fallible allocations in custom allocators.

While I love Rust, and I see Zig as interesting but flawed (no memory safety, no RAII, but comptime, allocators, and good compile times), I absolutely understand those who choose to go to Zig due to the poor state of custom allocators in Rust.

The whole problem started from Linux-only assumptions in the standard library. Linux with overcommit enabled is not the only platform that Rust runs on.
Even if you don't care about anything but Linux, you can use cap to make Rust programs self-impose a limit, which can be helpful to catch unexpectedly large memory usage and avoid swapping.
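A self-imposed limit in the spirit of the cap crate can be sketched with only std: a global allocator that refuses requests past a fixed budget, so fallible APIs like try_reserve report failure instead of the process growing unbounded. The 64 MiB budget here is an arbitrary number for the demo.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// A global allocator wrapping System that tracks bytes in use and
// rejects allocations that would exceed a fixed limit.
struct Capped {
    used: AtomicUsize,
    limit: usize,
}

unsafe impl GlobalAlloc for Capped {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let prev = self.used.fetch_add(layout.size(), Ordering::SeqCst);
        if prev.saturating_add(layout.size()) > self.limit {
            self.used.fetch_sub(layout.size(), Ordering::SeqCst);
            return std::ptr::null_mut(); // reported as failure, not an abort
        }
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        self.used.fetch_sub(layout.size(), Ordering::SeqCst);
        System.dealloc(ptr, layout);
    }
}

#[global_allocator]
static ALLOC: Capped = Capped {
    used: AtomicUsize::new(0),
    limit: 64 * 1024 * 1024, // arbitrary 64 MiB budget for the demo
};

fn main() {
    let mut small: Vec<u8> = Vec::new();
    assert!(small.try_reserve(1024 * 1024).is_ok()); // within budget
    let mut big: Vec<u8> = Vec::new();
    assert!(big.try_reserve(128 * 1024 * 1024).is_err()); // over budget
}
```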

5 Likes

When you make a Vec, you don't have to have something to put in it, but when you make a Box with Box::new the object has to already exist, and a giant Box with a single Sized object would have to rely on compiler optimizations to not kill your stack. So this leaves boxes with unsized contents, i.e. Box<[T]> (I'm not really sure there's a way to make Box<dyn Trait> without unsized coercion). These seem to be second-class types in Rust. Even though Arc<Vec<T>> has more indirection than Arc<[T]>, it is hard to construct the latter from a vector without copying.
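A small demonstration of that last point: the conversion exists, but it cannot reuse the Vec's buffer, because Arc needs the refcounts inline next to the data.

```rust
use std::sync::Arc;

fn main() {
    let v: Vec<u32> = vec![1, 2, 3];
    // This compiles on stable, but `Arc::from` must copy the elements
    // into a fresh allocation that also holds the reference counts;
    // the Vec's buffer cannot be reused in place.
    let a: Arc<[u32]> = Arc::from(v);
    assert_eq!(&*a, &[1, 2, 3]);
}
```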

There is ongoing design work on guaranteed in-place init, as this is something RFL (Rust for Linux) wants in the kernel. So this question will become more relevant going forward.

I guess this is ameliorated somewhat by APIs that give you a Box<MaybeUninit<T>>, but it would be cool to be able to do this safely.
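For reference, a sketch of that kind of API using Box::new_uninit (stabilized in recent Rust); note it only solves the allocation half, since the write below still builds the value on the stack before moving it.

```rust
use std::mem::MaybeUninit;

fn main() {
    // Allocate on the heap without constructing the value first.
    let mut b: Box<MaybeUninit<[u8; 4096]>> = Box::new_uninit();
    // This still constructs the array on the stack and moves it in,
    // so it is not guaranteed in-place init, only uninit allocation.
    b.write([0u8; 4096]);
    // SAFETY: the value was just fully initialized by `write`.
    let b: Box<[u8; 4096]> = unsafe { b.assume_init() };
    assert_eq!(b[0], 0);
}
```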

But is it a good solution, for people targeting other contexts, for the whole stdlib to support both fallible and infallible allocation?

My experience trying to treat OOM as an error, on a brilliant, fully-staffed, highly motivated C++ team with dedicated tooling and fuzzing for finding OOM bugs, was that we never got it nailed down. So adding fallible methods for everything in alloc (and everything in std that uses alloc) doesn't seem like a real solution to me, whereas the specific methods that do exist serve a purpose, in all contexts.

Do you mean that you never got rid of all the assumptions that memory allocation would not fail (corresponding to things like using Vec::push() in Rust), or that you never managed to make the program acceptably recover from running out of memory, rather than corrupting state or aborting, at all sites where that could happen?

1 Like

In my experience, fallible methods in Rust work well and are helpful, even on Linux. The situation where there's really pathological overcommit and the whole machine is truly desperately out of memory is not that common. In practice you can use cgroups to make processes run out of memory before the whole machine is dying, and if the process isn't forking a lot or holding lots of untouched zero pages, it can actually run out of memory when allocating it (and you can use cap to ensure it can't go haywire and allocate more than you can give it).

The C/C++ devs gave up on robust error handling, because freeing memory correctly in these languages is already hard, and freeing memory correctly on rarely taken untested error paths is even harder.

But this doesn't apply to Rust. Drop in safe code is fully automatic and reliable (leaking requires not just refcounted types, but refcounted recursive types with interior mutability, and that's a much less common case).

It is tricky for unsafe code to handle Drop and unwinding in a sound way, but unsafe code is required to do that anyway.

Most Drop implementations don't allocate, and plenty of them free memory. If you use enum-based Error types, then handling of memory allocation failures with ? safely and correctly releases memory, and is just as easy and reliable as handling of any other error in Rust.
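A sketch of that pattern, with a hypothetical application error enum: a From impl lets `?` route allocation failures into it like any other error, and every early return drops whatever was already built.

```rust
use std::collections::TryReserveError;

// Hypothetical application error type for the sketch.
#[allow(dead_code)]
#[derive(Debug)]
enum AppError {
    Parse(String),
    Oom(TryReserveError),
}

impl From<TryReserveError> for AppError {
    fn from(e: TryReserveError) -> Self {
        AppError::Oom(e)
    }
}

fn load(n: usize) -> Result<Vec<u8>, AppError> {
    let mut buf = Vec::new();
    // `?` converts TryReserveError into AppError::Oom; on the error
    // path, everything already allocated is dropped automatically.
    buf.try_reserve(n)?;
    buf.resize(n, 0);
    Ok(buf)
}

fn main() {
    assert!(load(16).is_ok());
    assert!(matches!(load(usize::MAX), Err(AppError::Oom(_))));
}
```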

Even panicking on OOM can work. There is unfortunately one allocation required in panics (Rust sadly exposed Box as a panic payload type), but in practice even this isn't a problem, because the probability of hitting OOM is proportional to the allocation size. In reality programs don't literally run out of the very last byte of memory, but rather start failing when allocating big buffers while there are many megabytes left, and that's plenty for a panic message and even a backtrace.

2 Likes

What I meant was that the fuzzers never stopped producing test cases where we crashed; we never stopped adding new bugs (or felt we could confidently claim we'd found all the old ones).

I honestly don't feel I ever quite understood why. The codebase had very regular patterns and all you had to do was follow them. But the fact is, there wasn't a bulletproof static analysis for it and so we never got there. (We did use the C++ equivalent of #[must_use], to help when an infallible function was changed to become fallible; and generally pretty aggressively adopted whatever automated help the compiler could give us.)

Admittedly some causes are down to problems in C++ that are not as bad in Rust. We didn't have ?. We would sometimes decide to use an alternative bespoke error handling idiom in some subsystem, because check-and-return was just so painful in C++. These other idioms turned out to have their own gotchas (shockingly). Most errors were not OOM, so any code that actually handled an error had to also check for the possibility of OOM (lacking Option/enum to force us to check).

It seems to me a lot of the burden would remain in Rust. Having to write ? after every function call. Having to add an Oom variant to every error enum. Not being able to see at a glance which ?s are about application-level errors. Having to do without e.g. Iterator::filter because your closure might OOM. Having to avoid Clone and other non-fallible traits, because your impls have to be able to report OOM errors.

1 Like

Thanks, that's good to hear.

This wasn't our problem - C++ has destructors.

Aren't you just asking / entering the debate about...

If you meant, what's my personal opinion -- I do think it is worth it in some form, yes. The language and std does target system level programming, and already supports custom (global) allocators. If everyone felt the existing methods served their context adequately, they wouldn't be asking for the fallible methods. And it's okay if not everything in std serves your context.

(I doubt I'll be participating adamantly in any continuing debate, but that's my opinion.)

5 Likes

Rust already has some of this problem with "I want to make sure I can't panic", e.g. no indexing vectors, etc., and that's been pretty painful to try to work around. Maybe there's some factor that makes it a bit easier.
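The indexing case mentioned above, for the record: the panicking and non-panicking forms exist side by side, which is the same duplication question in miniature.

```rust
fn main() {
    let v = vec![10, 20, 30];
    // `v[5]` would panic; `get` surfaces the failure as a value instead.
    assert_eq!(v.get(1), Some(&20));
    assert_eq!(v.get(5), None);
}
```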

Sounds like you were using error returns rather than std::bad_alloc? Yeah, I can see that being really painful there, even if you basically need to in order to handle it sanely. In Rust land, it's much less so.

I'm curious why whatever linting tools you were using couldn't find everything reliably: it certainly seems like if you had visibility into all the codebase and sufficient annotations on the standard library (a fairly straightforward, if arduous, one-off task), it should either be working or not working at all. Eventually it boils down to "Don't call any of these functions, use these instead", right?
