Part 3: The alloc module API

arnie · January 19, 2023, 4:24pm

I'm trying to learn and understand the decision that led to the seemingly poor alloc module API.
In one of my previous posts, someone mentioned monomorphization cost and gave it as the reason for alloc not being generic. I can understand that and it does make sense. But then we look at the alloc API and we have:
std::alloc::alloc(layout)
where layout is an object that is created with a generic API!
So to me it seems that the issue with monomorphisation is simply moved to another place but not avoided.
Can somebody shed light on that aspect?

Michael-F-Bryan · January 19, 2023, 4:52pm

It's not necessarily about monomorphisation being expensive.

The global allocator is overridable via the #[global_allocator] attribute and can't be directly called from the standard library. The standard library is still able to use the global allocator from alloc::alloc::alloc() and friends because they depend on "something" providing functions like __rust_alloc() which will do the allocation/deallocation. The #[global_allocator] attribute expands to provide definitions for those functions in your downstream crate, and the linker wires everything up so nobody notices the inverted dependency.
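To make the mechanism concrete, here is a minimal sketch of a downstream crate overriding the global allocator. The names CountingAlloc, ALLOCATIONS and allocations_so_far are made up for illustration and are not from the standard library; the wrapper simply forwards to std::alloc::System and counts requests.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical wrapper (illustration only): forwards every request to the
// system allocator and counts how many allocations were requested.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        // SAFETY: the caller upholds the `GlobalAlloc::alloc` contract.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // SAFETY: `ptr` was returned by `System.alloc` with this layout.
        unsafe { System.dealloc(ptr, layout) }
    }
}

// The attribute generates the non-generic symbols that the `alloc` crate
// links against; note that the trait only ever sees a `Layout` and `*mut u8`.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn allocations_so_far() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}

fn main() {
    let before = allocations_so_far();
    // `black_box` keeps the optimizer from eliding the heap allocation.
    let boxed = black_box(Box::new(42u64));
    let after = allocations_so_far();
    println!("value = {boxed}, allocations observed: {}", after - before);
}
```

Box::new never names CountingAlloc; it reaches it only through the linked symbols, which is the inverted dependency described above.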

github.com/rust-lang/rust/blob/79335f1ac4f6bc72795d4ac84764aa185648b5f2/library/alloc/src/alloc.rs#L22-L42
      
          extern "Rust" {
              // These are the magic symbols to call the global allocator. rustc generates
              // them to call `__rg_alloc` etc. if there is a `#[global_allocator]` attribute
              // (the code expanding that attribute macro generates those functions), or to call
              // the default implementations in std (`__rdl_alloc` etc. in `library/std/src/alloc.rs`)
              // otherwise.
              // The rustc fork of LLVM 14 and earlier also special-cases these function names to be able to optimize them
              // like `malloc`, `realloc`, and `free`, respectively.
              #[rustc_allocator]
              #[rustc_nounwind]
              fn __rust_alloc(size: usize, align: usize) -> *mut u8;
              #[rustc_deallocator]
              #[rustc_nounwind]
              fn __rust_dealloc(ptr: *mut u8, size: usize, align: usize);
              #[rustc_reallocator]
              #[rustc_nounwind]
              fn __rust_realloc(ptr: *mut u8, old_size: usize, align: usize, new_size: usize) -> *mut u8;
              #[rustc_allocator_zeroed]
              #[rustc_nounwind]
              fn __rust_alloc_zeroed(size: usize, align: usize) -> *mut u8;
          }

The key part to realise is that all extern functions must have one signature and can't contain generics because otherwise it's not possible to resolve things when linking. That means even if the GlobalAlloc trait returned a strongly-typed *mut T, you would lose that type when crossing the __rust_alloc() boundary.

That's fine. The Layout::new() constructor is just a helper that uses core::mem::size_of() and core::mem::align_of() to get the size and alignment of a type at compile time.
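As a small sketch of that equivalence, the generic helper and the non-generic constructor can be compared directly. Point and the two function names are hypothetical and exist only for this example.

```rust
use std::alloc::Layout;
use std::mem::{align_of, size_of};

// Hypothetical example type, used only for illustration.
#[allow(dead_code)]
struct Point {
    x: f64,
    y: f64,
}

// Generic helper: size and alignment come from the type at compile time.
fn layout_via_new() -> Layout {
    Layout::new::<Point>()
}

// Non-generic constructor: takes plain numbers and validates them at run time.
fn layout_via_numbers() -> Layout {
    Layout::from_size_align(size_of::<Point>(), align_of::<Point>())
        .expect("size and alignment of a real type are always valid")
}

fn main() {
    let a = layout_via_new();
    let b = layout_via_numbers();
    assert_eq!(a, b);
    println!("size = {}, align = {}", a.size(), a.align());
}
```

Either way the allocator receives the same two integers, so nothing generic ever reaches it.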

github.com/rust-lang/rust/blob/79335f1ac4f6bc72795d4ac84764aa185648b5f2/library/core/src/alloc/layout.rs#L148-L154
      
          pub const fn new<T>() -> Self {
              let (size, align) = size_align::<T>();
              // SAFETY: if the type is instantiated, rustc already ensures that its
              // layout is valid. Use the unchecked constructor to avoid inserting a
              // panicking codepath that needs to be optimized out.
              unsafe { Layout::from_size_align_unchecked(size, align) }
          }

github.com/rust-lang/rust/blob/79335f1ac4f6bc72795d4ac84764aa185648b5f2/library/core/src/alloc/layout.rs#L13-L21
      
          // While this function is used in one place and its implementation
          // could be inlined, the previous attempts to do so made rustc
          // slower:
          //
          // * https://github.com/rust-lang/rust/pull/72189
          // * https://github.com/rust-lang/rust/pull/79827
          const fn size_align<T>() -> (usize, usize) {
              (mem::size_of::<T>(), mem::align_of::<T>())
          }

As you can see from size_align()'s comments there actually have been concerns around the cost of monomorphisation, but the current way of doing things seems to be less expensive than the alternatives.

I would say this is a fairly subjective opinion.

Rust's alloc module provides a good low-level API that you can build more high-level, well-typed APIs on top of, akin to malloc() and free().

I would say this actually makes it a good API because it is very rare that an end user would be calling alloc() by hand - you almost always use higher-level collections/abstractions, and for implementors of those types it is much more convenient to work with untyped memory than strongly typed pointers.

H2CO3 · January 19, 2023, 4:52pm

First, Layout itself is not generic. Neither are several constructors (e.g. from_size_align). What you are referring to is likely Layout::new::<T>(), which is however a much more trivial function. Its monomorphization will increase the generated code size much less than instantiating the entire allocation algorithm several times.
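A rough sketch of that split, assuming you wanted a typed layer of your own: the helpers alloc_bytes, alloc_value and free_value below are hypothetical, not part of std. Only the thin generic shell is instantiated per type, while the function that actually talks to the allocator is compiled once.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

// Non-generic core: compiled once, no matter how many types use it.
fn alloc_bytes(layout: Layout) -> *mut u8 {
    assert!(layout.size() > 0, "zero-sized layouts are not supported by this sketch");
    // SAFETY: the layout has a non-zero size, as `alloc` requires.
    let ptr = unsafe { alloc(layout) };
    if ptr.is_null() {
        handle_alloc_error(layout);
    }
    ptr
}

// Thin generic shell: the per-type code is a layout computation, a cast
// and a write.
fn alloc_value<T>(value: T) -> *mut T {
    let ptr = alloc_bytes(Layout::new::<T>()).cast::<T>();
    // SAFETY: `ptr` is non-null, aligned for `T`, and valid for writes.
    unsafe { ptr.write(value) };
    ptr
}

/// # Safety
/// `ptr` must come from `alloc_value::<T>` and must not be used afterwards.
unsafe fn free_value<T>(ptr: *mut T) -> T {
    // SAFETY: guaranteed by the caller (see the function contract).
    let value = unsafe { ptr.read() };
    unsafe { dealloc(ptr.cast::<u8>(), Layout::new::<T>()) };
    value
}

fn main() {
    let p = alloc_value(7u32);
    let q = alloc_value([1.5f64, 2.5]);
    let (a, b) = unsafe { (free_value(p), free_value(q)) };
    println!("{a} {b:?}");
}
```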

ZiCog · January 19, 2023, 4:57pm

Seems OK to me. As an old C hand I naturally expect something that is allocating memory to return a pointer to bytes. Especially as it's called alloc.

I don't see anything generic about the Layout object. In the source it looks like this:

pub struct Layout {
    // size of the requested block of memory, measured in bytes.
    size: usize,

    // alignment of the requested block of memory, measured in bytes.
    // we ensure that this is always a power-of-two, because API's
    // like `posix_memalign` require it and it is a reasonable
    // constraint to impose on Layout constructors.
    //
    // (However, we do not analogously require `align >= sizeof(void*)`,
    //  even though that is *also* a requirement of `posix_memalign`.)
    align: ValidAlign,
}

arnie · January 19, 2023, 4:59pm

Apparently monomorphisation costs are not that important - something you were very adamant about and gave as the main reason for the alloc API to be the way it is.

arnie · January 19, 2023, 5:00pm

It is more difficult and less ergonomic than the C API. In my book that is a poor API.
But thanks for the answer. I appreciate the effort you've put in.

H2CO3 · January 19, 2023, 5:02pm

You are again misinterpreting the answer above. It says "It's not necessarily about monomorphisation", and not that "it's not about monomorphization". Good API design is allowed to, you know, take multiple factors into account.

Once again, you are completely ignoring the fact that Rust has typed heap allocations. They are called Box, Vec, String, etc. You don't have to use the alloc/Layout API most of the time.

arnie · January 19, 2023, 5:03pm

Yes, but the only reason you gave was monomorphisation cost. You know. You neither considered nor mentioned that there could be other, more important ones.

arnie · January 19, 2023, 5:03pm

I don't. I am trying to learn the alloc module.

H2CO3 · January 19, 2023, 5:04pm

That doesn't invalidate my answer in any way. It just means that there are also additional factors, but monomorphization still stands.

arnie · January 19, 2023, 5:11pm

Well, I really don't see it that way. My point is that std::alloc::alloc doesn't/shouldn't need a layout argument to be able to allocate memory for a type. Even if we agree on the fact that returning *mut T is no op. In my eyes, returning *mut u8 is simply a workaround. It is OK but that's all it is: a workaround.
It simply makes using the API unnecessarily complicated (in an already complicated area) without giving much benefit.

ZiCog · January 19, 2023, 5:25pm

I think I found the problem here.

std::alloc::alloc does not, and I guess is not intended to, allocate memory for a type. It allocates memory, which comes in bytes. So as a minimum it needs to take a length parameter and return a pointer to bytes. However, we have that pesky alignment problem to take care of as well.

As noted here, for allocating a type use Box or some such. As the documentation says: "‘box’, provides the simplest form of heap allocation in Rust." Sounds just like what you want.

The C allocation, malloc, also takes a "layout". Except that layout is only a length. Alignment has to take care of itself.
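For illustration, here is a small sketch of what "bytes plus alignment" looks like when calling the raw API directly. The function allocate_and_check is a made-up helper: it requests a block with an explicit alignment, checks that the returned address honours it, and frees the block with the same Layout.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

// Hypothetical helper: allocates `size` bytes with the requested alignment,
// checks the returned address, then frees the block again.
fn allocate_and_check(size: usize, align: usize) -> bool {
    assert!(size > 0, "`alloc` must not be called with a zero-sized layout");
    let layout = Layout::from_size_align(size, align).expect("invalid size/alignment");
    unsafe {
        // All the allocator hands back is untyped bytes.
        let ptr: *mut u8 = alloc(layout);
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        // Fill the block so that the read below sees initialised memory.
        ptr.write_bytes(0xAB, size);
        let aligned = (ptr as usize) % align == 0;
        let first = ptr.read();
        // The same layout must be passed back when deallocating.
        dealloc(ptr, layout);
        aligned && first == 0xAB
    }
}

fn main() {
    println!("64 bytes aligned to 32: {}", allocate_and_check(64, 32));
    println!("1 byte aligned to 4096: {}", allocate_and_check(1, 4096));
}
```

The second call is the kind of request a size-only interface cannot express: one byte, but placed on a 4096-byte boundary.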

arnie · January 19, 2023, 5:31pm

I believe that this is the most sensible answer. Thanks.

arnie · January 19, 2023, 5:32pm

I'm trying to learn about manual mem management in Rust. I know about Box and others. Thanks.

steffahn · January 19, 2023, 5:45pm

Minor note on terminology: I would consider Box a form of “manual memory management”, too. Really, all memory management in Rust is somewhat manual. The advantage of Box is that it's also safe to use. (Really, one of the main powers of Rust is its safe manual memory management.) [1]

In fact, Box is quite useful in cases where you want raw-pointers, too. Using Box::into_raw(Box::new(...)) for allocation and initialization, as well as drop(Box::from_raw(...)) for dropping and deallocating can be fairly ergonomic. In cases where you don't want to initialize the memory when allocating, that could still be possible when MaybeUninit is also used, though arguably, at that point it maybe becomes a bit lengthy, I guess.

[1] Of course, “manual” is not a technical term, so other people might prefer to think of Rust's memory management as some compile-time-determined “automatic” memory management instead.
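A minimal sketch of the Box-based raw-pointer pattern mentioned in this post, including the MaybeUninit variant for "allocate now, initialise later". The function names are hypothetical and chosen only for this example.

```rust
use std::mem::MaybeUninit;

// Allocate and initialise through Box, then hand out a raw pointer.
fn make_raw(value: String) -> *mut String {
    Box::into_raw(Box::new(value))
}

/// # Safety
/// `ptr` must come from `make_raw` and must not be used again afterwards.
unsafe fn destroy_raw(ptr: *mut String) {
    // Rebuilding the Box and dropping it runs the destructor and frees the memory.
    drop(unsafe { Box::from_raw(ptr) });
}

// Allocate first, initialise later: the slightly lengthier variant.
fn make_uninit_then_fill(value: u64) -> Box<u64> {
    // `MaybeUninit<u64>` has the same size and alignment as `u64`.
    let raw: *mut u64 = Box::into_raw(Box::new(MaybeUninit::<u64>::uninit())).cast();
    unsafe {
        raw.write(value);
        // The memory is initialised now, so it can be treated as a `Box<u64>`.
        Box::from_raw(raw)
    }
}

fn main() {
    let p = make_raw(String::from("hello"));
    let text: &String = unsafe { &*p };
    println!("{text}");
    unsafe { destroy_raw(p) };

    let filled = make_uninit_then_fill(9);
    println!("{filled}");
}
```

No Layout appears anywhere here: Box derives it from the type and passes it to the allocator on both allocation and deallocation.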

ZiCog · January 19, 2023, 5:48pm

Wow, thanks. That is the first time in years someone has hinted I might be sensible.

arnie · January 19, 2023, 5:50pm

The credit must go where the credit is due. Cheers!

ZiCog · January 19, 2023, 5:57pm

I don't see why not. If you want memory allocated on the heap for something you use Box. If you don't use Box the thing ends up on the stack. You have to Box things manually, yourself.

I don't see that not having a matching deallocator, free, for a Box makes it any less manual. After all you have to manually take care of the scope the Box is in to determine where it is deallocated. Which you have to do in languages like C anyway, else you end up using something that is free'd or leaking something that is not.

Actually, I think shared_ptr is a kind of garbage collector. It cleans things up when it sees nobody else is using them. Just like a garbage collector does.
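Both points can be sketched in Rust, using Rc as the closest analogue of C++'s shared_ptr. Tracked and the two functions are hypothetical names for this example; the Drop impl just counts how often the destructor runs.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical type that records when it is dropped.
struct Tracked<'a> {
    dropped: &'a Cell<u32>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.dropped.set(self.dropped.get() + 1);
    }
}

// Returns the drop count seen (inside the scope, after the scope).
fn box_is_freed_at_scope_end() -> (u32, u32) {
    let drops = Cell::new(0);
    let inside;
    {
        let _boxed = Box::new(Tracked { dropped: &drops });
        inside = drops.get();
    } // `_boxed` leaves scope here: destructor runs and the heap block is freed
    (inside, drops.get())
}

// Returns the drop count seen (after the first owner is gone, after the last).
fn rc_is_freed_with_last_owner() -> (u32, u32) {
    let drops = Cell::new(0);
    let first = Rc::new(Tracked { dropped: &drops });
    let second = Rc::clone(&first);
    drop(first);
    let after_first = drops.get(); // one owner is still alive
    drop(second);
    (after_first, drops.get())
}

fn main() {
    println!("Box: {:?}", box_is_freed_at_scope_end());
    println!("Rc:  {:?}", rc_is_freed_with_last_owner());
}
```

With Box, the place of deallocation is fixed by the scope you write; with Rc, it is decided at run time by whichever owner happens to go away last.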

nbaraz · January 19, 2023, 6:14pm

Like most software, Rust's allocation API is built in layers. alloc() is the lowest layer. If the API you want existed, it would use this low-level alloc(). It doesn't exist for one reason: it wasn't useful enough for anyone to add it.

ndusart · January 19, 2023, 6:37pm

No one told you that alloc was used to allocate a type. Now, suddenly, being told that it is not meant for this is an acceptable answer for you.

Reference counting is a kind of garbage collection. And you should trust Herb Sutter in that matter

As for alloc, the purpose or application of garbage collection can be wider than what you expect.
