Is it possible to extract a repr(Rust) struct's memory layout after monomorphization? (ABI)

For context I've been learning a bit about how the compiler works and about designing ABIs. The goal of this question is to resolve some of the gaps in my understanding so please expand on the why part of your answers where possible, I want to learn.

While reading about a Rust ABI, one issue that keeps coming up is struct reordering. My understanding is that, to enable better memory alignment (especially with generics), the compiler may rearrange your struct fields during compilation.

This is fine and dandy for statically compiled applications but causes issues when you try to dynamically link to it. The issue comes from the fact that you can't guarantee the layout of the struct in the source code matches the layout in the compiled library.

Here comes the root cause of my confusion. Assuming all the above holds true, why can't we just export the final memory layouts at compilation and use those?

As I understand it, monomorphization is where the compiler converts all the generic types into concrete types. Therefore I assume (but was unable to confirm) that the struct reordering happens either during or immediately after it.

If that is the case, could you not just get the compiler to spit out the concrete structs with their finalized layouts after it completes monomorphization? Are there other factors that prevent that?

Sure, the compiler could produce that information. (If you pass -Zprint-type-sizes on nightly, it'll print all the struct layouts right now.) What do you then want to do with it?
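For a quick look you don't need the nightly flag: on stable, `std::mem::size_of`, `align_of`, and `offset_of!` (stable since Rust 1.77) report the layout of a concrete instantiation from inside the program. A minimal sketch, where `Pair<T>`, `LayoutReport`, and `report` are made-up names for illustration. Whatever it prints is simply what this compiler chose for this build, not something guaranteed across versions or flags:

```rust
use std::mem::{align_of, offset_of, size_of};

// Hypothetical generic struct with the default (unspecified) Rust layout.
#[allow(dead_code)]
struct Pair<T> {
    tag: u8,
    value: T,
    flag: u8,
}

/// Layout facts for one concrete instantiation of `Pair<T>`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct LayoutReport {
    size: usize,
    align: usize,
    tag_offset: usize,
    value_offset: usize,
    flag_offset: usize,
}

/// Collects the layout the compiler picked for `Pair<T>`.
/// The values are only known once `T` is concrete, i.e. after monomorphization.
fn report<T>() -> LayoutReport {
    LayoutReport {
        size: size_of::<Pair<T>>(),
        align: align_of::<Pair<T>>(),
        tag_offset: offset_of!(Pair<T>, tag),
        value_offset: offset_of!(Pair<T>, value),
        flag_offset: offset_of!(Pair<T>, flag),
    }
}

fn main() {
    // Each instantiation gets its own layout; field order in memory
    // need not match the order in the source.
    println!("Pair<u32>: {:?}", report::<u32>());
    println!("Pair<u64>: {:?}", report::<u64>());
}
```

This only tells you what one compiler invocation decided, which is the limitation discussed next.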

With dynamic linking in general, the program and the loaded library must have code that agrees on what the layouts of various structs are (among other things). Right now, the only way to do that is to use the same compiler version and flags.

Just reporting what layouts the compiler picked doesn't help, because if you could use that information reliably, by having a second compiler invocation read the layout info from the first compiler invocation, then you might as well follow the "use the same compiler version" strategy.

‘Solving the Rust ABI problem for real’ would mean defining those layouts in a way that lets a program and a library, compiled at arbitrary times with arbitrary compiler versions, agree on them, with neither necessarily predating the other. The user of a program with dynamic libraries should be able to upgrade either the program or the library without breaking compatibility. That means that if one of the two must pass a file to the compilation of the other, that simply is not a solution to the complete problem.

You can if the struct opts into it.
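Assuming the opt-in meant here is `#[repr(C)]` (the reply above doesn't name it), a small sketch of what that buys you. With `#[repr(C)]` the fields are laid out in declaration order with C-style padding, so both sides of a dynamic-library boundary can compute the same offsets independently. `Header` and `header_layout` are hypothetical names for illustration:

```rust
use std::mem::{offset_of, size_of};

// Hypothetical struct that opts into a specified layout.
// Fields stay in declaration order; padding follows the C rules.
#[repr(C)]
#[allow(dead_code)]
struct Header {
    kind: u8,
    length: u32,
    flags: u16,
}

/// Returns the total size and the offsets of `kind`, `length`, `flags`.
fn header_layout() -> (usize, [usize; 3]) {
    (
        size_of::<Header>(),
        [
            offset_of!(Header, kind),
            offset_of!(Header, length),
            offset_of!(Header, flags),
        ],
    )
}

fn main() {
    // Assumption: a typical target where u32 is 4-byte aligned and u16 is
    // 2-byte aligned. There, `length` is padded out to offset 4, `flags`
    // lands at 8, and the size is rounded up to a multiple of 4.
    let (size, offsets) = header_layout();
    println!("size = {size}, offsets = {offsets:?}");
}
```

The trade-off is that the compiler can no longer reorder fields to reduce padding for that type.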

Admittedly I still have a very limited understanding of the whole process, so forgive me for any details I've misunderstood. In my head, the way it would work is by embedding some kind of diff into the library during compilation. Any structs that have been reordered (relative to their source code) would have the changes recorded, e.g. 'function a is now at *a + offset', and then at link time the pointers would just be shifted around as required.

This would (in my limited understanding) have some minimal impact when loading a library for the first time (due to the translation), but I don't think there would be any ongoing cost (the offsets shouldn't change once calculated), and the libraries themselves might be slightly larger to accommodate the metadata.

I'm assuming I've missed something and this isn't possible for one reason or another, hence why I thought to ask here.

It seems you're picturing that struct layout is about a set of functions. It is not. It is about where the fields, the pieces of data, are located in each instance of the struct.
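One way to picture it: a field access is "address of the instance plus a constant number of bytes". The sketch below spells that out by hand with raw pointers. `Point` and `read_y_by_offset` are hypothetical names for illustration, and ordinary code would just write `p.y`, but the arithmetic is the same idea:

```rust
use std::mem::offset_of;

// Hypothetical struct with the default Rust layout.
#[allow(dead_code)]
struct Point {
    id: u8,
    x: u32,
    y: u32,
}

/// Reads `y` as "base address + constant offset" instead of `p.y`.
fn read_y_by_offset(p: &Point) -> u32 {
    // The offset is a compile-time constant for this build.
    const Y_OFFSET: usize = offset_of!(Point, y);
    let base = (p as *const Point).cast::<u8>();
    // SAFETY: `base + Y_OFFSET` points at the initialized, properly
    // aligned `y` field inside the `Point` that `p` borrows.
    unsafe { base.add(Y_OFFSET).cast::<u32>().read() }
}

fn main() {
    let p = Point { id: 1, x: 10, y: 20 };
    // Both expressions read the same bytes.
    println!("{} {}", p.y, read_y_by_offset(&p));
}
```

Nothing here involves functions moving around; the layout only decides where each field's bytes sit inside every instance.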

Every compiled function that accesses some field of a struct or enum, or contains an inlined function that does, has those offsets hard-coded. You'd need the dynamic loader to rewrite all of those offsets, and that comes at the cost of compiling the code in a way that allows them to be rewritten (for example, you lose the ability to skip computing an offset when the field is the first field). And not every layout is as simple as “has an offset that might be changed”: niches, for example, are an entirely different axis, about the values the field(s) take on.
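To illustrate the niche point with a small sketch (`sizes` is a hypothetical helper): when a type has bit patterns that are never valid, `Option` can use one of them to mean `None` instead of adding a separate tag. That is a decision about values, not offsets, so an "offset diff" has no way to describe it.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// Returns (size of `T`, size of `Option<T>`) so the two can be compared.
fn sizes<T>() -> (usize, usize) {
    (size_of::<T>(), size_of::<Option<T>>())
}

fn main() {
    // NonZeroU32 can never hold 0, so that forbidden bit pattern is a niche
    // the compiler can use to represent `None`; no extra tag is needed.
    let (inner, wrapped) = sizes::<NonZeroU32>();
    println!("NonZeroU32: {inner} bytes, Option<NonZeroU32>: {wrapped} bytes");

    // References are never null, which gives the same kind of niche.
    let (inner, wrapped) = sizes::<&u8>();
    println!("&u8: {inner} bytes, Option<&u8>: {wrapped} bytes");

    // Every bit pattern of u32 is a valid value, so there is no niche and
    // Option<u32> has to be larger to store the discriminant.
    let (inner, wrapped) = sizes::<u32>();
    println!("u32: {inner} bytes, Option<u32>: {wrapped} bytes");
}
```

Code on both sides of a library boundary has to agree on which value means `None`, in addition to agreeing on where the field lives.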

Making all of this stuff rewritable at load time would be a significant cost to the size and performance of the compiled code, and would require features that aren't part of conventional dynamic linking, so there'd need to be some sort of Rust-specific initialization step.

You might want to watch https://www.youtube.com/watch?v=MY5kYqWeV1Q - it's a talk given at Rust Nation 2024, all about a possible path to a stable ABI for Rust.

And I'm assuming that you've read "How Swift Achieved Dynamic Linking Where Rust Couldn't" on Faultlore, which also talks about how you might get a stable ABI for a language with monomorphization.