Obtain type layout information at compile time

What I'm trying to do

I am working on improving the performance of transferring large deeply-nested data structures between Rust and Javascript in SWC (a lengthy history of this effort here).

The data structures in question are Javascript ASTs. The types on the Rust side are fairly simple and entirely owned (no references), but they're large. The type definitions are recursive (via Option<Box<T>>).

The story so far is:

  1. Initially Rust types were serialized to JSON with serde. This was very slow.
  2. I switched to using RKYV to serialize to a binary format, and generated a matching deserializer on the JS side.

RKYV is much faster than serde_json but, due to the large size of these structures and the high degree of indirection - many many Boxes and Vecs - performance is still disappointing.

I think I have a better idea:

  1. Use an arena allocator on Rust side. All AST node types would be allocated inside a single buffer.
  2. Transfer this buffer to Javascript without any serialization (almost zero cost).
  3. Deserialize on JS side with a generated deserializer built for Rust's native memory layout.

i.e. Use Rust's native memory layout as a binary serialization format.

That's the background, now I'll get to the point...

How to get type layout information?

The tricky part is producing a schema for the type layouts.

I have a working prototype which can inspect type layouts at runtime and produce a schema: GitHub - overlookmotel/layout_inspect
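To illustrate what inspecting layouts at runtime involves, here is a minimal sketch. It is not the API of layout_inspect; the names TypeLayout, FieldLayout, Span and span_layout are invented for this example, and it assumes a toolchain where std::mem::offset_of! is stable (Rust 1.77 or later).

```rust
use std::mem::{align_of, offset_of, size_of};

// Hypothetical schema types, invented for this sketch.
#[derive(Debug)]
struct FieldLayout {
    name: &'static str,
    offset: usize,
    size: usize,
}

#[derive(Debug)]
struct TypeLayout {
    name: &'static str,
    size: usize,
    align: usize,
    fields: Vec<FieldLayout>,
}

// Hypothetical stand-in for an AST node type. It is deliberately not
// repr(C), so the compiler is free to reorder the fields.
#[allow(dead_code)]
struct Span {
    start: u32,
    ctxt: u16,
    end: u32,
}

// Builds the schema entry for `Span` by asking the compiler for the
// size, alignment and field offsets it actually chose.
fn span_layout() -> TypeLayout {
    TypeLayout {
        name: "Span",
        size: size_of::<Span>(),
        align: align_of::<Span>(),
        fields: vec![
            FieldLayout { name: "start", offset: offset_of!(Span, start), size: size_of::<u32>() },
            FieldLayout { name: "ctxt", offset: offset_of!(Span, ctxt), size: size_of::<u16>() },
            FieldLayout { name: "end", offset: offset_of!(Span, end), size: size_of::<u32>() },
        ],
    }
}

fn main() {
    println!("{:#?}", span_layout());
}
```

All of this code ends up in the binary even though it only needs to run once, which is the first drawback listed below.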

However, it has some drawbacks:

  1. Code bloat: As it produces the schema at runtime, the schema-creation code is included in the binary. But it'll only be called once during the build process to generate a JS deserializer, and the schema will never be needed again - so this is wasteful.

  2. Difficulty obtaining alignment of unsized types. mem::align_of doesn't work for unsized types e.g. struct X { n: u64, s: str }.

  3. Niche optimization affects layout for e.g. Option<Box<T>>.

There are workarounds for the first 2, but the third is tricky. I could hard-code understanding of common patterns with niche optimizations into the code that generates the JS deserializer, but it's not really generalizable - it gets complicated with nested enums, and could break if Rust introduces more niche optimizations in future. While this is probably fine for my SWC purposes, I feel this no-serialization approach could have broader applicability for performant Rust-JS interop, so would like to make it work in the general case.
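Drawbacks 2 and 3 can be seen in a few lines. This is only a sketch: the generic struct X<T> is a hypothetical stand-in for struct X { n: u64, s: str }, chosen because an instance of it can be created through an unsizing coercion, and the function names are made up for the example.

```rust
use std::mem::{align_of_val, size_of};

// Hypothetical stand-in for `struct X { n: u64, s: str }`. The generic tail
// lets us build a sized instance and then coerce it to an unsized one.
#[allow(dead_code)]
struct X<T: ?Sized> {
    n: u64,
    s: T,
}

/// Alignment of an unsized type. `align_of::<X<[u8]>>()` does not compile,
/// so the alignment is read from a reference to an instance instead.
fn unsized_align() -> usize {
    let sized: X<[u8; 3]> = X { n: 1, s: [1, 2, 3] };
    let dynamic: &X<[u8]> = &sized;
    align_of_val(dynamic)
}

/// `Option<Box<T>>` represents `None` as the null pointer, so no separate
/// discriminant is stored.
fn option_box_has_no_tag() -> bool {
    size_of::<Option<Box<u64>>>() == size_of::<Box<u64>>()
}

/// `u64` has no unused bit patterns, so `Option<u64>` has to be larger.
fn option_u64_has_tag() -> bool {
    size_of::<Option<u64>>() > size_of::<u64>()
}

fn main() {
    println!("align of X<[u8]>: {}", unsized_align());
    println!("Option<Box<u64>> has no tag: {}", option_box_has_no_tag());
    println!("Option<u64> has a tag: {}", option_u64_has_tag());
}
```

A JS deserializer therefore cannot assume that every Option has a discriminant at a fixed offset; it has to know which variant is encoded in a niche and what the niche value is.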

So... it'd be ideal to create the schema at build time and get the type layout information direct from the compiler. Is this possible?

What's required is:

  1. Memory layout of each type.
  2. Which types a type contains e.g. B and C in struct A { b: B, c: C }.
  3. Some method to output this information (e.g. to stdout) at compile time.

#[rustc_layout(debug)] fulfills (1) and (3). However, it has the unfortunate property of being an error, which brings compilation to a halt. So can only get info for a single type at a time.

Is there some way to get similarly detailed type info from the compiler without aborting compilation?

  • --print-type-sizes does not output information about niches.
  • The compiler's tracing API possibly could be used, but I can't figure out how to make it work.
  • A compiler plugin could probably work, but they are deprecated and will likely be removed in future.

Ralf Jung tantalizingly mentioned on his blog about rustc_layout "some time ago I wrote an awful hack for this based on rustc debug tracing". Ralf, if you're out there, can you share this awful hack please?!

Or does anyone else have any ideas how to approach this?

Are you comfortable working on the compiler? Changing the hard error of #[rustc_layout(debug)] to either a warning or nothing should be quite easy once you find where it is. You'd just need to build a custom toolchain as a one-off.

This doesn’t seem to be true, as far as I can tell. E.g. this shows the info for two types at once: Rust Playground

Thanks loads to you both for the very swift replies.

@steffahn You're right! I just assumed because it was a hard error, it'd halt compilation. Unfortunately, it does still halt compilation at the crate boundary, so can only get details of types in 1 crate at a time.

@jhpratt I don't know if I'm comfortable with that. I only started learning Rust a few months ago and I've rapidly descended into the deep end! But I'll give it a go. As this is intended for use in SWC, which isn't a project I'm a maintainer of, I don't think using a custom tool chain is a viable option. It'd need to be more than a one-off, as the types change quite regularly and the schema needs to be recreated each time to keep it up to date. I suppose in theory, type layout can also change from compilation to compilation, or be optimized differently due to how the types are used (the types aren't repr(C)).

However, do you have any guess as to whether the Rust people would consider changing it to a warning permanently if I submitted a PR?

One other question: Do you know of any way to get compiler to also output TypeIDs to stderr? That's what's required for linking up the type definitions - i.e. determining that error: layout_of(X) = ... output is referring to the same X as in struct Y { x: X }. Type names might not be unique.

That's fair! I assumed it was a one-off migration, not something done on a semi-regular basis.

I don't have the necessary permission to approve a pull request, but if I were to submit such a change I would expect it to be approved. I didn't even know the attribute existed, so it's certainly not used much.

That's what's required for linking up the type definitions

A fully-qualified, canonical path would also suffice and would be unique.

And it can't in general; consider dyn Trait, where the alignment (and thus offset) depends on the choice of concrete type.
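A small sketch of the dyn Trait case, assuming a made-up trait Node implemented for two primitive types. The same static type &dyn Node reports a different size and alignment depending on the concrete value behind it, so the answer can only come from the value (via its vtable), not from the type.

```rust
use std::mem::{align_of_val, size_of_val};

// Hypothetical trait, invented for this sketch.
trait Node {}
impl Node for u8 {}
impl Node for u64 {}

/// Returns (size, alignment) of whatever concrete value sits behind the
/// trait object. Both are looked up at runtime.
fn dyn_layout(node: &dyn Node) -> (usize, usize) {
    (size_of_val(node), align_of_val(node))
}

fn main() {
    println!("u8 behind dyn Node:  {:?}", dyn_layout(&1u8));
    println!("u64 behind dyn Node: {:?}", dyn_layout(&1u64));
}
```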

The stable solution would be a derive generating static/const typeinfo descriptors, which can still be done at compile time. If that typeinfo has a known #[repr(C)] layout, you can write the JS deserializer for it manually, then pass that typeinfo to the JS side and use it to deserialize dynamically. (Though you'd probably want some amount of memoization, since deeply nested types will have a lot of type info to transfer.)
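Roughly, the output of such a derive might look like the following hand-written sketch. Everything here is hypothetical: the trait HasTypeInfo, the descriptor structs TypeInfo and FieldInfo, and the example type Span are invented for illustration, and std::mem::offset_of! is assumed to be available (Rust 1.77 or later).

```rust
use std::mem::{align_of, offset_of, size_of};

// Hypothetical descriptor types. repr(C) fixes the field order so that a
// hand-written reader on the JS side could walk them.
#[repr(C)]
pub struct FieldInfo {
    pub name: &'static str,
    pub offset: usize,
    pub ty: &'static TypeInfo,
}

#[repr(C)]
pub struct TypeInfo {
    pub name: &'static str,
    pub size: usize,
    pub align: usize,
    pub fields: &'static [FieldInfo],
}

// Hypothetical trait that a derive macro would implement for each type.
pub trait HasTypeInfo {
    const TYPE_INFO: TypeInfo;
}

impl HasTypeInfo for u32 {
    const TYPE_INFO: TypeInfo = TypeInfo {
        name: "u32",
        size: size_of::<u32>(),
        align: align_of::<u32>(),
        fields: &[],
    };
}

// Example type; the impl below is what a derive would be expected to emit.
#[allow(dead_code)]
struct Span {
    start: u32,
    end: u32,
}

impl HasTypeInfo for Span {
    // Evaluated entirely at compile time; the offsets are the ones the
    // compiler actually chose for this build.
    const TYPE_INFO: TypeInfo = TypeInfo {
        name: "Span",
        size: size_of::<Span>(),
        align: align_of::<Span>(),
        fields: &[
            FieldInfo { name: "start", offset: offset_of!(Span, start), ty: &<u32 as HasTypeInfo>::TYPE_INFO },
            FieldInfo { name: "end", offset: offset_of!(Span, end), ty: &<u32 as HasTypeInfo>::TYPE_INFO },
        ],
    };
}

fn main() {
    let info = <Span as HasTypeInfo>::TYPE_INFO;
    println!("{}: size {}, align {}", info.name, info.size, info.align);
    for f in info.fields {
        println!("  {} at offset {} ({})", f.name, f.offset, f.ty.name);
    }
}
```

This sketch does not cover enums or niches, and a recursive type could not refer to its own TYPE_INFO constant directly, so a real implementation would likely need some indirection (for example a function pointer) for those fields.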