What I'm trying to do
I am working on improving the performance of transferring large deeply-nested data structures between Rust and Javascript in SWC (a lengthy history of this effort here).
The data structures in question are Javascript ASTs. The types on Rust side are fairly simple and entirely owned types (no references), but they're large. They contain circular references (via Option<Box>
).
The story so far is:
- Initially Rust types were serialized to JSON with serde. This was very slow.
- I switched to using RKYV to serialize to a binary format, and then generate a deserializer on JS side.
RKYV is much faster than serde_json but, due to the large size of these structures and the high degree of indirection - many many Boxes and Vecs - performance is still disappointing.
I think I have a better idea:
- Use an arena allocator on Rust side. All AST node types would be allocated inside a single buffer.
- Transfer this buffer to Javascript without any serialization (almost zero cost).
- Deserialize on JS side with a generated deserializer built for Rust's native memory layout.
i.e. Use Rust's native memory layout as a binary serialization format.
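A minimal sketch of what "native memory layout as a serialization format" means, written in Rust rather than JS for brevity. The Span type and the helper functions are hypothetical and not taken from SWC. The sketch assumes the reader has the same endianness as the writer and uses mem::offset_of!, which needs Rust 1.77 or later.

```rust
use std::mem::{offset_of, size_of};

/// Hypothetical AST node type, used only for illustration.
struct Span {
    start: u32,
    end: u32,
}

// Compile-time check that `Span` has no padding, which the byte view relies on.
const _: () = assert!(size_of::<Span>() == 8);

/// View a `Span` as the raw bytes that would be handed to JavaScript.
fn span_bytes(span: &Span) -> &[u8] {
    // SAFETY: `Span` is two `u32`s with no padding (checked above), so every
    // byte is initialized; the slice borrows `span` and cannot outlive it.
    unsafe { std::slice::from_raw_parts(span as *const Span as *const u8, size_of::<Span>()) }
}

/// Read a native-endian `u32` at `offset`, as a generated deserializer would.
fn read_u32(bytes: &[u8], offset: usize) -> u32 {
    let mut raw = [0u8; 4];
    raw.copy_from_slice(&bytes[offset..offset + 4]);
    u32::from_ne_bytes(raw)
}

/// Rebuild a `Span` from raw bytes using only layout information (field offsets).
fn decode_span(bytes: &[u8]) -> Span {
    Span {
        start: read_u32(bytes, offset_of!(Span, start)),
        end: read_u32(bytes, offset_of!(Span, end)),
    }
}
```

Nothing is copied or transformed on the writing side; all the work is in knowing the offsets, which is why the schema matters.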
That's the background, now I'll get to the point...
How to get type layout information?
The tricky part is producing a schema for the type layouts.
I have a working prototype which can inspect type layouts at runtime and produce a schema: GitHub - overlookmotel/layout_inspect
However, it has some drawbacks:
- Code bloat: As it produces the schema at runtime, the schema-creation code is included in the binary. But it'll only be called once during the build process to generate a JS deserializer, and the schema will never be needed again - so this is wasteful.
- Difficulty obtaining alignment of unsized types. mem::align_of doesn't work for unsized types e.g. struct X { n: u64, s: str }.
- Niche optimization affects layout for e.g. Option<Box<T>>.
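The last two drawbacks can be observed directly with the standard library. This is only a small illustration; the function names are made up for the example.

```rust
use std::mem::{align_of_val, size_of};

/// `Option<Box<T>>` uses the null pointer as the niche for `None`,
/// so it is no larger than `Box<T>` itself.
fn option_box_uses_niche() -> bool {
    size_of::<Option<Box<u64>>>() == size_of::<Box<u64>>()
}

/// Every bit pattern of `u64` is a valid value, so `Option<u64>` has no
/// niche to use and must store its discriminant separately.
fn option_u64_needs_tag() -> bool {
    size_of::<Option<u64>>() > size_of::<u64>()
}

/// `align_of::<str>()` does not compile because `str` is not `Sized`,
/// but `align_of_val` works when a reference to a value is available.
fn str_align() -> usize {
    align_of_val("abc")
}

/// The same applies to slices: the alignment is that of the element type.
fn slice_align() -> usize {
    let values = [0u64; 2];
    align_of_val(&values[..])
}
```

So size_of alone shows that a niche was used, but not which bit pattern encodes None, and align_of_val needs an existing value rather than just a type.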
There are workarounds for the first 2, but the third is tricky. I could hard-code understanding of common patterns with niche optimizations into the code that generates the JS deserializer, but it's not really generalizable - it gets complicated with nested enums, and could break if Rust introduces more niche optimizations in future. While this is probably fine for my SWC purposes, I feel this no-serialization approach could have broader applicability for performant Rust-JS interop, so would like to make it work in the general case.
So... it'd be ideal to create the schema at build time and get the type layout information direct from the compiler. Is this possible?
What's required is:
1. Memory layout of each type.
2. Which types a type contains e.g. B and C in struct A { b: B, c: C }.
3. Some method to output this information (e.g. to stdout) at compile time.
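To make the first two requirements concrete, here is a rough sketch of the kind of record a schema would need per type, filled in at runtime with standard library calls. FieldLayout, TypeLayout, layout_of_a and describe are hypothetical names, not the layout_inspect API. mem::offset_of! needs Rust 1.77 or later, and this approach says nothing about niches.

```rust
use std::any::type_name;
use std::mem::{align_of, offset_of, size_of};

// Hypothetical types mirroring the `struct A { b: B, c: C }` example.
#[allow(dead_code)]
struct B(u8);
#[allow(dead_code)]
struct C(u64);
#[allow(dead_code)]
struct A {
    b: B,
    c: C,
}

/// One field of a struct: its name, the type it contains, and its byte offset.
struct FieldLayout {
    name: &'static str,
    type_name: &'static str,
    offset: usize,
}

/// Layout record for one type: size, alignment, and its fields.
struct TypeLayout {
    name: &'static str,
    size: usize,
    align: usize,
    fields: Vec<FieldLayout>,
}

/// Build the layout record for `A` by hand; a derive macro could generate this.
fn layout_of_a() -> TypeLayout {
    TypeLayout {
        name: type_name::<A>(),
        size: size_of::<A>(),
        align: align_of::<A>(),
        fields: vec![
            FieldLayout { name: "b", type_name: type_name::<B>(), offset: offset_of!(A, b) },
            FieldLayout { name: "c", type_name: type_name::<C>(), offset: offset_of!(A, c) },
        ],
    }
}

/// Render the record as text, e.g. for printing to stdout from a build step.
fn describe(layout: &TypeLayout) -> String {
    let mut out = format!("{} size={} align={}\n", layout.name, layout.size, layout.align);
    for field in &layout.fields {
        out.push_str(&format!("  {}: {} @ {}\n", field.name, field.type_name, field.offset));
    }
    out
}
```

The field order reported for A may differ from declaration order, since the default representation lets the compiler reorder fields.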
#[rustc_layout(debug)] fulfills (1) and (3). However, it has the unfortunate property of being an error, which brings compilation to a halt. So I can only get info for a single type at a time.
Is there some way to get similarly detailed type info from the compiler without aborting compilation?
- --print-type-sizes does not output information about niches.
- The compiler's tracing API possibly could be used, but I can't figure out how to make it work.
- A compiler plugin could probably work, but they are deprecated and will likely be removed in future.
Ralf Jung tantalizingly mentioned on his blog, about rustc_layout: "some time ago I wrote an awful hack for this based on rustc debug tracing". Ralf, if you're out there, can you share this awful hack please?!
Or does anyone else have any ideas how to approach this?