Hi,
Some background
I maintain a Rust library called Frunk. It's currently a grab bag of functional programming constructs like HList, Monoid, Semigroup, etc. Among these is LabelledGeneric
, which, when mixed with HList
allows one to do boilerplate-free transforms between structs.
How conversion works
To give a quick overview of what happens (code-wise, anyway) when you transform one struct (Struct A, let's say) into another (Struct B):
- Struct A is turned into a "type-labelled" HList (names of fields are included on the type level); let's call this LabelledRep A. This is done by making use of an implementation of the
LabelledGeneric
trait that is trivially generated using a custom derivation. - LabelledRep A is "rearranged"; or "sculpted" to fit the LabelledGeneric Representation of Struct B using another trait that is implemented for HLists. This is done recursively on the stack and largely driven by the compiler at compile time. Let's call the end-result of this LabelledRep B
- LabelledRep B (remember, it's an HList), is deconstructed, and its values are used to construct Struct B.
Issue
In an attempt to maximise the chances that all of this is a zero-cost abstraction, I stuck to using the stack when writing it, and it largely works to that end. In normal cases, there is zero difference between using LabelledGeneric::transform_from
vs From::from
(as you'll see below). This by the way, is extremely impressive and speaks volumes about Rust's compiler and its integration with LLVM; I can't say any more because I really have no idea how it manages to optimise away all those steps, but I am really impressed and happy with it.
However, when I try to push things to its limits, I get some weird results when benchmarking.
To benchmark, I wanted to use the scenario where LabelledGeneric-based transformation would be at its slowest. Due to the way "sculpting" works, this is when one struct's fields are in reverse order of the other e.g. BigStruct { a: i8, b: String, ... }
to BigStructReverse { z:i8, y: String, ..}
Comparing LabelledGeneric-backed transform_from
vs hand-written From
when the struct has 24 fields, the results look quite favourable for transform_from
:
test big_from_24fields ... bench: 116 ns/iter (+/- 19)
test big_transform_from_24fields ... bench: 106 ns/iter (+/- 17)
But once we start doing it with a struct containing 25 fields, performance falls off a cliff for transform_from
!!:
test big_from_25fields ... bench: 133 ns/iter (+/- 68)
test big_transform_from_25fields ... bench: 4,919 ns/iter (+/- 1,400)
Now, for the real kicker, if I have both of the above benchmarks enabled, big_transform_from_24fields
also takes a nosedive!!
test big_from_24fields ... bench: 110 ns/iter (+/- 50)
test big_from_25fields ... bench: 131 ns/iter (+/- 5)
test big_transform_from_24fields ... bench: 4,594 ns/iter (+/- 180) <-- !!! This used to be 106ns!
test big_transform_from_25fields ... bench: 4,682 ns/iter (+/- 662)
At the moment I really don't know why this is the case. Maybe this is something that someone with knowledge on looking at compiled Rust code or the compiler can take a look at. I'm curious but sadly don't have the skills
Benchmark code here
Note that I obtained these results from nightly Rust from ~ 3 days ago.
Ask
Like I said, I'm really happy that the Rust compiler + LLVM are seemingly able to make LabelledGeneric::transform_from
a zero-cost abstraction. I also understand that for the most part nobody is going to be using 24+ fields in their structs.
I'm just really curious about these results and am open to pointers on how/where to start investigating. My current thoughts are:
- Maybe my benchmarks are bogus (written wrong)?
- Maybe this is related to the new optimisation fuel stuff checked into nightly (compiler ran out of optimisation fuel at 24+ fields)?
Thanks for reading, and any help is appreciated.