I am currently experimenting with a generic, functional approach to deserializing arbitrary structs: I pass an array of functions to a generic deserialize function, and each function in the array deserializes one struct member.
I have been comparing the optimized assembly of the generic function with a hard-coded variant (using cargo asm) and could not see any difference within the lib itself. In my criterion benchmarks, however, the functional variant was consistently ~110% slower.
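For reference, a hard-coded variant of the kind I compare against might look like the sketch below. The name `deserialize_color_hardcoded` is hypothetical, and a plain slice pattern stands in for `bytes::Buf` so that the snippet has no dependencies; the actual baseline in the repo may differ.

```rust
#[derive(Debug, Default, PartialEq)]
pub struct Color {
    pub r: u8,
    pub g: u8,
    pub b: u8,
}

/// Hypothetical hand-written baseline: reads three bytes into `color`
/// and advances `reader` past them. Does nothing if fewer than three
/// bytes remain, mirroring the `remaining() >= 3` check in the post.
pub fn deserialize_color_hardcoded(reader: &mut &[u8], color: &mut Color) {
    // Copy the inner shared slice out so the pattern borrows from it
    // rather than from `reader` itself.
    let src: &[u8] = *reader;
    if let [r, g, b, rest @ ..] = src {
        color.r = *r;
        color.g = *g;
        color.b = *b;
        *reader = rest;
    }
}
```

Called with a reader over `[1, 2, 3, 4]`, this fills the color with 1, 2, 3 and leaves one byte in the reader.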
So I investigated further and found that it makes a difference whether the generic function is called inside the lib or from main.rs. I wrote an example that calls both from main: once calling the generic function directly, and once calling a non-generic lib function that instantiates the generic function inside the lib. I then disassembled the binary, and it showed (like the benchmarks) that the version instantiated in the lib is far better optimised.
I would have put this down to missing inlining, but it looks like the closures are inlined as well. The code is just not optimized as a whole the way it is in the lib version.
More generally, I would like to understand how a generic function from a lib is compiled. Since it can only be compiled once it is instantiated in main, I would have expected both versions to be optimized the same way.
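One detail that may be relevant: because the three closures sit in a single array, `F` is instantiated as the function pointer type `fn(&mut &[u8], &mut Color)` rather than as three distinct closure types. A function pointer is a runtime value, whereas a non-capturing closure type is zero-sized and identifies its body statically. The sketch below only demonstrates that size difference. `take_u8` and `size_of_callable` are hypothetical helpers (the former stands in for `bytes::Buf::get_u8`), and none of this is a claim about what the optimizer does in either crate.

```rust
use std::mem::size_of;

#[derive(Debug, Default, PartialEq)]
pub struct Color {
    pub r: u8,
    pub g: u8,
    pub b: u8,
}

/// Hypothetical stand-in for `bytes::Buf::get_u8`; panics on an empty reader.
pub fn take_u8(reader: &mut &[u8]) -> u8 {
    let src: &[u8] = *reader;
    let (first, rest) = src.split_first().expect("reader is empty");
    *reader = rest;
    *first
}

// Same shape as the arrays in the post: the closures are coerced to
// function pointers so that all three elements share one type.
pub const DESER_COLOR: [fn(&mut &[u8], &mut Color); 3] = [
    |r, s| s.r = take_u8(r),
    |r, s| s.g = take_u8(r),
    |r, s| s.b = take_u8(r),
];

/// Size in bytes of whatever type `F` was instantiated with.
pub fn size_of_callable<F>(_: &F) -> usize {
    size_of::<F>()
}

/// Returns (size of a non-capturing closure, size of an array element).
pub fn closure_and_pointer_sizes() -> (usize, usize) {
    let closure = |r: &mut &[u8], s: &mut Color| s.r = take_u8(r);
    (size_of_callable(&closure), size_of_callable(&DESER_COLOR[0]))
}
```

The closure has size zero, while an array element is pointer-sized, so `deserialize_struct` instantiated with this array only sees addresses unless the optimizer can prove what they point to.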
If you want to try it out yourself, you can find a repo with the code and description to get the asm here:
https://github.com/tjensen42/rust-lib-closure-test
// lib.rs
use bytes::Buf;

#[derive(Debug, Default)]
pub struct Color {
    pub r: u8,
    pub g: u8,
    pub b: u8,
}

// Deserialize some struct with 3 Fields
#[inline(never)]
pub fn deserialize_struct<S, F>(reader: &mut &[u8], data: &mut S, func: &[F; 3])
where
    F: Fn(&mut &[u8], &mut S),
{
    if reader.remaining() >= 3 {
        func[0](reader, data);
        func[1](reader, data);
        func[2](reader, data);
    }
}

// Implement the generic function in the library
pub fn deserialize_color_generic(reader: &mut &[u8], color: &mut Color) {
    deserialize_struct(reader, color, &DESER_COLOR_CLOSURE_LIB)
}

// Closure array to deserialize a Color struct
pub const DESER_COLOR_CLOSURE_LIB: [fn(&mut &[u8], &mut Color); 3] = [
    |r, s| s.r = r.get_u8(),
    |r, s| s.g = r.get_u8(),
    |r, s| s.b = r.get_u8(),
];

// main.rs
use std::hint::black_box;

use bytes::Buf;
use rust_lib_closure_test::{deserialize_color_generic, deserialize_struct, Color};

pub fn main() {
    let buf: Vec<u8> = Vec::from([0x01, 0x02, 0x03]);

    // Call the generic function indirectly (instantiated in the lib)
    let mut color = Color::default();
    let cursor = &mut buf.as_slice();
    deserialize_color_generic(black_box(cursor), &mut color);
    println!("color: {:?}", color);

    // Call the generic function directly (instantiated in main.rs)
    let mut color = Color::default();
    let cursor = &mut buf.as_slice();
    deserialize_struct(black_box(cursor), &mut color, &DESER_COLOR_CLOSURE_BIN);
    println!("color: {:?}", color);
}

// Same closure array as in the lib, but defined in the binary crate
const DESER_COLOR_CLOSURE_BIN: [fn(&mut &[u8], &mut Color); 3] = [
    |r, s| s.r = r.get_u8(),
    |r, s| s.g = r.get_u8(),
    |r, s| s.b = r.get_u8(),
];

; ASM of deserialize_struct called from inside lib.rs
0000000000008c90 <_ZN21rust_lib_closure_test18deserialize_struct17h7cac8bcc2fad5118E>:
    8c90:   48 8b 47 08             mov    0x8(%rdi),%rax
    8c94:   48 83 f8 02             cmp    $0x2,%rax
    8c98:   76 25                   jbe    8cbf <_ZN21rust_lib_closure_test18deserialize_struct17h7cac8bcc2fad5118E+0x2f>
    8c9a:   48 8b 0f                mov    (%rdi),%rcx
    8c9d:   0f b6 11                movzbl (%rcx),%edx
    8ca0:   88 16                   mov    %dl,(%rsi)
    8ca2:   0f b6 51 01             movzbl 0x1(%rcx),%edx
    8ca6:   88 56 01                mov    %dl,0x1(%rsi)
    8ca9:   0f b6 51 02             movzbl 0x2(%rcx),%edx
    8cad:   48 83 c1 03             add    $0x3,%rcx
    8cb1:   48 83 c0 fd             add    $0xfffffffffffffffd,%rax
    8cb5:   48 89 0f                mov    %rcx,(%rdi)
    8cb8:   48 89 47 08             mov    %rax,0x8(%rdi)
    8cbc:   88 56 02                mov    %dl,0x2(%rsi)
    8cbf:   c3                      ret

; ASM of deserialize_struct called from main.rs
0000000000008b90 <_ZN21rust_lib_closure_test18deserialize_struct17h78300619e9e5bdc1E>:
    8b90:   41 57                   push   %r15
    8b92:   41 56                   push   %r14
    8b94:   53                      push   %rbx
    8b95:   48 83 7f 08 02          cmpq   $0x2,0x8(%rdi)
    8b9a:   76 26                   jbe    8bc2 <_ZN21rust_lib_closure_test18deserialize_struct17h78300619e9e5bdc1E+0x32>
    8b9c:   49 89 d7                mov    %rdx,%r15
    8b9f:   49 89 f6                mov    %rsi,%r14
    8ba2:   48 89 fb                mov    %rdi,%rbx
    8ba5:   ff 12                   call   *(%rdx)
    8ba7:   48 89 df                mov    %rbx,%rdi
    8baa:   4c 89 f6                mov    %r14,%rsi
    8bad:   41 ff 57 08             call   *0x8(%r15)
    8bb1:   49 8b 47 10             mov    0x10(%r15),%rax
    8bb5:   48 89 df                mov    %rbx,%rdi
    8bb8:   4c 89 f6                mov    %r14,%rsi
    8bbb:   5b                      pop    %rbx
    8bbc:   41 5e                   pop    %r14
    8bbe:   41 5f                   pop    %r15
    8bc0:   ff e0                   jmp    *%rax
    8bc2:   5b                      pop    %rbx
    8bc3:   41 5e                   pop    %r14
    8bc5:   41 5f                   pop    %r15
    8bc7:   c3                      ret
    8bc8:   0f 1f 84 00 00 00 00 00 nopl   0x0(%rax,%rax,1)
Full source code: https://github.com/tjensen42/rust-lib-closure-test
I also tried passing DESER_COLOR_CLOSURE_LIB instead of DESER_COLOR_CLOSURE_BIN in main(), but that made no difference.