BufWriter::flush_buf has terrible performance in debug mode, apparently because it uses a
Vec iterator. Looking at the assembly, there seem to be quite a few stack accesses that the mem2reg optimization pass would clean up.
for example could become just:
cg_clif is already avoiding these stack accesses. It produced:
rex push %rbp
rex mov $0x1,%eax
rex pop %rbp
This is still not ideal as it has a prologue and epilogue and an unnecessary multiply instruction. I fixed the unnecessary multiply in https://github.com/bjorn3/rustc_codegen_cranelift/commit/96c4542dc3c7001d5a28b05d067701f2173e9eb4 improving perf by ~1%, but most of the overhead seems to come from the prologue and epilogue emission.
cg_llvm:
File => 71.32ms
BufWriter => 6.30s
cg_clif:
File => 958.81ms
BufWriter => 6.79s
cg_clif is slower on the File case here, as the standard library is not optimized, unlike with cg_llvm.
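For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of a timing harness. It is not the benchmark that produced the numbers above: the workload (single-byte writes), the `count` value, the temp file name, and the `time_writes` helper are all assumptions made for illustration.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::time::{Duration, Instant};

/// Hypothetical helper: writes `count` single bytes to `writer`,
/// flushes it, and returns the elapsed wall-clock time.
fn time_writes<W: Write>(mut writer: W, count: usize) -> io::Result<Duration> {
    let start = Instant::now();
    for _ in 0..count {
        writer.write_all(b"a")?;
    }
    writer.flush()?;
    Ok(start.elapsed())
}

fn main() -> io::Result<()> {
    // Assumed workload size; adjust to taste.
    let count = 100_000;
    // Assumed scratch file in the system temp directory.
    let path = std::env::temp_dir().join("bufwriter_bench.tmp");

    // Unbuffered: every write_all goes straight to the file.
    let file_time = time_writes(File::create(&path)?, count)?;
    println!("File => {:.2?}", file_time);

    // Buffered: writes are collected in BufWriter's internal Vec first.
    let buf_time = time_writes(BufWriter::new(File::create(&path)?), count)?;
    println!("BufWriter => {:.2?}", buf_time);

    std::fs::remove_file(&path)?;
    Ok(())
}
```

Building this without optimizations (a plain `cargo build` or `rustc` without `-O`) once with cg_llvm and once with cg_clif should give a rough side-by-side of the two backends, though the absolute numbers will depend heavily on the workload chosen.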