Could this perhaps be explained by one of these two issues?
- This topic finds that jemalloc adds extra padding to allocations, which makes reads faster (as far as I understood the topic).
- This excellent talk gives examples of spurious performance regressions caused by memory layout, and presents a tool that rules them out by randomizing code, stack and heap layout between runs: GitHub - ccurtsinger/stabilizer: Stabilizer: Rigorous Performance Evaluation
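The practical upshot of the talk is to stop trusting a single before/after run. Instead, collect timings from many runs of each build (ideally with layout randomized, e.g. under Stabilizer) and compare the two samples statistically. Below is a minimal sketch of Welch's t statistic using only the standard library. The function name `welch_t` and the timing numbers are hypothetical, for illustration only.

```python
import math
import statistics


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Return (t statistic, Welch-Satterthwaite degrees of freedom) for two timing samples."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("need at least two runs per build")
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    # Sample variance (n - 1 denominator) divided by sample size = squared standard error.
    se2_a = statistics.variance(a) / len(a)
    se2_b = statistics.variance(b) / len(b)
    if se2_a + se2_b == 0:
        raise ValueError("both samples have zero variance")
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a**2 / (len(a) - 1) + se2_b**2 / (len(b) - 1))
    return t, df


# Hypothetical wall-clock timings (seconds) from repeated runs of each build.
old_build = [1.02, 0.98, 1.01, 0.99, 1.00]
new_build = [1.05, 1.07, 1.04, 1.06, 1.08]
t, df = welch_t(old_build, new_build)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A large |t| for its degrees of freedom suggests the difference is unlikely to be run-to-run noise alone. To turn it into a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test. Note that without layout randomization, every run of a given binary can share the same layout bias, so a "significant" result may still be a layout artifact.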