Posting on here before IRLO, since I may be misunderstanding something.
It seems like when the system allocator in use is not jemalloc, std::alloc::System::alloc does not necessarily return 16-byte-aligned pointers on Unix x86_64, even when the caller requests that alignment. This would be because the implementation in std::sys::unix::alloc seems to assume that malloc returns pointers aligned to at least std::sys_common::alloc::MIN_ALIGN (which is 16 on x86_64), with some rationale based on jemalloc. In general, unless I'm mistaken, malloc on x86_64 can return pointers that are only 8-byte aligned. In this playground you will notice in the assembly that malloc, rather than memalign, is being used to allocate a pointer that is supposed to be 16-byte aligned, which looks incorrect to me.
I'm not sure about the other architectures, but they probably suffer from the same bug as well. I'm also not sure whether there's some reason malloc's alignment would definitely be at least 16 on Unix-like systems, but I couldn't find anything.
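
For anyone who wants to check this empirically on their own machine, here is a minimal sketch that allocates a batch of 16-byte blocks through System with a requested alignment of 16 and counts how many come back misaligned. The helper names is_aligned and count_misaligned are made up for this post and are not part of std. A count of zero only shows what the local malloc happens to do; it doesn't prove that the alignment is guaranteed.

```rust
use std::alloc::{GlobalAlloc, Layout, System};

/// Returns true if `addr` is a multiple of `align`.
/// `align` must be a power of two (hypothetical helper, not part of std).
fn is_aligned(addr: usize, align: usize) -> bool {
    addr & (align - 1) == 0
}

/// Allocates up to `n` blocks of `size` bytes with the requested `align`
/// through `System` and returns how many of them were misaligned.
/// (Hypothetical helper, not part of std.)
fn count_misaligned(size: usize, align: usize, n: usize) -> usize {
    assert!(size > 0, "GlobalAlloc::alloc requires a non-zero size");
    let layout = Layout::from_size_align(size, align).expect("invalid layout");

    // Keep every block alive until the end so the allocator has to hand
    // out distinct addresses instead of recycling the same one.
    let mut ptrs = Vec::with_capacity(n);
    let mut misaligned = 0;

    for _ in 0..n {
        // SAFETY: `layout` has a non-zero size, checked above.
        let p = unsafe { System.alloc(layout) };
        if p.is_null() {
            break; // out of memory; stop early
        }
        if !is_aligned(p as usize, align) {
            misaligned += 1;
        }
        ptrs.push(p);
    }

    for p in ptrs {
        // SAFETY: `p` came from `System.alloc` with this same `layout`.
        unsafe { System.dealloc(p, layout) };
    }

    misaligned
}

fn main() {
    let bad = count_misaligned(16, 16, 1000);
    println!("misaligned 16-byte allocations: {}", bad);
}
```

If this ever prints a non-zero count with a particular malloc, that would be a concrete reproduction of the problem; if it always prints zero, the question becomes whether that alignment is documented anywhere or just incidental.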