You'd need to profile to be sure, but my guess is that the former makes two intermediate allocations and copies the data between them, while the latter doesn't allocate at all.
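Your original code isn't quoted here, so the following is only a hypothetical illustration of the kind of difference I mean. The digit-summing task and the function names are made up. The first version collects into a `String` and then into a `Vec`, copying the data at each step. The second does the same work in one lazy iterator chain with no heap allocation:

```rust
/// Hypothetical "former" style: two intermediate heap allocations.
/// The digits are copied into a String, then parsed out of it into a Vec.
fn digit_sum_allocating(s: &str) -> u32 {
    // Allocation 1: a new String holding only the ASCII digit characters.
    let filtered: String = s.chars().filter(|c| c.is_ascii_digit()).collect();
    // Allocation 2: a Vec of numeric values, copied out of `filtered`.
    let digits: Vec<u32> = filtered.chars().map(|c| c.to_digit(10).unwrap()).collect();
    digits.iter().sum()
}

/// Hypothetical "latter" style: one lazy iterator chain, no heap allocation.
fn digit_sum_iter(s: &str) -> u32 {
    // to_digit(10) returns None for non-digits, so filter_map skips them.
    s.chars().filter_map(|c| c.to_digit(10)).sum()
}
```

Both return the same result. The iterator version just never materializes the intermediate collections, so there is nothing to allocate, copy, or free.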
Obligatory question: are you compiling with `--release`? Without optimizations, code can be slow for reasons that have nothing to do with the approaches you're comparing.
Are you measuring with a benchmarking harness such as bencher or criterion? A single `timer.elapsed()` reading is heavily skewed by cache state, CPU frequency boosts, other processes competing for the CPU, and the compiler possibly reordering or deleting the code being measured.
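If you can't pull in criterion right away, a rough std-only approach is to warm up first, take many samples, and report the median. Wrap inputs and results in `std::hint::black_box` so the optimizer can't delete the work. This is a minimal sketch, and `bench_median` is a hypothetical helper of my own. It is not a substitute for criterion's statistics:

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: run `f` a few times untimed (warming caches and
/// letting CPU frequency settle), then time `samples` runs and return the median.
fn bench_median<T>(warmup: usize, samples: usize, mut f: impl FnMut() -> T) -> Duration {
    assert!(samples > 0, "need at least one sample");
    for _ in 0..warmup {
        // black_box keeps the optimizer from discarding the unused result.
        black_box(f());
    }
    let mut times: Vec<Duration> = (0..samples)
        .map(|_| {
            let start = Instant::now();
            black_box(f());
            start.elapsed()
        })
        .collect();
    // The median is less sensitive than the mean to one-off outliers
    // such as a context switch landing in the middle of a run.
    times.sort();
    times[samples / 2]
}
```

For example: `let input = "a1b2c3".repeat(1000); let t = bench_median(10, 101, || black_box(&input).len());`. Build with `--release` and compare the medians of the two versions, not single runs.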