The Cortex-M4F FPU is single-precision only, so any use of f64 will most likely end up in software floating point. The same would happen for double in C (I don't think that's allowed to be a 32-bit float, and under the ARM EABI it is 64 bits), so you're likely comparing the performance of two software floating-point libraries here.
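To make that concrete, here is a minimal sketch. The function names scale_f32 and scale_f64 are made up for illustration and are not taken from the benchmark. Assuming a hard-float Cortex-M4F target such as thumbv7em-none-eabihf, the f32 multiply should compile to a single FPU instruction, while the f64 multiply is expected to turn into a call to a software routine such as __aeabi_dmul.

```rust
/// Single-precision multiply.
/// On a Cortex-M4F hard-float target this is expected to use the FPU
/// (a vmul.f32 instruction), since the FPU handles f32 natively.
#[inline(never)]
pub fn scale_f32(x: f32, gain: f32) -> f32 {
    x * gain
}

/// Double-precision multiply.
/// The Cortex-M4F FPU has no f64 arithmetic, so on that target this is
/// expected to become a call to a software routine such as __aeabi_dmul.
#[inline(never)]
pub fn scale_f64(x: f64, gain: f64) -> f64 {
    x * gain
}
```

Both functions return the same values on a desktop host. The difference only shows up in the disassembly of the embedded build, where the second function should contain a branch to a library routine instead of an FPU instruction.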
Thank you so much for your feedback!
If I'm really just comparing the performance of software floating-point libraries,
I guess I should look into the libm crate implementation in more detail.
You were right! I recompiled the C benchmark with -mfloat-abi=soft and still got exactly the same binary size. The execution times also stayed at the same level, so the double-precision code was not using the FPU in the first place.