I'm just chiming in to report an observation from compiling Rust applications for both the `glibc` (default) and `musl` targets, and with `jemalloc` versus `system-alloc`. In all the scenarios I've tried, musl-compiled Rust binaries are significantly faster than their glibc counterparts, so it's worth investigating this in your own projects. I've seen speedups ranging from 50% to 1000%.

I managed to drastically boost the performance of my Parallel application by switching to musl and ditching `jemalloc` for the system allocator, cutting memory consumption and CPU cycles roughly in half. I often find `jemalloc` to be more of a nuisance to my performance in general, but it takes a nightly compiler to get rid of it, which is a bit silly. Anyway, here are some interesting performance metrics from my Linux box for a real-world application, Parallel. Benchmarks are ordered from slowest to fastest.
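For anyone who wants to try the system-allocator variant themselves, here is a minimal sketch. It assumes a toolchain where the `#[global_allocator]` attribute is stable, which is not how the binaries measured here were built. `build_lines` is a hypothetical stand-in for an allocation-heavy workload, not code from Parallel.

```rust
use std::alloc::System;

// Explicitly route every heap allocation through the platform allocator
// (malloc/free from whichever libc the target links against).
#[global_allocator]
static GLOBAL: System = System;

/// Hypothetical workload: allocates one String per input number,
/// roughly mimicking the `seq 1 10000` input used in the benchmarks.
fn build_lines(n: usize) -> Vec<String> {
    (1..=n).map(|i| i.to_string()).collect()
}

fn main() {
    let lines = build_lines(10_000);
    println!("allocated {} strings", lines.len());
}
```

Building the same source once for the default target and once for `x86_64-unknown-linux-musl` gives two binaries that can be compared with `perf stat` exactly as in the commands here.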

```
seq 1 10000 | perf stat target/release/parallel echo > /dev/null
seq 1 10000 | perf stat target/x86_64-unknown-linux-musl/release/parallel echo > /dev/null
```

# Parallel with Jemalloc + Glibc (3368KB Max RSS)

```
10957.993473 task-clock:u (msec) # 1.673 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
1,487,686 page-faults:u # 0.136 M/sec
3,953,268,885 cycles:u # 0.361 GHz (85.95%)
0 stalled-cycles-frontend:u (85.68%)
0 stalled-cycles-backend:u # 0.00% backend cycles idle (85.21%)
1,451,526,963 instructions:u # 0.37 insn per cycle (84.79%)
325,691,822 branches:u # 29.722 M/sec (84.29%)
27,016,217 branch-misses:u # 8.30% of all branches (84.67%)
6.550195345 seconds time elapsed
```

# Parallel with System Alloc + Glibc (3228KB Max RSS)

```
8813.700737 task-clock:u (msec) # 1.604 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
1,206,385 page-faults:u # 0.137 M/sec
3,227,954,900 cycles:u # 0.366 GHz (86.91%)
0 stalled-cycles-frontend:u (86.59%)
0 stalled-cycles-backend:u # 0.00% backend cycles idle (85.11%)
1,176,187,072 instructions:u # 0.36 insn per cycle (87.05%)
257,953,651 branches:u # 29.267 M/sec (88.35%)
25,232,814 branch-misses:u # 9.78% of all branches (86.67%)
5.494453770 seconds time elapsed
```

# Parallel with Jemalloc + Musl (1768KB Max RSS)

```
7724.722519 task-clock:u (msec) # 1.594 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
1,210,474 page-faults:u # 0.157 M/sec
3,353,744,654 cycles:u # 0.434 GHz (88.37%)
0 stalled-cycles-frontend:u (88.27%)
0 stalled-cycles-backend:u # 0.00% backend cycles idle (87.71%)
1,323,967,181 instructions:u # 0.39 insn per cycle (85.28%)
281,211,163 branches:u # 36.404 M/sec (85.80%)
24,024,922 branch-misses:u # 8.54% of all branches (87.68%)
4.844914953 seconds time elapsed
```

# Parallel with System Alloc + Musl (1768KB Max RSS)

```
4757.338202 task-clock:u (msec) # 1.329 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
757,191 page-faults:u # 0.159 M/sec
2,306,342,779 cycles:u # 0.485 GHz (90.86%)
0 stalled-cycles-frontend:u (92.85%)
0 stalled-cycles-backend:u # 0.00% backend cycles idle (89.90%)
1,150,291,731 instructions:u # 0.50 insn per cycle (91.26%)
233,091,489 branches:u # 48.996 M/sec (89.56%)
20,072,159 branch-misses:u # 8.61% of all branches (89.71%)
3.580532601 seconds time elapsed
```
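To put the four runs on one scale, the elapsed times can be turned into relative speedups against the slowest configuration. This is just arithmetic on the wall-clock numbers reported by `perf stat` above; the helper name `speedup_percent` is my own. System alloc + musl works out to roughly 83% faster than jemalloc + glibc for this workload.

```rust
/// Percent speedup of `candidate` over `baseline`, both in wall-clock seconds.
/// A result of 100.0 means the candidate finished in half the time.
fn speedup_percent(baseline: f64, candidate: f64) -> f64 {
    (baseline / candidate - 1.0) * 100.0
}

fn main() {
    // "seconds time elapsed" copied from the perf stat outputs above.
    let runs = [
        ("jemalloc + glibc", 6.550195345),
        ("system alloc + glibc", 5.494453770),
        ("jemalloc + musl", 4.844914953),
        ("system alloc + musl", 3.580532601),
    ];

    // The slowest run (first entry) serves as the baseline.
    let baseline = runs[0].1;
    for (name, secs) in runs.iter() {
        println!(
            "{:<22} {:.3}s  {:+.0}% vs jemalloc + glibc",
            name,
            secs,
            speedup_percent(baseline, *secs)
        );
    }
}
```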

I might add more benchmarks from other applications later, and I'd be interested in seeing any performance comparisons that others have from their own applications.