Rust vs Go benchmark

I recently ported a codec for an in-house proprietary protocol from Go to Rust. I want to run a benchmark to compare the performance of the library written in Go with the one in Rust.

Benchmark code in Go:

func BenchmarkDecode(b *testing.B) {
	data := "....."
	for i := 0; i < b.N; i++ {
		codec.Decode([]byte(data))
	}
}

and ran it with go test -bench=.

Benchmark code in Rust:

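#![feature(test)] // #[bench] and test::Bencher are nightly-only
extern crate test;
use test::{black_box, Bencher};

#[bench]
fn bench_decode(b: &mut Bencher) {
    let data = b".....";
    // black_box is a hint that keeps the decode result from being optimized away
    b.iter(|| black_box(CMDC_CODEC.decode(data)));
}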

and ran it with cargo +nightly bench
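For anyone without a nightly toolchain, a rough stand-in for b.iter can be sketched on stable Rust with std::time::Instant and std::hint::black_box (stable since Rust 1.66). This is a minimal sketch, not a replacement for a statistical harness: it reports only a mean and does no outlier handling. ns_per_iter and fake_decode are hypothetical names; fake_decode merely stands in for CMDC_CODEC.decode.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Runs `f` `iters` times and returns the mean wall-clock time per call in nanoseconds.
/// A short warm-up runs first so caches and branch predictors can settle.
pub fn ns_per_iter<T, F: FnMut() -> T>(iters: u32, mut f: F) -> f64 {
    assert!(iters > 0, "iters must be positive");
    // Warm-up: results are discarded, but passed through black_box so they are not optimized away.
    for _ in 0..iters.min(1_000) {
        black_box(f());
    }
    let start = Instant::now();
    for _ in 0..iters {
        // Mirrors `b.iter(|| black_box(...))` from the libtest version.
        black_box(f());
    }
    start.elapsed().as_nanos() as f64 / iters as f64
}

/// Hypothetical stand-in for the real decoder: sums the bytes of the payload.
pub fn fake_decode(data: &[u8]) -> u64 {
    data.iter().map(|&b| b as u64).sum()
}
```

Usage would look like ns_per_iter(1_000_000, || fake_decode(black_box(data))), with black_box on the input so the compiler cannot constant-fold the call for a literal payload.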

I ran the tests on my Ryzen 9 3950x inside an Ubuntu 20.04 VM. The results show that my Rust codec is almost three times as fast as the Go one (215 vs 615 ns per decode), which made me very happy. However, when I ran the same test on another machine with a Ryzen 9 5950x, also inside an Ubuntu VM, I was surprised to see that Go was now faster than Rust. Since the 5950x is a faster CPU than the 3950x, the improvement in the Go results was expected, but I was puzzled to see the Rust benchmark become about twice as slow on the faster CPU (467 vs 215 ns).

Ryzen 9 3950x

Rust result:
test bench_decode ... bench:         215 ns/iter (+/- 11)
Go result:
BenchmarkDecode-16       1928305               615.0 ns/op

Ryzen 9 5950x

Rust result:
test bench_decode ... bench:         467 ns/iter (+/- 15)
Go result:
BenchmarkDecode-16       2882685               411.6 ns/op

I asked a friend to run the benchmarks on a few of his machines. The Go results are fairly consistent with the relative speeds of the CPUs, whereas the Rust results seem random.

i7 13700k

Rust result:
test bench_decode ... bench:         125 ns/iter (+/- 16)
Go result:
BenchmarkDecode-8        2962131               454.8 ns/op

My question is, have I missed something in my Rust benchmark? Why do the results differ so much between different machines? How can I reliably compare the performance of the libraries written in Go and Rust?

Try whether RUSTFLAGS='-C target-cpu=native' cargo +nightly bench makes a difference. (While on the topic of build flags for better performance, I've read that enabling LTO might make a difference, too.)

By default, rustc compiles for a generic baseline CPU so that binaries stay portable, so it cannot use newer instructions the host CPU may support. On the other hand, I don't know whether Go makes use of CPU-specific optimizations, so even if it makes a difference, one would need to make sure that it is still a fair comparison.
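To see what target-cpu=native changes, one can compare the features the compiler was allowed to assume at build time (cfg!(target_feature = ...)) with what the CPU reports at run time (is_x86_feature_detected!). A small sketch, assuming an x86_64 host; AVX2 and BMI2 are just example features, and feature_report is an illustrative name. With default flags the compile-time column for AVX2 is false even on CPUs that support it; with RUSTFLAGS='-C target-cpu=native' it should usually match the run-time column.

```rust
/// Returns (feature, enabled at compile time, detected at run time) for a few x86_64 features.
#[cfg(target_arch = "x86_64")]
pub fn feature_report() -> Vec<(&'static str, bool, bool)> {
    vec![
        // cfg! is resolved when compiling, from the target features rustc was given.
        // is_x86_feature_detected! queries CPUID when the program actually runs.
        ("sse2", cfg!(target_feature = "sse2"), is_x86_feature_detected!("sse2")),
        ("avx2", cfg!(target_feature = "avx2"), is_x86_feature_detected!("avx2")),
        ("bmi2", cfg!(target_feature = "bmi2"), is_x86_feature_detected!("bmi2")),
    ]
}

/// On other architectures there is nothing to report in this sketch.
#[cfg(not(target_arch = "x86_64"))]
pub fn feature_report() -> Vec<(&'static str, bool, bool)> {
    Vec::new()
}
```

Running it inside the VM and on the host may also be informative, since a hypervisor can hide some CPUID features from the guest.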


Edit: Looking at the times, it seems odd that it became slower on the supposedly newer and better CPU. Maybe the problem is something else entirely.

A shot in the dark, but this post reminded me of this problem with Zen 3 processors.

Passed as a slice.

Passed by value?

data in the Rust benchmark is a byte-string literal, so it has type &[u8; _] which coerces to &[u8].
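A minimal illustration of that coercion, with a hypothetical decode_len function standing in for CMDC_CODEC.decode. The literal is a reference to a fixed-size array, and passing it where a slice is expected copies no bytes: the slice points at the same memory and carries the length alongside the pointer.

```rust
/// Hypothetical decoder entry point taking a byte slice.
pub fn decode_len(input: &[u8]) -> usize {
    input.len()
}

/// Shows the unsized coercion from &[u8; N] to &[u8].
pub fn demo() -> (usize, bool) {
    // A byte-string literal has type &'static [u8; N]; here N = 5.
    let data: &'static [u8; 5] = b"hello";
    // Binding it as &[u8] coerces the array reference to a slice without copying.
    let slice: &[u8] = data;
    let same_memory = slice.as_ptr() == data.as_ptr();
    // The same coercion happens implicitly at the call site.
    (decode_len(data), same_memory)
}
```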

Thank you for the mention!


Hi @lwlee2608, I believe it could be related to a bug in AMD's hardware.

Please try switching to jemalloc. If the result is similar, we can rule out this possibility. Otherwise, it's likely the cause.
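In Rust the allocator is switched by registering a #[global_allocator]; the tikv-jemallocator crate, for example, provides a Jemalloc type for this. To keep the sketch dependency-free, it wraps the standard System allocator instead and counts allocations. That uses the same mechanism and can also show how many heap allocations a single decode makes. CountingAlloc and allocation_count are illustrative names, not part of any library.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and counts every allocation request.
pub struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// To try jemalloc instead (requires the tikv-jemallocator crate), this static would become:
// static GLOBAL: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Number of allocation calls made so far by the whole process.
pub fn allocation_count() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}
```

The default realloc and alloc_zeroed implementations of GlobalAlloc go through alloc, so they are counted too.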

I think @Xuanwo and @DanielKeep are right about this. I have run the same tests on various hardware, and every time my Rust codec is significantly faster than the Go codec, except on Zen 3 processors.

| CPU | OS | Go (ns/op) | Rust (ns/op) | Diff (%), (Go − Rust) / Go |
|---|---|---|---|---|
| AMD Ryzen 7 1700x | Ubuntu 20.04 | 1141.0 | 414 | 63.72% |
| AMD Ryzen 9 3950x | Ubuntu 20.04 | 637.9 | 217 | 65.98% |
| AMD Ryzen 5 5600x | Ubuntu 20.04 | 425.1 | 479 | -12.68% |
| AMD Ryzen 7 5700x | Ubuntu 20.04 | 480.5 | 494 | -2.81% |
| AMD Ryzen 9 5950x | Ubuntu 20.04 | 411.6 | 467 | -13.46% |
| Intel i7 8850U | Ubuntu 20.04 | 1071.0 | 741 | 30.81% |
| Intel i7 13700k | Ubuntu 22.04 | 454.8 | 125 | 72.52% |
| Intel i7 10750H | Ubuntu 20.04 | 1448.0 | 223 | 84.60% |
| Intel i7 9750H | MacOS Sonoma 14.2.1 | 580.1 | 364 | 37.25% |
| M1 Pro | MacOS Sonoma 14.2.1 | 375.7 | 177 | 52.89% |
| Broadcom BCM2835 | Ubuntu 20.04 | 4136.0 | 944 | 77.18% |
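Judging from the numbers, the Diff column works out to (Go − Rust) / Go × 100: positive means Rust took less time, negative means Go did. For the 1700x, (1141.0 − 414) / 1141.0 × 100 ≈ 63.72. The helpers below (diff_pct and speedup are illustrative names) reproduce the column and also give the plain speed ratio, which is often easier to read; by that measure Rust on the 3950x is about 2.94 times as fast as Go.

```rust
/// Percentage of Go's time saved by Rust; positive when Rust is faster.
/// Both arguments are in ns per operation.
pub fn diff_pct(go_ns: f64, rust_ns: f64) -> f64 {
    (go_ns - rust_ns) / go_ns * 100.0
}

/// How many times faster Rust is than Go (a value below 1.0 means Rust is slower).
pub fn speedup(go_ns: f64, rust_ns: f64) -> f64 {
    go_ns / rust_ns
}
```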

I attempted to replicate the issue using the information provided by @lwlee2608.

jemalloc branch - GitHub - MATRIXXSoftware/rust-mdd at jemalloc
original branch - GitHub - MATRIXXSoftware/rust-mdd at v0.1.0
go codec - GitHub - MATRIXXSoftware/go-mdd

But the results are as expected on my setup:

Go:

cpu: AMD Ryzen 9 5950X 16-Core Processor            
BenchmarkDecode
BenchmarkDecode-32       1952908               637.9 ns/op
BenchmarkEncode
BenchmarkEncode-32       4258261               270.1 ns/op

Rust with original

test bench_decode ... bench:         160 ns/iter (+/- 2)
test bench_encode ... bench:         175 ns/iter (+/- 3)

Rust with jemalloc

test bench_decode ... bench:         159 ns/iter (+/- 2)
test bench_encode ... bench:         184 ns/iter (+/- 2)

My setup:

OS: Arch Linux x86_64 
Kernel: 6.6.8-zen1-1-zen 
Uptime: 14 hours, 15 mins 
Packages: 1298 (pacman) 
Shell: zsh 5.9 
Resolution: 3840x2160 
DE: Plasma 5.27.10 
WM: kwin 
WM Theme: Breeze 
Theme: Breeze Light [Plasma], Breeze [GTK2], Breeze-Dark [GTK3] 
Icons: [Plasma], breeze [GTK2/3] 
Terminal: java 
CPU: AMD Ryzen 9 5950X (32) @ 3.400GHz 
GPU: AMD ATI Radeon RX 6800/6800 XT / 6900 XT 
Memory: 35290MiB / 64157MiB

My rustc:

rustc 1.77.0-nightly (89e2160c4 2023-12-27)
binary: rustc
commit-hash: 89e2160c4ca5808657ed55392620ed1dbbce78d1
commit-date: 2023-12-27
host: x86_64-unknown-linux-gnu
release: 1.77.0-nightly
LLVM version: 17.0.6

I'm guessing it's caused by another issue.

@lwlee2608, could you run the same test on Arch Linux with the latest kernel and microcode?

Hello @Xuanwo, I've installed the latest Arch Linux on VirtualBox and the result is the same as before.

OS: Arch Linux x86_64 
Host: VirtualBox 1.2 
Kernel: 6.6.8-arch1-1 
Uptime: 6 mins 
Packages: 712 (pacman) 
Shell: bash 5.2.21 
Resolution: 1280x800 
DE: GNOME 45.2 
WM: Mutter 
WM Theme: Adwaita 
Theme: Adwaita [GTK2/3] 
Icons: Adwaita [GTK2/3] 
Terminal: kgx 
CPU: AMD Ryzen 9 5950X (16) @ 3.400GHz 
GPU: 00:02.0 VMware SVGA II Adapter 
Memory: 1121MiB / 19528MiB 

Go:

cpu: AMD Ryzen 9 5950X 16-Core Processor            
BenchmarkDecode
BenchmarkDecode-16    	 2832524	       439.5 ns/op
BenchmarkEncode
BenchmarkEncode-16    	 6875265	       174.3 ns/op

Rust with original:

test bench_decode ... bench:         466 ns/iter (+/- 4)
test bench_encode ... bench:         486 ns/iter (+/- 3)

Rust with jemalloc:

test bench_decode ... bench:         477 ns/iter (+/- 11)
test bench_encode ... bench:         539 ns/iter (+/- 4)

My rust version:

rustc 1.77.0-nightly (3cdd004e5 2023-12-29)
binary: rustc
commit-hash: 3cdd004e55c869faa2b7b25efd3becf50346e7d6
commit-date: 2023-12-29
host: x86_64-unknown-linux-gnu
release: 1.77.0-nightly
LLVM version: 17.0.6

Perhaps it has something to do with running in a virtual machine versus on bare metal? Or maybe the BIOS version?
BIOS:

x570 AORUS MASTER rev1.2 
F32
ComboAM4v2PI 1.1.0.0

PS: I tried again with kernel 6.6.8-zen1-1-zen and the result is still the same.

So far, I'm not sure what's going on. Ideally, given that our CPUs support hardware-assisted virtualization (AMD-V, AMD's counterpart to Intel's VT-x), the results inside a VM should be fairly consistent with bare metal. It seems we need to dig deeper.

Can you run those benchmarks on bare metal? And how about using perf to capture the performance differences between them? For instance, you could run: perf stat -d -d -d --repeat 20 ./x.

Could this have something to do with mitigations for Rowhammer, Spectre, Meltdown, or other security issues, which may affect Intel and AMD processors differently and hurt performance?
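One way to test this hypothesis is to compare what the kernel reports under /sys/devices/system/cpu/vulnerabilities/ on the host and inside the VM, since a guest kernel may see different CPU flags and pick different mitigations. A minimal Linux-only sketch that lists those entries; mitigation_report is an illustrative name, and on systems without that directory it simply returns an empty list.

```rust
use std::fs;
use std::path::Path;

/// Reads each file under /sys/devices/system/cpu/vulnerabilities (Linux only) and
/// returns (vulnerability, kernel status line) pairs sorted by name.
/// Returns an empty Vec if the directory does not exist or cannot be read.
pub fn mitigation_report() -> Vec<(String, String)> {
    let dir = Path::new("/sys/devices/system/cpu/vulnerabilities");
    let Ok(entries) = fs::read_dir(dir) else {
        return Vec::new();
    };
    let mut report: Vec<(String, String)> = entries
        .filter_map(|entry| entry.ok())
        .filter_map(|entry| {
            let name = entry.file_name().to_string_lossy().into_owned();
            // Unreadable entries are skipped rather than treated as errors.
            let status = fs::read_to_string(entry.path()).ok()?;
            Some((name, status.trim().to_string()))
        })
        .collect();
    report.sort();
    report
}
```

Diffing the output from host and guest would show whether the two kernels apply different mitigations; booting a test machine with the mitigations=off kernel parameter is another, insecure, way to rule them out.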

I have rerun the benchmark tests on Ubuntu 22.04 bare metal on the same machine, and now Rust is much faster than Go, similar to what you are seeing on your Arch Linux bare metal.

OS: Ubuntu 22.04.3 LTS x86_64
Host: X570 AORUS MASTER -CF
Kernel: 6.2.0-39-generic
Uptime: 1 min
Packages: 1738 (dpkg), 11 (snap)
Shell: bash 5.1.16
Resolution: 3840x2160, 2560x1440
DE: GNOME 42.9
WM: Mutter
WM Theme: Adwaita
Theme: Yaru [GTK2/3]
Icons: Yaru [GTK2/3]
Terminal: gnome-terminal
CPU: AMD Ryzen 9 5950X (32) @ 3.400GHz
GPU: NVIDIA GeForce RTX 2080 Rev. A
Memory: 1450MiB / 64233MiB
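
| CPU | Environment | OS | Kernel | Go decode (ns/op) | Go encode (ns/op) | Rust decode (ns/op) | Rust encode (ns/op) | Rust (jemalloc) decode (ns/op) | Rust (jemalloc) encode (ns/op) |
|---|---|---|---|---|---|---|---|---|---|
| AMD Ryzen 9 5950x | VM | Ubuntu 20.04.6 LTS | 5.15.0-76-generic | 423.4 | 168.9 | 473 | 492 | 480 | 559 |
| AMD Ryzen 9 5950x | VM | Ubuntu 22.04.2 LTS | 6.2.0-37-generic | 410.9 | 168.3 | 467 | 496 | 477 | 539 |
| AMD Ryzen 9 5950x | Bare metal | Ubuntu 22.04.3 LTS | 6.2.0-39-generic | 505.7 | 259.2 | 149 | 162 | 154 | 165 |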

We have now narrowed it down: the Rust benchmark is slow only when it runs inside a VM on a Zen 3 processor.
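Since the slowdown shows up only in a VM on Zen 3, it may help to record the environment next to every benchmark result. A sketch, assuming x86_64: CPUID leaf 0 returns the vendor string, leaf 1 encodes the CPU family, and bit 31 of ECX in leaf 1 is set by hypervisors to signal that code runs as a guest. Zen 3 parts report family 0x19, though Zen 4 shares that family, so this alone does not pin down Zen 3. CpuEnv and cpu_env are illustrative names.

```rust
/// Basic facts about the CPU and whether a hypervisor is present.
#[derive(Debug)]
pub struct CpuEnv {
    pub vendor: String, // e.g. "AuthenticAMD" or "GenuineIntel"
    pub family: u32,    // display family; Zen 3 and Zen 4 report 0x19
    pub under_hypervisor: bool,
}

#[cfg(target_arch = "x86_64")]
pub fn cpu_env() -> Option<CpuEnv> {
    use std::arch::x86_64::__cpuid;
    // SAFETY: the CPUID instruction is always available on x86_64.
    let (leaf0, leaf1) = unsafe { (__cpuid(0), __cpuid(1)) };
    // The 12-byte vendor string is stored in EBX, EDX, ECX, in that order.
    let mut bytes = Vec::with_capacity(12);
    for reg in [leaf0.ebx, leaf0.edx, leaf0.ecx] {
        bytes.extend_from_slice(&reg.to_le_bytes());
    }
    let vendor = String::from_utf8_lossy(&bytes).into_owned();
    // Display family = base family, plus extended family when base family is 0xF.
    let base_family = (leaf1.eax >> 8) & 0xF;
    let family = if base_family == 0xF {
        base_family + ((leaf1.eax >> 20) & 0xFF)
    } else {
        base_family
    };
    // CPUID.1:ECX bit 31 is the "hypervisor present" bit.
    let under_hypervisor = (leaf1.ecx >> 31) & 1 == 1;
    Some(CpuEnv { vendor, family, under_hypervisor })
}

#[cfg(not(target_arch = "x86_64"))]
pub fn cpu_env() -> Option<CpuEnv> {
    None
}
```

Printing cpu_env() alongside the ns/iter numbers would make rows like the ones in the table above self-describing.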

Curiously, I also noticed that Go is slower on bare metal, but that is off-topic for this thread.
