Help benchmarking a new read-write lock implementation

I wrote a new read-write lock implementation that's optimized for uncontended multi-reader or single-writer use cases. It's meant for large in-memory data structures that are read-mostly, with occasional updates, so it really should be benchmarked on big-iron hardware. I'm also interested to see how it does on non-Intel hardware (both correctness and performance). Locking is also very OS-dependent (threads, parking, etc.), so data from different operating systems would be awesome too.
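For anyone unfamiliar with the pattern, here is a minimal sketch of the read-mostly workload this kind of lock targets. It uses the standard library's `std::sync::RwLock` (presumably the baseline behind the `_std` benchmarks below), not widerwlock's own API, and the function `read_mostly_demo` is hypothetical: several reader threads take the shared lock for each lookup while a single writer briefly takes the exclusive lock once.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical demo: runs `n_readers` reader threads over a shared map while
/// one writer performs a single update. Returns each reader's sum and the
/// final map size. Uses std's RwLock, NOT widerwlock.
fn read_mostly_demo(n_readers: usize) -> (Vec<u64>, usize) {
    // A read-mostly in-memory structure: 10,000 entries mapping k -> 2k.
    let table: Arc<RwLock<HashMap<u64, u64>>> =
        Arc::new(RwLock::new((0..10_000u64).map(|k| (k, k * 2)).collect()));

    // Readers take the shared (read) lock per lookup; they do not block each other.
    let handles: Vec<_> = (0..n_readers)
        .map(|_| {
            let table = Arc::clone(&table);
            thread::spawn(move || {
                let mut sum = 0u64;
                for k in 0..10_000u64 {
                    let guard = table.read().unwrap();
                    sum += guard.get(&k).copied().unwrap_or(0);
                }
                sum
            })
        })
        .collect();

    // The occasional writer takes the exclusive lock briefly. It inserts a key
    // outside the readers' range, so their sums do not depend on timing.
    table.write().unwrap().insert(10_000, 20_000);

    let sums: Vec<u64> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    let len = table.read().unwrap().len();
    (sums, len)
}

fn main() {
    let (sums, len) = read_mostly_demo(4);
    println!("reader sums: {:?}, final size: {}", sums, len);
}
```

Readers vastly outnumber writes here, which is the regime where a lock tuned for uncontended reads should pay off.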

Unfortunately, I don't have big-iron hardware. A 32+ core Xeon setup, or even an AMD Threadripper, would be interesting to see.

So I'm asking for help from the community. If you have about 8 minutes and want to help, please follow these directions:

git clone https://github.com/mohrezaei/widerwlock
cd widerwlock
cargo test
cargo test --release
cargo +nightly bench

Then post your results in this thread, noting your CPU, OS, and Rust toolchain. Here are my results:

Intel i7-7700K, Windows 10, nightly-x86_64-pc-windows-msvc
test bench_2_reader_grwl                ... bench:  12,035,099 ns/iter (+/- 1,851,445)
test bench_2_reader_pk                  ... bench:  28,803,382 ns/iter (+/- 423,200)
test bench_2_reader_std                 ... bench:  28,386,442 ns/iter (+/- 1,316,308)
test bench_4_reader_grwl                ... bench:  14,991,882 ns/iter (+/- 5,051,984)
test bench_4_reader_pk                  ... bench:  58,448,153 ns/iter (+/- 837,085)
test bench_4_reader_std                 ... bench:  56,011,028 ns/iter (+/- 1,106,178)
test bench_8_reader_grwl                ... bench:  21,986,720 ns/iter (+/- 4,930,849)
test bench_8_reader_pk                  ... bench: 128,697,642 ns/iter (+/- 4,848,127)
test bench_8_reader_std                 ... bench: 108,676,567 ns/iter (+/- 2,381,390)
test bench_mixed_hundreth_work10_1_grwl ... bench:  26,082,803 ns/iter (+/- 1,373,453)
test bench_mixed_hundreth_work10_1_pk   ... bench:  20,867,510 ns/iter (+/- 1,788,476)
test bench_mixed_hundreth_work10_1_std  ... bench:  23,517,335 ns/iter (+/- 1,168,930)
test bench_mixed_hundreth_work10_4_grwl ... bench:  70,482,765 ns/iter (+/- 5,733,574)
test bench_mixed_hundreth_work10_4_pk   ... bench:  94,416,212 ns/iter (+/- 5,646,173)
test bench_mixed_hundreth_work10_4_std  ... bench: 101,117,698 ns/iter (+/- 3,081,084)
test bench_mixed_tenth_work100_4_grwl   ... bench:  29,531,374 ns/iter (+/- 1,187,823)
test bench_mixed_tenth_work100_4_pk     ... bench:  32,086,846 ns/iter (+/- 1,075,212)
test bench_mixed_tenth_work100_4_std    ... bench:  40,456,188 ns/iter (+/- 704,694)
test bench_mixed_tenth_work10_1_grwl    ... bench:  27,482,565 ns/iter (+/- 2,266,590)
test bench_mixed_tenth_work10_1_pk      ... bench:  20,712,666 ns/iter (+/- 184,322)
test bench_mixed_tenth_work10_1_std     ... bench:  23,800,789 ns/iter (+/- 584,405)
test bench_mixed_tenth_work10_4_grwl    ... bench: 220,569,891 ns/iter (+/- 6,972,154)
test bench_mixed_tenth_work10_4_pk      ... bench:  94,027,579 ns/iter (+/- 4,944,653)
test bench_mixed_tenth_work10_4_std     ... bench: 102,399,700 ns/iter (+/- 4,858,920)
test bench_uncontended_mutex            ... bench:  15,171,497 ns/iter (+/- 118,918)
test bench_uncontended_read_grwl        ... bench:   9,596,682 ns/iter (+/- 126,181)
test bench_uncontended_read_pk          ... bench:  14,058,431 ns/iter (+/- 126,530)
test bench_uncontended_read_std         ... bench:  13,382,540 ns/iter (+/- 146,634)
test bench_uncontended_write_grwl       ... bench:  10,114,534 ns/iter (+/- 531,415)
test bench_uncontended_write_pk         ... bench:   8,028,946 ns/iter (+/- 38,759)
test bench_uncontended_write_std        ... bench:  14,276,519 ns/iter (+/- 163,786)
test bench_work10                       ... bench:          10 ns/iter (+/- 0)
test bench_work100                      ... bench:         102 ns/iter (+/- 3)
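To compare results across machines in this thread, it may help to reduce each scenario to a single ratio. Below is a small sketch (not part of the widerwlock repo; `parse_bench_line` and `speedups_vs_std` are hypothetical names) that parses lines in the format `cargo bench` prints above and reports std time divided by grwl time for every scenario that has both variants. Ratios above 1.0 mean grwl was faster. From my numbers, for example, 8 readers come out around 4.9, while `mixed_tenth_work10_4` comes out around 0.46.

```rust
use std::collections::HashMap;

/// Parses one libtest bench line, e.g.
/// `test bench_2_reader_grwl ... bench:  12,035,099 ns/iter (+/- 1,851,445)`,
/// returning the benchmark name and its ns/iter value.
fn parse_bench_line(line: &str) -> Option<(String, u64)> {
    let rest = line.trim().strip_prefix("test ")?;
    let (name, rest) = rest.split_once(" ...")?;
    let (_, rest) = rest.split_once("bench:")?;
    // First whitespace-separated token is the ns/iter figure, with commas.
    let number = rest.split_whitespace().next()?;
    let ns: u64 = number.replace(',', "").parse().ok()?;
    Some((name.trim().to_string(), ns))
}

/// For every `<scenario>_grwl` benchmark with a matching `<scenario>_std`,
/// returns (scenario, std_ns / grwl_ns), sorted by scenario name.
fn speedups_vs_std(output: &str) -> Vec<(String, f64)> {
    let results: HashMap<String, u64> =
        output.lines().filter_map(parse_bench_line).collect();
    let mut out: Vec<(String, f64)> = results
        .iter()
        .filter_map(|(name, &grwl)| {
            let scenario = name.strip_suffix("_grwl")?;
            let std_ns = *results.get(&format!("{scenario}_std"))?;
            Some((scenario.to_string(), std_ns as f64 / grwl as f64))
        })
        .collect();
    out.sort_by(|a, b| a.0.cmp(&b.0));
    out
}

fn main() {
    // Paste your own `cargo +nightly bench` output here.
    let output = "test bench_8_reader_grwl ... bench:  21,986,720 ns/iter (+/- 4,930,849)
test bench_8_reader_std ... bench: 108,676,567 ns/iter (+/- 2,381,390)";
    for (scenario, ratio) in speedups_vs_std(output) {
        println!("{scenario}: std/grwl = {ratio:.2}");
    }
}
```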

Thanks a lot :grinning: