No, I'm not mixing them up.
That was the first thing I double- and triple-checked.
As you can see, I get the expected result once I add the option that selects the linear (rather than the parallel) version of the simulation code.
My main dev environment is a 2019 Mac Pro with an 18-core Intel Xeon (36 threads), which might be why the overhead is killing the parallel version (when it really is just overhead for no gain).
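For context, the linear/parallel switch is essentially the standard rayon pattern. This is a simplified sketch with placeholder names (`Cell`, `step_cells`), not the actual gridcover code:

```rust
use rayon::prelude::*;

// Placeholder stand-in for the real simulation cell type.
struct Cell {
    covered: bool,
}

impl Cell {
    fn update(&mut self) {
        self.covered = true;
    }
}

// The per-cell work is identical either way; only the iterator
// driving it changes between the linear and parallel paths.
fn step_cells(cells: &mut [Cell], parallel: bool) {
    if parallel {
        cells.par_iter_mut().for_each(|c| c.update());
    } else {
        cells.iter_mut().for_each(|c| c.update());
    }
}
```

With ~200,000 cells of very cheap per-cell work, the per-task scheduling cost can easily dominate, which would fit the "overhead for no gain" theory.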
HOWEVER, if I try it on a Linux server I get the expected result (!!)
First, the Linux debug version (note the debug path):
```
./target/debug/gridcover -r 4 -w 500 -g 400 -s 0.5 -x 1 -y 1 -v 1 -c 99 -S 99 -C false
Coverage: 99.00% (198000/200000 cells covered) - Bounces: 737 - Sim-Time: 29:01:20

Simulation Results:
Simulation completed in : 00:21
Simulated elapsed time  : 29:01:20
Coverage                : 99.0% (198000 out of 200000 cells)
Distance traveled:      : 104481.0 units
Number of bounces:      : 737
Total Simulation Steps  : 232,180 (Using step size = 0.45 units)
```
Then the Linux release version:
```
./target/release/gridcover -r 4 -w 500 -g 400 -s 0.5 -x 1 -y 1 -v 1 -c 99 -S 99 -C false
Coverage: 99.00% (198000/200000 cells covered) - Bounces: 737 - Sim-Time: 29:01:20

Simulation Results:
Simulation completed in : 00:05
Simulated elapsed time  : 29:01:20
Coverage                : 99.0% (198000 out of 200000 cells)
Distance traveled:      : 104481.0 units
Number of bounces:      : 737
Total Simulation Steps  : 232,180 (Using step size = 0.45 units)
```
So this is something specific to the Mac!!
Just for completeness, here is the same run on my Mac Pro.
The Mac Pro debug version:
```
./target/debug/gridcover -r 4 -w 500 -g 400 -s 0.5 -x 1 -y 1 -v 1 -c 99 -S 99 -C false
Coverage: 99.00% (198000/200000 cells covered) - Bounces: 737 - Sim-Time: 29:01:20

Simulation Results:
Simulation completed in : 00:28
Simulated elapsed time  : 29:01:20
Coverage                : 99.0% (198000 out of 200000 cells)
Distance traveled:      : 104481.0 units
Number of bounces:      : 737
Total Simulation Steps  : 232,180 (Using step size = 0.45 units)
```
Followed by the Mac Pro release version:
```
./target/release/gridcover -r 4 -w 500 -g 400 -s 0.5 -x 1 -y 1 -v 1 -c 99 -S 99 -C false
Coverage: 99.00% (198000/200000 cells covered) - Bounces: 737 - Sim-Time: 29:01:20

Simulation Results:
Simulation completed in : 03:15
Simulated elapsed time  : 29:01:20
Coverage                : 99.0% (198000 out of 200000 cells)
Distance traveled:      : 104481.0 units
Number of bounces:      : 737
Total Simulation Steps  : 232,180 (Using step size = 0.45 units)
```
So, to summarize:
Enabling parallel processing (`par_iter`):

|         | debug | release |
|---------|-------|---------|
| Linux   | 00:21 | 00:05   |
| Mac Pro | 00:28 | 03:15   |
| Mac M1  | 00:12 | 00:05   |
Given that both the Mac Pro and the Linux server use Intel Xeon CPUs, this huge difference is astonishing.
Disabling parallel processing (no `par_iter`):

|         | debug | release |
|---------|-------|---------|
| Linux   | 00:14 | 00:02   |
| Mac Pro | 00:18 | 00:02   |
| Mac M1  | 00:12 | 00:00   |
Side note: compilation on Linux is also ~40% faster!
I guess things like L1 cache thrashing could play a role, but that doesn't explain why the Linux server doesn't exhibit this behavior. Fully explaining it would be some serious detective work that would take more time than I have. One quick experiment is sketched below, though.
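One cheap experiment that would narrow it down (assuming the global rayon thread pool is used): cap the number of worker threads and check whether the Mac Pro release slowdown shrinks along with the thread count. Rayon honors the `RAYON_NUM_THREADS` environment variable, or the cap can go in code:

```rust
// Cap rayon's global pool before the first par_iter call. If the
// Mac Pro slowdown shrinks as the thread count drops, scheduling
// and contention overhead (not the per-cell work) is the likely
// culprit.
fn main() {
    rayon::ThreadPoolBuilder::new()
        .num_threads(4) // try 1, 2, 4, ... up to 36
        .build_global()
        .expect("global pool was already initialized");

    // ... run the simulation as usual ...
}
```

From the shell, the same test is `RAYON_NUM_THREADS=4 ./target/release/gridcover ...` with the arguments above.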
Edit: Added row for Mac M1.