Dear,
I know that I made several posts about this related algorithm, but I have the problem here.
This is an example of my code, in my server takes 7ms. If I uncomment the line 69, the program increases to 18ms.
Can you think of a way not to suffer such a penalty when uncommenting that line?
Playground
Crates :
[dependencies]
jemallocator = "0.3.2"
rand = "0.7.3"
Using this locator I found better results, but I don't know if there will be a better one for this case
Thank you very much.
bjorn3
September 4, 2020, 1:30pm
2
Are you using -Ctarget-cpu=native
? On my system it decreased runtime from ~110ms to ~14ms when the line is commented. And from ~155ms to ~30ms when the line is uncommented. (on a single run for each, so may be affected a bit by noise)
bjorn3
September 4, 2020, 1:35pm
3
According to perf only ~6% of your program is spent in floyd. The rest is populating the matrix. This means that perf stat
is not very useful in this case.
@bjorn3 Yes, I'm compiling with
RUSTFLAGS='-C target-cpu=native' cargo build --release
I'm just taking floyd's method time. But actually what makes the times much worse are the assignments when I have 2 matrices.
bjorn3
September 4, 2020, 2:11pm
5
I repeated the floyd()
call 100 times. With the line commented perf stat
gives:
-> TIME 1.334832287s
Performance counter stats for 'target/release/manuel_constanzo':
7298606027 instructions:u # 1,64 insn per cycle (83,15%)
4444943282 cpu-cycles:u (83,37%)
13141385 cache-misses # 13,448 % of all cache refs (83,37%)
97716434 cache-references (83,37%)
307781226 branch-instructions (83,36%)
1030760 branch-misses # 0,33% of all branches (66,53%)
1,756984452 seconds time elapsed
1,634879000 seconds user
0,116490000 seconds sys
with the line uncommented, it gives:
-> TIME 2.084396233s
Performance counter stats for 'target/release/manuel_constanzo':
10248449201 instructions:u # 1,59 insn per cycle (83,23%)
6429701257 cpu-cycles:u (83,36%)
27152737 cache-misses # 10,479 % of all cache refs (83,37%)
259112585 cache-references (83,38%)
410791345 branch-instructions (83,37%)
1035576 branch-misses # 0,25% of all branches (66,52%)
2,503109354 seconds time elapsed
2,352298000 seconds user
0,144509000 seconds sys
It seems to be a combination of more instructions to execute and less instructions per cycle.
1 Like
kornel
September 4, 2020, 5:52pm
6
I suggest using #[bench]
for timings. A single DIY Instant::now()
check may not be accurate enough.
system
Closed
December 3, 2020, 5:52pm
7
This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.