How to improve the timing of this program

ManuelCostanzo · September 4, 2020, 1:06pm

Dear all,

I know I have already made several posts about this algorithm, but I have a new problem with it.

This is an example of my code; on my server it takes 7 ms. If I uncomment line 69, the run time increases to 18 ms.

Can you think of a way not to suffer such a penalty when uncommenting that line?

Crates:

[dependencies]
jemallocator = "0.3.2"
rand = "0.7.3"

Using this allocator I got better results, but I don't know whether there is a better one for this case.

Thank you very much.

bjorn3 · September 4, 2020, 1:30pm

Are you using -Ctarget-cpu=native? On my system it decreased runtime from ~110ms to ~14ms when the line is commented. And from ~155ms to ~30ms when the line is uncommented. (on a single run for each, so may be affected a bit by noise)

bjorn3 · September 4, 2020, 1:35pm

According to perf, only ~6% of your program's time is spent in floyd. The rest is spent populating the matrix. This means that perf stat is not very useful in this case.

ManuelCostanzo · September 4, 2020, 1:46pm

@bjorn3 Yes, I'm compiling with

 RUSTFLAGS='-C target-cpu=native' cargo build --release

I'm only timing the floyd method. What actually makes the times much worse are the assignments when I have 2 matrices.
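The code under discussion is not reproduced in this thread, so the following is only a sketch of what "assignments with 2 matrices" could look like. It assumes that `floyd` is a Floyd-Warshall shortest-path pass and that the commented-out line is a store into a second (path) matrix. The signature, the flat row-major layout, and the names `dist` and `path` are hypothetical.

```rust
/// Floyd-Warshall over flat, row-major `n * n` matrices.
/// `dist[i * n + j]` holds the best known distance from i to j.
/// `path[i * n + j]` holds the intermediate vertex of the latest improvement,
/// or -1 if the entry was never improved.
/// Hypothetical stand-in: the thread does not show the real `floyd`.
fn floyd(dist: &mut [f32], path: &mut [i32], n: usize) {
    assert_eq!(dist.len(), n * n);
    assert_eq!(path.len(), n * n);
    for k in 0..n {
        for i in 0..n {
            for j in 0..n {
                let candidate = dist[i * n + k] + dist[k * n + j];
                if candidate < dist[i * n + j] {
                    dist[i * n + j] = candidate;
                    // Assumed to be the line that gets commented out:
                    // an extra store into a second matrix of the same size.
                    path[i * n + j] = k as i32;
                }
            }
        }
    }
}

fn main() {
    let inf = f32::INFINITY;
    let n = 3;
    // Edges: 0 -> 1 with weight 1, 1 -> 2 with weight 2.
    let mut dist = vec![
        0.0, 1.0, inf,
        inf, 0.0, 2.0,
        inf, inf, 0.0,
    ];
    let mut path = vec![-1; n * n];
    floyd(&mut dist, &mut path, n);
    println!("dist 0 -> 2: {}, via vertex {}", dist[2], path[2]);
}
```

With the store in place, the inner loop touches two `n * n` buffers instead of one, which is one plausible source of extra memory traffic.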

bjorn3 · September 4, 2020, 2:11pm

I repeated the floyd() call 100 times. With the line commented perf stat gives:

-> TIME 1.334832287s

 Performance counter stats for 'target/release/manuel_constanzo':

        7298606027      instructions:u            #    1,64  insn per cycle           (83,15%)
        4444943282      cpu-cycles:u                                                  (83,37%)
          13141385      cache-misses              #   13,448 % of all cache refs      (83,37%)
          97716434      cache-references                                              (83,37%)
         307781226      branch-instructions                                           (83,36%)
           1030760      branch-misses             #    0,33% of all branches          (66,53%)

       1,756984452 seconds time elapsed

       1,634879000 seconds user
       0,116490000 seconds sys

with the line uncommented, it gives:

-> TIME 2.084396233s

 Performance counter stats for 'target/release/manuel_constanzo':

       10248449201      instructions:u            #    1,59  insn per cycle           (83,23%)
        6429701257      cpu-cycles:u                                                  (83,36%)
          27152737      cache-misses              #   10,479 % of all cache refs      (83,37%)
         259112585      cache-references                                              (83,38%)
         410791345      branch-instructions                                           (83,37%)
           1035576      branch-misses             #    0,25% of all branches          (66,52%)

       2,503109354 seconds time elapsed

       2,352298000 seconds user
       0,144509000 seconds sys

It seems to be a combination of more instructions to execute and fewer instructions per cycle.
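To put numbers on that, the counters from the two runs above can be compared directly. The sketch below only does arithmetic on the values printed by perf stat; the struct and function names are made up for illustration.

```rust
/// Counter values copied from the two `perf stat` runs above.
struct Counters {
    instructions: u64,
    cycles: u64,
    cache_references: u64,
    cache_misses: u64,
}

impl Counters {
    /// Instructions per cycle.
    fn ipc(&self) -> f64 {
        self.instructions as f64 / self.cycles as f64
    }

    /// Fraction of cache references that missed.
    fn miss_rate(&self) -> f64 {
        self.cache_misses as f64 / self.cache_references as f64
    }
}

const COMMENTED: Counters = Counters {
    instructions: 7_298_606_027,
    cycles: 4_444_943_282,
    cache_references: 97_716_434,
    cache_misses: 13_141_385,
};

const UNCOMMENTED: Counters = Counters {
    instructions: 10_248_449_201,
    cycles: 6_429_701_257,
    cache_references: 259_112_585,
    cache_misses: 27_152_737,
};

/// How many times larger `after` is than `before`.
fn ratio(after: u64, before: u64) -> f64 {
    after as f64 / before as f64
}

fn main() {
    println!("IPC: {:.2} -> {:.2}", COMMENTED.ipc(), UNCOMMENTED.ipc());
    println!(
        "miss rate: {:.1}% -> {:.1}%",
        100.0 * COMMENTED.miss_rate(),
        100.0 * UNCOMMENTED.miss_rate()
    );
    println!(
        "instructions x{:.2}, cycles x{:.2}, cache refs x{:.2}",
        ratio(UNCOMMENTED.instructions, COMMENTED.instructions),
        ratio(UNCOMMENTED.cycles, COMMENTED.cycles),
        ratio(UNCOMMENTED.cache_references, COMMENTED.cache_references)
    );
}
```

On these figures the uncommented run executes roughly 1.4 times as many instructions and issues roughly 2.65 times as many cache references, while instructions per cycle drop only slightly.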

kornel · September 4, 2020, 5:52pm

I suggest using #[bench] for timings. A single DIY Instant::now() check may not be accurate enough.
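Note that #[bench] is only available on nightly Rust. As a rough illustration of the same idea on stable, here is a minimal sketch that repeats the measurement and reports the minimum and median instead of a single reading. It uses only the standard library; `time_runs` and `workload` are hypothetical names, and `workload` merely stands in for the real floyd call.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` `runs` times and returns the per-run durations, sorted ascending.
fn time_runs<F: FnMut()>(runs: usize, mut f: F) -> Vec<Duration> {
    let mut samples = Vec::with_capacity(runs);
    for _ in 0..runs {
        let start = Instant::now();
        f();
        samples.push(start.elapsed());
    }
    samples.sort();
    samples
}

/// Hypothetical workload standing in for the `floyd` call:
/// sums the squares of 0..n with wrapping arithmetic.
fn workload(n: u64) -> u64 {
    (0..n).fold(0u64, |acc, x| acc.wrapping_add(x.wrapping_mul(x)))
}

fn main() {
    let samples = time_runs(100, || {
        // black_box keeps the optimizer from removing the call
        // or precomputing its result.
        black_box(workload(black_box(100_000)));
    });
    let min = samples[0];
    let median = samples[samples.len() / 2];
    println!("min {:?}, median {:?}", min, median);
}
```

The minimum is usually the least noisy figure for a CPU-bound loop, and the median shows how typical it is. A dedicated benchmarking harness adds warm-up and statistical analysis on top of this.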
