How to improve the timing of this program


I know that I made several posts about this related algorithm, but I have the problem here.

This is an example of my code, in my server takes 7ms. If I uncomment the line 69, the program increases to 18ms.

Can you think of a way not to suffer such a penalty when uncommenting that line?



jemallocator = "0.3.2"
rand = "0.7.3"

Using this locator I found better results, but I don't know if there will be a better one for this case

Thank you very much.

Are you using -Ctarget-cpu=native? On my system it decreased runtime from ~110ms to ~14ms when the line is commented. And from ~155ms to ~30ms when the line is uncommented. (on a single run for each, so may be affected a bit by noise)

According to perf only ~6% of your program is spent in floyd. The rest is populating the matrix. This means that perf stat is not very useful in this case.

@bjorn3 Yes, I'm compiling with

 RUSTFLAGS='-C target-cpu=native' cargo build --release

I'm just taking floyd's method time. But actually what makes the times much worse are the assignments when I have 2 matrices.

I repeated the floyd() call 100 times. With the line commented perf stat gives:

-> TIME 1.334832287s

 Performance counter stats for 'target/release/manuel_constanzo':

        7298606027      instructions:u            #    1,64  insn per cycle           (83,15%)
        4444943282      cpu-cycles:u                                                  (83,37%)
          13141385      cache-misses              #   13,448 % of all cache refs      (83,37%)
          97716434      cache-references                                              (83,37%)
         307781226      branch-instructions                                           (83,36%)
           1030760      branch-misses             #    0,33% of all branches          (66,53%)

       1,756984452 seconds time elapsed

       1,634879000 seconds user
       0,116490000 seconds sys

with the line uncommented, it gives:

-> TIME 2.084396233s

 Performance counter stats for 'target/release/manuel_constanzo':

       10248449201      instructions:u            #    1,59  insn per cycle           (83,23%)
        6429701257      cpu-cycles:u                                                  (83,36%)
          27152737      cache-misses              #   10,479 % of all cache refs      (83,37%)
         259112585      cache-references                                              (83,38%)
         410791345      branch-instructions                                           (83,37%)
           1035576      branch-misses             #    0,25% of all branches          (66,52%)

       2,503109354 seconds time elapsed

       2,352298000 seconds user
       0,144509000 seconds sys

It seems to be a combination of more instructions to execute and less instructions per cycle.

1 Like

I suggest using #[bench] for timings. A single DIY Instant::now() check may not be accurate enough.