Multithreading unexpectedly slows down performance

I am doing some heavy calculation in Rust, and there is an unexpected performance gap between the single-threaded and multi-threaded versions.
The following is pseudocode with the timings I measured:

t!(let res = dil.dil[0].di(&pnl));
1.1906535s

t!(let res = dil.di(&pnl));
4.8076257s

dil.dil.len()
19

The dil is just 19 duplicates of di. So my expectation was that on a 12-core CPU, if the single-threaded run takes about 1s, the multi-threaded run over all 19 would take about 2s. In fact it takes almost 5s.
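
The estimate above can be written out as a small sketch. The helper names `ideal_rounds` and `ideal_wall_time` are hypothetical, and the model assumes identical cores, equal-cost tasks, and no scheduling overhead, which may not hold on this machine.

```rust
/// Number of back-to-back "rounds" needed to run `tasks` equal-cost jobs
/// on `cores` identical cores (ceiling division).
/// Assumption: every core is equally fast and fully available.
fn ideal_rounds(tasks: usize, cores: usize) -> usize {
    assert!(cores > 0, "need at least one core");
    (tasks + cores - 1) / cores
}

/// Idealized wall time: number of rounds times the single-task time.
/// This is a lower bound under the assumptions above, not a prediction.
fn ideal_wall_time(tasks: usize, cores: usize, single_task_secs: f64) -> f64 {
    ideal_rounds(tasks, cores) as f64 * single_task_secs
}
```

With the numbers from this post, `ideal_wall_time(19, 12, 1.1906535)` comes to about 2.38s, so the measured 4.8s is roughly twice the idealized bound.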

Here is my computer config:

Julia Version 1.8.2
Commit 36034abf26 (2022-09-29 15:21 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 20 × 12th Gen Intel(R) Core(TM) i7-12700K
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, goldmont)
  Threads: 14 on 20 virtual cores
Environment:
  JULIA_NUM_THREADS = 14

Here is the multi-threaded calculation code:

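use std::thread;

impl Dil {
    pub fn di<T: Send + Sync>(&self, sig: &Box<dyn Sig<A = T> + Send + Sync>) -> Vec<T> {
        thread::scope(|scope| {
            // Spawn one scoped thread per element; scoped threads may borrow `self` and `sig`.
            let mut handles = Vec::new();
            for di in &self.dil {
                let handle = scope.spawn(move || sig.di(&di));
                handles.push(handle);
            }
            // Join in spawn order so the results line up with `self.dil`.
            handles.into_iter().map(|x| x.join().unwrap()).collect()
        })
    }
}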

So am I doing something wrong here?

I don't see any issue in the code you posted.

Multithreading doesn't automatically speed up code, and in some cases it can slow it down. (For instance, if the threads contend over a shared resource, in particular the same cache line, this will not perform well.) Can you say a little more about the computation you're doing?
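
To make the cache-line point concrete, here is a minimal sketch of how such contention can be provoked and avoided. Everything in it is hypothetical and unrelated to the code in the question: `Plain`, `Padded`, the accessor functions, and `bump_all` are made-up names, and the 64-byte line size is an assumption that happens to be common on x86-64.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;
use std::time::{Duration, Instant};

/// Counter that may share a cache line with its neighbours in a slice.
struct Plain(AtomicU64);

/// Counter aligned (and therefore padded) to 64 bytes, an assumed line size.
#[repr(align(64))]
struct Padded(AtomicU64);

fn plain_inner(c: &Plain) -> &AtomicU64 {
    &c.0
}

fn padded_inner(c: &Padded) -> &AtomicU64 {
    &c.0
}

/// Spawns one thread per counter; each thread increments only its own counter.
/// Returns the sum of all counters and the elapsed wall time.
fn bump_all<C: Sync>(counters: &[C], get: fn(&C) -> &AtomicU64, iters: u64) -> (u64, Duration) {
    let start = Instant::now();
    thread::scope(|s| {
        for c in counters {
            s.spawn(move || {
                for _ in 0..iters {
                    get(c).fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    let elapsed = start.elapsed();
    let total: u64 = counters.iter().map(|c| get(c).load(Ordering::Relaxed)).sum();
    (total, elapsed)
}
```

Calling `bump_all` once on a `Vec<Plain>` with `plain_inner` and once on a `Vec<Padded>` with `padded_inner` lets you compare the two timings. The threads never touch each other's data in either case, so any gap would come from the memory layout alone. Whether a gap shows up at all depends on the hardware.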

The 12700K does have 12 cores, but in an 8+4 configuration: 8 performance cores, each capable of hyperthreading, plus 4 efficiency cores. That gives you 8 fast threads + 8 hyperthreads + 4 "efficiency" threads.
I wouldn't say it's impossible that what you see is simply the performance available.
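
One way to test this would be to cap the number of worker threads instead of spawning one per item, then compare timings at, say, 8 and 19 workers. The sketch below is an assumption-laden illustration, not a drop-in replacement for the posted code: `logical_cpus` and `map_with_workers` are hypothetical names, and nothing here pins threads to performance cores, so the OS scheduler still decides where they run.

```rust
use std::num::NonZeroUsize;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Logical CPUs visible to this process (falls back to 1 if unknown).
fn logical_cpus() -> usize {
    thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1)
}

/// Applies `f` to every item using at most `workers` threads.
/// Workers pull the next index from a shared counter, so a faster thread
/// simply ends up processing more items. Results keep the order of `items`.
fn map_with_workers<I, O, F>(items: &[I], workers: usize, f: F) -> Vec<O>
where
    I: Sync,
    O: Send,
    F: Fn(&I) -> O + Sync,
{
    let next = AtomicUsize::new(0);
    // Never spawn more workers than items, and always at least one.
    let workers = workers.clamp(1, items.len().max(1));
    let mut indexed: Vec<(usize, O)> = thread::scope(|s| {
        let handles: Vec<_> = (0..workers)
            .map(|_| {
                s.spawn(|| {
                    let mut out = Vec::new();
                    loop {
                        let i = next.fetch_add(1, Ordering::Relaxed);
                        if i >= items.len() {
                            break;
                        }
                        out.push((i, f(&items[i])));
                    }
                    out
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().unwrap())
            .collect()
    });
    // Restore input order, since workers finish items in arbitrary order.
    indexed.sort_by_key(|(i, _)| *i);
    indexed.into_iter().map(|(_, o)| o).collect()
}
```

On the 12700K, `logical_cpus()` would be expected to report 20, matching the "20 virtual cores" in the posted config. If the run with 8 workers is not much slower than the run with 19, that would suggest the extra threads mostly land on hyperthreads and efficiency cores.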

Welcome to the real world! Why do you think people have invested a literally insane amount of effort into making single-thread performance better? The i7-12700K doesn't have 12 equal cores. Rather, it has 8 fast cores and 4 slow cores. The difference between the fast cores and the slow ones is more than 2x. That means your program, once multithreaded, runs some of these calculations on the slow cores. This would easily explain going from 2s to 3s.

Add the fact that when only one core is active, the CPU picks some “blessed” core which is much faster than the others, but when all cores are loaded the CPU has to thermal-throttle everything. This means we should expect 4s or maybe even 5s… which is exactly what we observe.
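
If you want to see this effect on your own machine rather than take my word for it, a rough sketch like the one below could help. It runs the same job on 1, 2, 3, … threads at once and records the wall time for each count. `busy_work` is a hypothetical stand-in for the real computation and `scaling_profile` is a made-up name. With identical, independent cores the time would stay roughly flat until the thread count exceeds the core count.

```rust
use std::hint::black_box;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical CPU-bound stand-in for the real computation.
/// `black_box` keeps the compiler from optimizing the loop away.
fn busy_work(iters: u64) -> u64 {
    let mut x = 0u64;
    for i in 0..iters {
        x = black_box(x.wrapping_mul(6364136223846793005).wrapping_add(i));
    }
    x
}

/// For each thread count n in 1..=max_threads, runs `busy_work` on n threads
/// simultaneously and records the wall time until all n have finished.
fn scaling_profile(max_threads: usize, iters: u64) -> Vec<(usize, Duration)> {
    (1..=max_threads)
        .map(|n| {
            let start = Instant::now();
            thread::scope(|s| {
                for _ in 0..n {
                    s.spawn(move || black_box(busy_work(iters)));
                }
            });
            // The scope joins every thread before returning.
            (n, start.elapsed())
        })
        .collect()
}
```

Printing the result of something like `scaling_profile(20, 500_000_000)` in a release build shows where the per-batch time starts to climb. A rise well before 12 threads would point at clock throttling or the slow cores rather than at the code in the question. The iteration count is arbitrary and would need tuning.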
