Why is Rust's std::thread::spawn faster than std::thread in C++?

I saw a benchmark where somebody created 100,000 threads in Rust and in C++ (they also called the WinAPI function CreateThread directly). Rust was faster, of course, but why? Rust and C++ both end up calling the same kernel function, CreateThread.


There shouldn't be a meaningful difference in spawning OS threads, because the OS is doing the majority of the work.

It could be a flaw in the benchmark, e.g. the use of imprecise timers or a build without optimizations. Rust could also look faster if the benchmark compared threads to tokio::spawn, which doesn't spawn OS threads and only queues async tasks for execution.

But such microbenchmarks alone aren't very meaningful as a language-vs-language comparison, because they look only at a tiny fraction of code that may have some accidental overhead that isn't representative of the language as a whole.
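
To reduce the effect of timer noise, one option is to repeat the measurement and look at the minimum or median instead of a single run. This is only a minimal sketch: the helper `time_runs` and the batch size of 100 threads are made up for illustration and are not taken from the benchmark being discussed.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Runs `work` `runs` times and returns the wall-clock durations, sorted ascending.
/// (Hypothetical helper, not part of any library.)
fn time_runs<F: FnMut()>(runs: usize, mut work: F) -> Vec<Duration> {
    let mut samples = Vec::with_capacity(runs);
    for _ in 0..runs {
        let start = Instant::now(); // Instant is a monotonic clock
        work();
        samples.push(start.elapsed());
    }
    samples.sort();
    samples
}

fn main() {
    // Assumed workload: spawn and join a small batch of OS threads.
    let samples = time_runs(5, || {
        let handles: Vec<_> = (0..100).map(|_| thread::spawn(|| {})).collect();
        for handle in handles {
            handle.join().unwrap();
        }
    });
    println!("min: {:?}, median: {:?}", samples[0], samples[samples.len() / 2]);
}
```

Build it with optimizations (`cargo run --release`) before comparing the numbers with anything else.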


Bro, you're wrong (my results: 26 s for Rust and 2 minutes for C++). You can check it yourself and post the results you get.

#include <thread>
#include <functional>
#include <chrono>
#include <iostream>
#include <vector>
#include <Windows.h>

DWORD WINAPI MyThreadFunction(LPVOID lpParam)
{
    int num = *(int*)lpParam;
    printf("%d\n", num);
    return 0;
}

int main()
{
    std::vector<HANDLE> threads;

    auto start = std::chrono::high_resolution_clock::now();

    HANDLE hThread;

    // Wait for th

    for (uint32_t i = 0; i < 100000; i++)
    {
        DWORD dwThreadId;
        int ni = i;
        hThread = CreateThread(NULL, 0, MyThreadFunction, &ni, 0, &dwThreadId);
    }

    for (HANDLE handle : threads)
    {
        WaitForSingleObject(hThread, INFINITE);
    }

    auto end = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
    std::cout << "Time took: " << duration.count() << " microseconds" << std::endl;
}



use std::thread;
use std::time::Instant;

fn main() {
    let start = Instant::now();
    let mut threads: Vec<thread::JoinHandle<()>> = Vec::new();

    for i in 0..100_000 {
        threads.push(std::thread::spawn(move || {
            println!("thread: {}", i);
        }));
    }

    for thread in threads {
        thread.join().unwrap();
    }
    let end = Instant::now();
    println!("time: {:?}", end.duration_since(start));
}

You're most likely measuring speed of locks around stdout used by printf.
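
For context, Rust's println! takes the stdout lock on every call. A minimal sketch of taking the lock once and buffering the writes, to see how much of the time is printing overhead, could look like this. The helper `write_lines` is hypothetical, and it runs on a single thread, so it shows the cost of printing itself, not lock contention between threads.

```rust
use std::io::{self, BufWriter, Write};

/// Writes `count` numbered lines to any writer.
/// (Hypothetical helper for illustration.)
fn write_lines<W: Write>(out: &mut W, count: u32) -> io::Result<()> {
    for i in 0..count {
        writeln!(out, "thread: {}", i)?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    // Take the stdout lock once and buffer the output,
    // instead of locking and flushing on every line.
    let mut out = BufWriter::new(stdout.lock());
    write_lines(&mut out, 1_000)?;
    out.flush()
}
```

If the timing changes a lot between this and one println! per line, the benchmark is mostly measuring output, not thread creation.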


I also tried std::cout, puts, and putc and got similar results.

Don't print anything to eliminate this variable.

Use a profiler to see where the code spends time.
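
A print-free version of the Rust side might look like the sketch below. The function name `spawn_and_join` and the thread count in `main` are assumptions for illustration (the original benchmark used 100,000); `std::hint::black_box` is only there so the trivial thread body isn't optimized away.

```rust
use std::hint::black_box;
use std::thread;
use std::time::{Duration, Instant};

/// Spawns `count` OS threads that do trivial work, then joins them all.
/// Returns the elapsed time and the number of threads joined.
fn spawn_and_join(count: usize) -> (Duration, usize) {
    let start = Instant::now();
    let handles: Vec<thread::JoinHandle<usize>> = (0..count)
        .map(|i| thread::spawn(move || black_box(i))) // no I/O in the thread body
        .collect();

    let mut joined = 0;
    for handle in handles {
        handle.join().unwrap();
        joined += 1;
    }
    (start.elapsed(), joined)
}

fn main() {
    // Smaller than the 100,000 used in the thread, to keep the sketch quick.
    let (elapsed, joined) = spawn_and_join(1_000);
    println!("joined {} threads in {:?}", joined, elapsed);
}
```

The only print happens after the timer has been read, so stdout is out of the measurement.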

Not bad. How do you do this in Rust, and how do I give the thread some work to do?

The C++ code here looks a bit off to me - you never push anything to threads.

I pass a pointer to the int ni into the new thread.

I mean you declare std::vector<HANDLE> threads; and later you loop over it - for (HANDLE handle : threads) - but you never put anything in it.

And that loop waits on hThread instead of handle, which means every time (if it runs at all) it's waiting on the last thread you created in the loop above instead of each different thread you created.
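
For comparison, the intended pattern (store every handle, then wait on each one) is what the Rust version does with its Vec of JoinHandles. Here is a small sketch of it; the function name, the atomic counter, and the 1 ms sleep are made up for illustration, to make it visible that every thread has finished once the joins return.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Spawns `count` threads, keeps every handle, and joins each one.
/// Returns how many threads had finished once all joins completed.
fn run_and_join_all(count: usize) -> usize {
    let finished = Arc::new(AtomicUsize::new(0));
    let mut handles = Vec::with_capacity(count);

    for _ in 0..count {
        let finished = Arc::clone(&finished);
        // Store each handle; this is the step the C++ loop skipped.
        handles.push(thread::spawn(move || {
            thread::sleep(Duration::from_millis(1));
            finished.fetch_add(1, Ordering::SeqCst);
        }));
    }

    // Join the handle of each iteration, not one saved handle over and over.
    for handle in handles {
        handle.join().unwrap();
    }
    finished.load(Ordering::SeqCst)
}

fn main() {
    let count = 100;
    assert_eq!(run_and_join_all(count), count);
    println!("all {} threads finished before the joins returned", count);
}
```

In the C++ code the equivalent would be to push hThread into threads inside the first loop and to wait on handle in the second.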


OK, but it doesn't make sense: Rust has already finished its loop while C++ is still creating threads.

The Windows part of the implementation of Rust's std::thread doesn't do much more than call CreateThread and install a stack overflow handler.

If you're seeing any performance difference from direct CreateThread calls in C++, you're probably measuring something unrelated, as mentioned.

That said, I'm surprised that printf() would be noticeably slower than println!()... Perhaps it has some first-time per-thread init on MSVC?

That makes perfect sense, because printf parses the format string at runtime, while println! does the same at compile time, and the resulting code can be further optimized.

That's likely (but not definitely!) measurably slower, but not noticeably (e.g. to a human) slower, and certainly not a 4× difference of 30 s versus 120 s!

printf() also has the overhead of converting from whatever the current character set is into UTF-16 so that it can call WriteConsoleW, while Rust knows it's starting from UTF-8. I still wouldn't expect that to make that much difference, but it would be interesting to see whether wprintf() or std::wcout was faster.

Alternatively, maybe Windows printf() doesn't do line-buffering?

It is sometimes fun to play with benchmarks, but most of the time you should only benchmark different variations of your own code: for example, how fast your code is with arrays versus with vectors. For other things, benchmarks are often useless.
This is especially true when you compare different programming languages, because a programming language doesn't have a speed. Speed comes from the hardware: how many operations the processor can execute in a given time, and how fast data moves between the processor and RAM or an HDD/SSD.
Your code and the language's compiler determine how many operations are needed to execute your program, and that is where the difference in run time between two programs comes from. Most of the time, though, it comes down to how well the programmer knows the language and the algorithms used in the code.
For example, your code benchmarks not only how fast threads are created, but also how fast printf prints to the terminal compared to println!, and how fast std::vector is compared to Vec::new().
You should also use appropriate compiler settings to optimize the compiled code. There can be big differences in speed between debug and release builds: in a debug build std::vector may take more operations than Vec::new(), while in a release build it may take fewer. Other settings can matter too.
Also, if one thing is faster in one language, another thing may be faster in the other.
So it is very important to understand the programming language, the compiler, and the compiler settings in order to interpret the end result. If you don't know them, benchmarks become just meaningless fun.
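
On the Rust side, one cheap guard against benchmarking a debug build by accident is to check `debug_assertions`, which Cargo enables in the default dev profile and disables in the default release profile. This is only a sketch: the function name `build_profile` is made up, and a custom profile can change the setting, so treat the result as a hint.

```rust
/// Reports which kind of build is running, based on `debug_assertions`.
/// (Hypothetical helper; a custom Cargo profile can override this setting.)
fn build_profile() -> &'static str {
    if cfg!(debug_assertions) {
        "debug"
    } else {
        "release"
    }
}

fn main() {
    if build_profile() == "debug" {
        eprintln!("warning: timings from an unoptimized build are not representative");
    }
    println!("build profile: {}", build_profile());
}
```

The C++ side needs the equivalent care, i.e. building the Release configuration with optimizations enabled.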
