Why is Rust's std::thread::spawn faster than std::thread in C++?

caratovchanin · May 28, 2023, 4:08pm

I saw a benchmark where somebody created 100,000 threads in Rust and in C++ (he also called the WinAPI function CreateThread directly). Of course Rust was faster, but why? Rust and C++ both end up calling the same kernel function, CreateThread.

kornel · May 28, 2023, 4:29pm

There shouldn't be a meaningful difference in spawning OS threads, because the OS is doing the majority of the work.

It could be a flaw in the benchmark, e.g. the use of imprecise timers, or a build without optimizations. It could also look faster if it were comparing threads to tokio::spawn, which doesn't spawn OS threads but only queues async tasks for execution.
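For comparison, here is a std-only sketch of the "queue work instead of spawning a thread per task" idea. It is not tokio and not how tokio schedules tasks; it simply feeds 100,000 jobs over a channel to a fixed number of worker threads (4 here, an arbitrary choice), so only a handful of OS threads are ever created. The name `run_on_pool` is made up for illustration.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Runs `jobs` small jobs on `workers` OS threads instead of one thread per job.
/// Returns the sum of the job results so the work is actually observed.
fn run_on_pool(workers: usize, jobs: u64) -> u64 {
    // Queue of pending jobs; the single receiver is shared by all workers.
    let (job_tx, job_rx) = mpsc::channel::<u64>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    // Channel the workers use to send results back to the caller.
    let (result_tx, result_rx) = mpsc::channel::<u64>();

    let mut handles = Vec::with_capacity(workers);
    for _ in 0..workers {
        let job_rx = Arc::clone(&job_rx);
        let result_tx = result_tx.clone();
        handles.push(thread::spawn(move || loop {
            // The lock guard is a temporary, released at the end of this statement.
            let next = job_rx.lock().unwrap().recv();
            match next {
                Ok(n) => result_tx.send(n * 2).unwrap(),
                Err(_) => break, // all senders dropped and queue drained
            }
        }));
    }
    drop(result_tx); // only the workers hold result senders now

    for i in 0..jobs {
        job_tx.send(i).unwrap();
    }
    drop(job_tx); // closing the queue lets workers exit once it is empty

    // Ends when every worker has exited and dropped its result sender.
    let total: u64 = result_rx.iter().sum();
    for h in handles {
        h.join().unwrap();
    }
    total
}

fn main() {
    let total = run_on_pool(4, 100_000);
    println!("sum of results: {total}");
}
```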

But such microbenchmarks alone aren't very meaningful as a language-vs-language comparison, because they look only at a tiny fraction of code that may have some accidental overhead that isn't representative of the language as a whole.

caratovchanin · May 28, 2023, 4:34pm

Bro, you're wrong (my results: 26 s for Rust and 2 minutes for C++). You can check it yourself and post the results you get.
C++

#include <thread>
#include <functional>
#include <chrono>
#include <iostream>
#include <vector>
#include <Windows.h>

DWORD WINAPI MyThreadFunction(LPVOID lpParam)
{
    int num = *(int*)lpParam;
    printf("%d\n", num);
    return 0;
}

int main()
{
    std::vector<HANDLE> threads;

    auto start = std::chrono::high_resolution_clock::now();

    HANDLE hThread;

    // Wait for th

    for (uint32_t i = 0; i < 100000; i++)
    {
        DWORD dwThreadId;
        int ni = i;
        hThread = CreateThread(NULL, 0, MyThreadFunction, &ni, 0, &dwThreadId);
    }

    for (HANDLE handle : threads)
    {
        WaitForSingleObject(hThread, INFINITE);
        CloseHandle(handle);
    }

    auto end = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
    std::cout << "Time took: " << duration.count() << " microseconds" << std::endl;

}

Rust

use std::thread;
use std::time::Instant;

fn main() 
{
    let start = Instant::now();
    let mut threads: Vec<thread::JoinHandle<()>> = Vec::new();

    for i in 0..100_000
    {
        threads.push( std::thread::spawn(move ||{
            println!("thread: {}", i);
        }));
    }

    for thread in threads
    {
        thread.join().unwrap();
    }
    let end = Instant::now();
    println!("time: {:?}", end.duration_since(start));
    println!("success");
}

kornel · May 28, 2023, 4:37pm

You're most likely measuring speed of locks around stdout used by printf.
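In Rust, every println! call locks the shared stdout handle, so 100,000 threads that each print also means 100,000 acquisitions of one contended lock. A minimal sketch of keeping I/O out of the threads, assuming you still want to see every value: each thread returns its line through its JoinHandle, and the main thread writes everything while holding the stdout lock once. `spawn_and_collect` and `write_all_lines` are hypothetical helper names.

```rust
use std::io::{self, Write};
use std::thread;

/// Spawns `n` threads that each build their own line of output and return it.
/// No thread touches stdout, so there is no lock contention while they run.
fn spawn_and_collect(n: u32) -> Vec<String> {
    let handles: Vec<_> = (0..n)
        .map(|i| thread::spawn(move || format!("thread: {i}")))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

/// Writes all lines through one writer; generic so it also works with a Vec<u8>.
fn write_all_lines(out: &mut impl Write, lines: &[String]) -> io::Result<()> {
    for line in lines {
        writeln!(out, "{line}")?;
    }
    out.flush()
}

fn main() -> io::Result<()> {
    let lines = spawn_and_collect(1_000);
    // Lock stdout once for the whole batch instead of once per println!.
    let stdout = io::stdout();
    let mut lock = stdout.lock();
    write_all_lines(&mut lock, &lines)
}
```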

caratovchanin · May 28, 2023, 4:42pm

I've tried std::cout, puts and putc, and got similar results.

kornel · May 28, 2023, 4:43pm

Don't print anything to eliminate this variable.

Use a profiler to see where the code spends time.
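A minimal Rust version of that advice, as a sketch rather than a rigorous benchmark: the closure passed to thread::spawn is the work the thread does, here a trivial computation whose result is returned instead of printed, so the timing covers only spawning and joining. The thread count (10,000) and the helper name `time_spawn_join` are arbitrary choices for illustration; scale the count up toward the original 100,000 if your OS allows that many threads. Build with optimizations (cargo run --release) before reading anything into the number.

```rust
use std::hint::black_box;
use std::thread;
use std::time::{Duration, Instant};

/// Spawns `n` OS threads that do a trivial amount of work, joins them all,
/// and returns the elapsed wall-clock time plus a checksum of the results.
/// Nothing is printed inside the threads, so stdout locking is not measured.
fn time_spawn_join(n: u32) -> (Duration, u64) {
    let start = Instant::now();
    let handles: Vec<thread::JoinHandle<u64>> = (0..n)
        // black_box keeps the optimizer from treating the work as dead code.
        .map(|i| thread::spawn(move || black_box(u64::from(i)) * 2))
        .collect();
    let sum: u64 = handles.into_iter().map(|h| h.join().unwrap()).sum();
    (start.elapsed(), sum)
}

fn main() {
    let (elapsed, sum) = time_spawn_join(10_000);
    println!("spawned and joined 10000 threads in {elapsed:?} (checksum {sum})");
}
```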

caratovchanin · May 28, 2023, 4:46pm

Not bad. How do you do this in Rust, and how do I give the thread something to do?

SNCPlay42 · May 28, 2023, 4:52pm

The C++ code here looks a bit off to me - you never push anything to threads.

caratovchanin · May 28, 2023, 4:54pm

I pass a pointer to int ni into the new thread.

SNCPlay42 · May 28, 2023, 4:59pm

I mean you declare std::vector<HANDLE> threads; and later you loop over it - for (HANDLE handle : threads) - but you never put anything in it.

And that loop waits on hThread instead of handle, which means every time (if it runs at all) it's waiting on the last thread you created in the loop above instead of each different thread you created.

caratovchanin · May 28, 2023, 5:02pm

OK, but that doesn't explain it: Rust has already finished its loop while C++ is still creating threads.

simonbuchan · May 29, 2023, 3:55am

Here's the Windows part of the implementation of std::thread. It doesn't do much more than call CreateThread and install a stack overflow handler:


rust-lang/rust/blob/master/library/std/src/sys/windows/thread.rs

use crate::ffi::CStr;
use crate::io;
use crate::num::NonZeroUsize;
use crate::os::windows::io::AsRawHandle;
use crate::os::windows::io::HandleOrNull;
use crate::ptr;
use crate::sys::c;
use crate::sys::handle::Handle;
use crate::sys::stack_overflow;
use crate::sys_common::FromInner;
use crate::time::Duration;

use libc::c_void;

use super::to_u16s;

pub const DEFAULT_MIN_STACK_SIZE: usize = 2 * 1024 * 1024;

pub struct Thread {
    handle: Handle,

This file has been truncated.

If you're seeing any performance difference from direct CreateThread calls in C++, you're probably measuring something unrelated, as mentioned.
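One variable worth pinning down: the linked source sets DEFAULT_MIN_STACK_SIZE to 2 MiB, while a bare CreateThread call with a stack size of 0 uses the executable's default, so the two sides of a benchmark may not reserve the same stack. A sketch of fixing the stack size explicitly on the Rust side with std::thread::Builder::stack_size; the 1 MiB value and the helper name `spawn_with_stack` are assumptions for illustration, not a recommendation.

```rust
use std::io;
use std::thread;

/// Spawns `n` threads with an explicit stack size, so the stack reservation is
/// a controlled variable rather than whatever the platform default happens to be.
fn spawn_with_stack(n: u32, stack_bytes: usize) -> io::Result<u64> {
    let mut handles = Vec::with_capacity(n as usize);
    for i in 0..n {
        let handle = thread::Builder::new()
            .stack_size(stack_bytes)
            // Unlike thread::spawn, Builder::spawn reports failure as io::Error.
            .spawn(move || u64::from(i))?;
        handles.push(handle);
    }
    Ok(handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>())
}

fn main() -> io::Result<()> {
    let sum = spawn_with_stack(1_000, 1 << 20)?;
    println!("checksum {sum}");
    Ok(())
}
```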

That said, I'm surprised that printf() would be noticeably slower than println!()... Perhaps it does some first-time, per-thread initialization on MSVC?

burjui · June 1, 2023, 10:04pm

That makes perfect sense, because printf parses the format string at run time, while println! does the same at compile time, and the resulting code can be optimized further.
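To illustrate the compile-time side: format_args! (which println! builds on) turns its literal template into a std::fmt::Arguments value during compilation, and a mismatch between placeholders and arguments is a compile error rather than undefined behavior at run time. A small sketch, with a hypothetical `write_line` helper that writes into any io::Write sink:

```rust
use std::fmt;
use std::io::{self, Write};

/// Writes one pre-built `fmt::Arguments` followed by a newline.
/// The template was split into literal pieces and arguments by the compiler,
/// so no `{}` placeholders are searched for while this runs.
fn write_line(out: &mut impl Write, args: fmt::Arguments<'_>) -> io::Result<()> {
    out.write_fmt(args)?;
    out.write_all(b"\n")
}

fn main() -> io::Result<()> {
    let mut buf = Vec::new();
    for i in 0..3 {
        write_line(&mut buf, format_args!("thread: {}", i))?;
        // format_args!("thread: {} {}", i) would not compile:
        // two placeholders, but only one argument.
    }
    io::stdout().write_all(&buf)
}
```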

simonbuchan · June 1, 2023, 10:49pm

That's likely (but not definitely!) measurably slower, but not noticeably (e.g. to a human) slower, and certainly not a 4× difference like 30 s vs 120 s!

carey · June 2, 2023, 12:52am

printf() also has the overhead of converting from whatever the current character set is into UTF-16 so that it can call WriteConsoleW, while Rust knows it's starting from UTF-8. I still wouldn't expect that to make that much difference, but it would be interesting to see whether wprintf() or std::wcout was faster.

Alternatively, maybe Windows printf() doesn't do line-buffering?
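On the Rust side, std::io::Stdout documents that it is line-buffered when connected to a terminal, flushing whenever a newline is written. A sketch of how much that policy matters for the number of underlying writes, using a hypothetical `CountingSink` writer as a stand-in for the console: io::LineWriter (the same line-buffering policy) versus io::BufWriter, which only writes when its buffer fills or is flushed. The exact counts depend on the standard library's implementation.

```rust
use std::io::{self, BufWriter, LineWriter, Write};

/// Hypothetical sink that records how many times `write` reaches it,
/// standing in for the OS-level console write.
struct CountingSink {
    calls: usize,
    bytes: Vec<u8>,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        self.bytes.extend_from_slice(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes `n` lines through a LineWriter (flush per newline) or a BufWriter
/// (flush when the buffer fills or at the end), then returns the sink.
fn write_lines(n: u32, line_buffered: bool) -> io::Result<CountingSink> {
    let sink = CountingSink { calls: 0, bytes: Vec::new() };
    if line_buffered {
        let mut w = LineWriter::new(sink);
        for i in 0..n {
            writeln!(w, "thread: {i}")?;
        }
        w.into_inner().map_err(|e| e.into_error())
    } else {
        let mut w = BufWriter::new(sink);
        for i in 0..n {
            writeln!(w, "thread: {i}")?;
        }
        // into_inner flushes whatever is still buffered.
        w.into_inner().map_err(|e| e.into_error())
    }
}

fn main() -> io::Result<()> {
    let lb = write_lines(1_000, true)?;
    let bw = write_lines(1_000, false)?;
    // The same batching idea applied to the real stdout handle.
    let mut out = BufWriter::new(io::stdout().lock());
    writeln!(out, "LineWriter: {} writes, BufWriter: {} writes", lb.calls, bw.calls)?;
    out.flush()
}
```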

Donce · June 3, 2023, 6:30pm

It is sometimes fun to play with benchmarks, but most of the time you should only benchmark different variations of your own code, for example how fast your code is with arrays versus with vectors. For other things, benchmarks are often useless.
This is especially true when comparing different programming languages, because a programming language doesn't have a speed. Speed comes from the hardware: how many operations the processor can execute in a given time, and how fast data moves between RAM and the processor or an HDD/SSD.
Your code and the compiler determine how many operations are needed to execute your program, and that is where the difference in running time between two programs comes from. Most of the time, though, it comes down to how well the programmer knows the language and the algorithms used in the code.
For example, in your code you are not only benchmarking how fast threads are created, but also how fast printf prints to the terminal compared to println!, and how fast std::vector is compared to Vec::new().
You should also use appropriate compiler settings to optimize the compiled code. There can be big speed differences between debug and release builds: in a debug build std::vector may need more operations than Vec::new(), while in a release build it may need fewer. Other settings matter too.
Also, if one thing is faster in one language, another thing may be faster in the other language.
So understanding the programming language, the compiler and its settings is very important for knowing the end result. If you don't know them, benchmarks become just meaningless fun.
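Following that advice, a sketch of comparing two variations of the same Rust code rather than two languages: filling a vector that starts from Vec::new() versus one created with Vec::with_capacity. `best_of`, `fill_growing` and `fill_preallocated` are made-up names, and taking the fastest of several runs is just one way to reduce noise, not the only valid statistic. As above, run it in a release build.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` `runs` times and returns the fastest wall-clock time observed.
fn best_of<F: FnMut() -> usize>(runs: u32, mut f: F) -> Duration {
    let mut best = Duration::MAX;
    for _ in 0..runs {
        let start = Instant::now();
        // black_box stops the optimizer from discarding the result.
        black_box(f());
        best = best.min(start.elapsed());
    }
    best
}

/// Variation A: start empty and let the vector reallocate as it grows.
fn fill_growing(n: usize) -> usize {
    let mut v = Vec::new();
    for i in 0..n {
        v.push(black_box(i));
    }
    v.len()
}

/// Variation B: reserve the final size up front.
fn fill_preallocated(n: usize) -> usize {
    let mut v = Vec::with_capacity(n);
    for i in 0..n {
        v.push(black_box(i));
    }
    v.len()
}

fn main() {
    let n = 1_000_000;
    let a = best_of(10, || fill_growing(n));
    let b = best_of(10, || fill_preallocated(n));
    println!("Vec::new + push: {a:?}, Vec::with_capacity + push: {b:?}");
}
```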

system · September 1, 2023, 6:31pm

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.
