Maybe there is a good blog post or some tutorial somewhere?
But I suspect this is the cause of some very low CPU utilization in multithreaded code I have. It performs blazingly fast on small data sizes, but once the data is even a few MB in length, htop begins to look poor, with rare bursts of multithreaded CPU activity scattered between long periods of activity that looks like this:
My whole computer also comes to a near standstill when this is the case.
I think it's because (in this case) I have a 30MB PDF that I am processing in rayon, split over the pages (there are some 1.3k pages). If I understand correctly, thread_local! keeps a copy in every thread at the cost of a greater memory footprint, but it means the algorithms can get to work faster, as rayon doesn't have to bus/shuffle this 30MB around everywhere? Or maybe there is contention on it, or something?
But I'm kind of guessing this. I don't know how or when to use it.
Could any of you wise folk enlighten me? Thanks!!!
No, thanks, really just an explanation of why thread_local exists and when to use it. Does it have analogues in other programming languages? Is it essentially just a wrapper over an OS API, or is it a Rusty trick to overcome the nature of how multiple CPU cores and shared caches work, etc.? I'm completely guessing, and this is why I just don't use it.
It's similar to BufRead: I just don't have a good enough low-level understanding of what is going on, so I may be missing out on obvious tricks that would make my code so much faster.
With your help, I'll then deduce whether I can use it to solve my current "problem".
Thread-locals have nothing to do with BufRead. They are two different concepts and solve completely different problems.
The problem that thread-locals solve is that sometimes, you want some sort of global state which is not thread-safe. In this case, it can't be truly global, because globals are accessible from all threads. If putting the state behind a mutex and locking it upon every access would be too tedious, but duplicating it for each thread is OK, then a thread-local can be used for creating and caching a separate, independent instance for each thread. It can then be used freely, as if it were a global, being assured that only one thread will ever access it.
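To make that concrete, here is a minimal sketch using only the standard library. The names SCRATCH and push_and_len are made up for illustration; the point is only that every thread sees its own independent instance, so no locking is needed.

```rust
use std::cell::RefCell;
use std::thread;

thread_local! {
    // Hypothetical per-thread scratch buffer. Each thread gets its own
    // instance, lazily initialised on first access in that thread.
    static SCRATCH: RefCell<Vec<u8>> = RefCell::new(Vec::new());
}

/// Appends `byte` to the calling thread's buffer and returns the new length.
fn push_and_len(byte: u8) -> usize {
    SCRATCH.with(|buf| {
        let mut buf = buf.borrow_mut();
        buf.push(byte);
        buf.len()
    })
}

fn main() {
    assert_eq!(push_and_len(1), 1);
    assert_eq!(push_and_len(2), 2);

    // A freshly spawned thread starts with its own empty buffer.
    let len_in_other_thread = thread::spawn(|| push_and_len(9)).join().unwrap();
    assert_eq!(len_in_other_thread, 1);

    // The main thread's buffer was not touched by the other thread.
    assert_eq!(push_and_len(3), 3);
}
```

RefCell is enough here because only one thread can ever reach a given instance. Note that this gives each thread separate state; it does not share or speed up access to one common buffer.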
I don't know how all of this relates to your specific problem, though.
I think it's here where I'm getting the htop problem.
I was wondering if maybe pdf: &[u8] should be thread_local or maybe just owned and cloned and moved into rayon, so that each thread is not jumping around to read the slice. But maybe it's not necessary?
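For what it's worth, a read-only &[u8] can be borrowed by many threads at once without cloning or locking. Below is a rough sketch of that idea, using std::thread::scope as a stand-in for rayon (rayon would use a thread pool instead of one thread per chunk). The functions checksum and parallel_checksum are hypothetical placeholders for the real per-page work.

```rust
use std::thread;

/// Hypothetical stand-in for per-page work: sums the bytes of one chunk.
fn checksum(chunk: &[u8]) -> u64 {
    chunk.iter().map(|&b| u64::from(b)).sum()
}

/// Splits `data` into chunks and processes each on a scoped thread.
/// Every thread borrows the same buffer; nothing is cloned.
/// `chunk_len` must be non-zero, because `chunks` panics on zero.
fn parallel_checksum(data: &[u8], chunk_len: usize) -> u64 {
    thread::scope(|s| {
        // Spawn all threads first, then join them, so they run concurrently.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || checksum(chunk)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum::<u64>()
    })
}

fn main() {
    let data = vec![1u8; 1000];
    // Ten threads each read a different 100-byte window of the same Vec.
    assert_eq!(parallel_checksum(&data, 100), 1000);
}
```

Since the threads only read, there is no lock to contend on, so a shared slice on its own is an unlikely cause of the stalls described above.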
Pinging @cuviper (- really hope it's not out of line to ping!) the god of multithreading.
"It's similar to BufRead"
I meant my lack of understanding and therefore missing potential obvious optimizations is similarly expressed in the absence of any BufReads in my code. I wonder how many other mistakes I am making.
Edit: you know what, do_bar_on_individual_page does call std::fs::write on the PDF, for every page of the PDF, which is obviously no bueno, and is perhaps the cause of that!
Edit 2: Yup, that was it! Sheesh. I'm sorry, but I hope @H2CO3's explanation of thread_local will benefit more readers in the future!!
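For future readers, the shape of the fix is roughly the following sketch: do the write once, outside the per-page work, and keep the per-page function purely in-memory. The real do_bar_on_individual_page is not shown in this thread, so process_page and process_all are hypothetical placeholders, and the sequential map stands in for the rayon parallel iterator.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical per-page work that only reads from memory (no file I/O).
fn process_page(pdf: &[u8], page: usize) -> usize {
    pdf.len() + page
}

/// Writes the PDF to `out` once, then runs the per-page work.
fn process_all(pdf: &[u8], pages: usize, out: &Path) -> io::Result<Vec<usize>> {
    // One write in total, instead of one write of the whole file per page.
    fs::write(out, pdf)?;
    Ok((0..pages).map(|page| process_page(pdf, page)).collect())
}

fn main() -> io::Result<()> {
    let pdf = vec![0u8; 16];
    let out = std::env::temp_dir().join(format!("sketch_{}.pdf", std::process::id()));
    let results = process_all(&pdf, 3, &out)?;
    assert_eq!(results, vec![16, 17, 18]);
    fs::remove_file(&out)?;
    Ok(())
}
```

With 1.3k pages and a 30MB file, a write per page means writing tens of gigabytes to disk, which would fit the low CPU utilization and the sluggish machine.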