I thought this would be an easy case, and indeed I got my loop converted to par_iter pretty easily. However, it does not seem to do a lot really... so here's the loop:
It compiles and runs. However, while I do see rayon spawn as many threads as I have CPUs, only ever one loop body is executed at any time (the eprintln gives strong hints, as the function called runs several seconds for each). It's kinda easy to see this because render results in an external command (pdflatex) and one can see there's never more than one external process running.
So, how do I diagnose this, what could be the error? The full code is a bit involved and closed, but since the compiler accepts this, I don't see what could keep things from being parallelized. I do not have any synchronization in place (this was serial code after all, I did not rework a lot).
Some speculation on my side, which feels wrong, but I don't know how to investigate:
Each thread creates its own tempfile::tempdir to work in. Does that imply some sort of block?
Each thread copies some files from the main directory into the tmpdir via fs_extra::copy_items before doing work. Afterwards, it copies some results back out into the main directory. Could consecutive calls of copy_items somehow lock a directory or something?
data is a hashbrown::HashMap with the rayon and serde features. It contains shared references into two other such HashMaps that live in a much wider scope. Seeing this was made specifically for use with rayon's ParallelIterator I don't see how that would hurt, but I didn't really find a lot of mention in the hashbrown docs about pitfalls here.
I'd be glad for any pointers. I can of course show some more type definitions or functions, but this is long enough as-is, so maybe someone just has an idea to throw out. Thanks for reading, anyways
AFAIK, rayon is more meant for CPU-bound tasks; for parallelizing IO-stuff, using async could be another option. Nonetheless, your parallel iterator should work in principle.
Looking at its documentation, I don’t feel like tempdir is a problem.
Skimming through the source of copy_items I don’t see a problem either.
Maybe you could debug this, e.g. by using a global variable to single out the first invocation of render and make its implementation block indefinitely at some place. If you find a place where such an indefinitely blocking operation manages to stop the whole program (including the other threads), then you (1) know that there really is a problem, and (2) can move the blocking call around to single out the operation that is responsible.
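use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

/// Blocks the calling thread forever on the first call; later calls return immediately.
fn block_on_first_call() {
    static FIRST_CALL: AtomicBool = AtomicBool::new(true);
    // `swap` returns the previous value, so only the first caller sees `true`.
    let first_call = FIRST_CALL.swap(false, Ordering::SeqCst);
    if first_call {
        // `park` may wake up spuriously, hence the loop.
        loop {
            thread::park();
        }
    }
}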
Then insert a call to block_on_first_call() somewhere in your code and see if it locks up the whole thread pool; preferably in the part of the render operation that takes the longest time.
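A less drastic variant of the same diagnostic idea would be to count how many calls are inside the closure body at the same time, instead of blocking one of them. The following is only a sketch using the standard library; `InFlightGuard` and `max_in_flight` are made-up names for illustration, not part of rayon or any other crate.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Number of callers currently inside a guarded section.
static IN_FLIGHT: AtomicUsize = AtomicUsize::new(0);
/// Highest value `IN_FLIGHT` has reached so far.
static MAX_IN_FLIGHT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical RAII guard: counts the caller as "in flight" until it is dropped.
pub struct InFlightGuard;

impl InFlightGuard {
    pub fn enter() -> Self {
        // `fetch_add` returns the previous value, so add 1 to get the new count.
        let now = IN_FLIGHT.fetch_add(1, Ordering::SeqCst) + 1;
        // Remember the peak number of simultaneous callers.
        MAX_IN_FLIGHT.fetch_max(now, Ordering::SeqCst);
        InFlightGuard
    }
}

impl Drop for InFlightGuard {
    fn drop(&mut self) {
        IN_FLIGHT.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Peak number of guards that were alive at the same time.
pub fn max_in_flight() -> usize {
    MAX_IN_FLIGHT.load(Ordering::SeqCst)
}
```

To use it, one would put `let _guard = InFlightGuard::enter();` at the top of the closure body and print `max_in_flight()` after the `collect()`. A value of 1 would mean the loop bodies never overlapped.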
Very good idea. But the result is kinda what I expected: The closure body of the map above is called sequentially (i.e. when I put a call to block_on_first_call there instead of the eprintln!, it just immediately locks up).
But this all happens with the eprintln, it was just a convenient way to show the sequentiality (is that a word?).
Well, that did it. I replaced hashbrown::HashMap by Vec as the type for map, and now everything's fine! Runtime of my example test went from 18s to 7s, which is about what I'd expect in this case.
Not sure why it's the case though, I looked through some comments in hashbrown, but there was no mention of this behavior.
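// Assumes hashbrown's HashSet with its `rayon` feature, as in the question.
use hashbrown::HashSet;
use rayon::prelude::*;

fn main() {
    let m = <HashSet<_>>::from_iter(0..15);
    let _: Vec<_> = m.par_iter().map(|i| {
        block_on_first_call();
        println!("{}", i);
    }).collect();
}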
uses at least 2 threads in parallel, whereas
fn main() {
    let m = <HashSet<_>>::from_iter(0..14);
    let _: Vec<_> = m.par_iter().map(|i| {
        block_on_first_call();
        println!("{}", i);
    }).collect();
}
does not, so a hash map with at most 14 elements doesn’t use more than one thread. Perhaps the ParallelIterator implementation of HashMap/HashSet somehow sets a lower bound on how small the parts are that the map can be split into?
Skimming through the code, I arrive at this part: mod.rs - source
which does indeed suggest the presence of a minimum size. There are probably valid reasons behind it.
Anyways, as I mentioned, rayon is probably the wrong tool. If you want to try, consider using async fns, e.g. with tokio. You could use StreamExt::buffered to determine up to how many operations you want to do in parallel.
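If pulling in an async runtime is too much, the "at most N operations at a time" part can also be approximated with plain threads. The sketch below is not the tokio/`buffered` approach mentioned above but a std-only stand-in; `run_bounded` and its `limit` parameter are hypothetical names. The `job` closure would be where the external command gets started and waited for.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Run `job` on every element of `items` using at most `limit` worker threads.
/// Results are returned in the same order as `items`.
pub fn run_bounded<T, R, F>(items: &[T], limit: usize, job: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    // Shared cursor: each worker claims the next unprocessed index.
    let next = AtomicUsize::new(0);
    // Never start more workers than there are items, and always at least one.
    let workers = limit.max(1).min(items.len().max(1));

    let mut indexed: Vec<(usize, R)> = thread::scope(|s| {
        let mut handles = Vec::with_capacity(workers);
        for _ in 0..workers {
            handles.push(s.spawn(|| {
                let mut done = Vec::new();
                loop {
                    let i = next.fetch_add(1, Ordering::Relaxed);
                    if i >= items.len() {
                        break;
                    }
                    done.push((i, job(&items[i])));
                }
                done
            }));
        }
        // Gather the per-worker results; a panic in a worker is propagated here.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect()
    });

    // Workers finish in arbitrary order, so restore the input order.
    indexed.sort_by_key(|pair| pair.0);
    indexed.into_iter().map(|(_, r)| r).collect()
}
```

Since the heavy lifting happens in child processes here, the worker threads would mostly sit waiting, so `limit` effectively controls how many external processes run at once.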
Hmm, are you sure? I mean it's pretty light on the IO, what's really eating time is the external process (mostly CPU bound, though I don't think that matters). I mean I guess async works as well since the OS will schedule the processes appropriately...
Anyways, seeing I've got it working thanks to you, and I really don't need to optimize the Rust code itself very much, I don't think I want to spend time converting to async here. Needs other work, too, and I'm on a deadline.
If you want more granular parallelism for smaller maps and sets, you could collect item references to a Vec first. That's what rayon does anyway for the std types, since it can't do splits on the internals -- that's painful for large lengths but probably fine in small cases.
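A rough sketch of the collect-first idea: gather `(key, value)` references into a Vec, then split the work per item. To keep it dependency-free, this uses `std::thread::scope` with one thread per item as a stand-in for rayon, and `std::collections::HashMap` instead of hashbrown; with rayon one would call `par_iter()` on the collected Vec instead. `render` and `render_all` are placeholder names, and the real `render` from the question certainly looks different.

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical stand-in for the expensive `render` call in the question.
fn render(key: &str, value: &u32) -> String {
    format!("{key}={value}")
}

/// Collect item references into a Vec first, then process every item on its
/// own scoped thread. Only sensible for small maps, as noted above.
pub fn render_all(data: &HashMap<String, u32>) -> Vec<String> {
    // The Vec holds references into the map; nothing is cloned.
    let items: Vec<(&String, &u32)> = data.iter().collect();

    thread::scope(|s| {
        // One thread per item: the finest possible split.
        let handles: Vec<_> = items
            .iter()
            .map(|&(k, v)| s.spawn(move || render(k, v)))
            .collect();
        // Results come back in the Vec's order, which is the map's
        // (arbitrary) iteration order.
        handles
            .into_iter()
            .map(|h| h.join().expect("render panicked"))
            .collect()
    })
}
```

The point is only that a Vec can be split down to single elements, whereas the map's own parallel iterator apparently stops splitting below some size.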