Dynamically using (or not using) rayon


#1

So here’s a dumb thing I just wrote:

use rayon::prelude::*;

impl GammaUnfolder {
    pub fn from_config(
        config: &Config,
        use_rayon: bool,
        superstructure: &Coords,
        sc_matrix: &ScMatrix,
        // eigenvector_q: &V3, // reduced by sc lattice
    ) -> GammaUnfolder
    {
        ...
        let func = |&quotient_q: &V3| {
            q_ket(
                &superstructure.to_carts(),
                &(eigenvector_q_cart + sample_q + quotient_q),
            )
        };
        let q_kets_by_sc = if use_rayon {
            quotient_vecs.par_iter().map(func).collect()
        } else {
            quotient_vecs.iter().map(func).collect()
        };
        ...
    }
}

Somewhere down the line in the same file I have a similar thing doing

    if use_rayon {
        a_iter.into_par_iter()
            .zip(b_iter)
            .for_each(|(a, b)| callback(a, b))
    } else {
        a_iter.into_iter()
            .zip(b_iter)
            .for_each(|(a, b)| callback(a, b))
    }

If these iterator chains got any larger, I’d have to start “factoring” them into macros instead!
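
(For the record, I imagine that macro “factoring” would look something like this sketch, which is hardly an improvement:)

    macro_rules! zip_for_each {
        ($a:expr, $b:expr, $cb:expr) => {
            $a.zip($b).for_each(|(a, b)| $cb(a, b))
        };
    }

    if use_rayon {
        zip_for_each!(a_iter.into_par_iter(), b_iter, callback)
    } else {
        zip_for_each!(a_iter.into_iter(), b_iter, callback)
    }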

Is there anything like a magic

vec.into_par_iter()
    .use_threads(false)

? Does with_min_len(usize::MAX) do the trick, or is that just going to make a bunch of extra spinning threads? (or maybe it does spawn a bunch of extra threads, but the unused ones will play nice and avoid fighting for CPU time?)
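
(i.e. so that the first example could collapse into a single chain, something like this sketch:)

    let q_kets_by_sc: Vec<_> = quotient_vecs
        .par_iter()
        .with_min_len(if use_rayon { 1 } else { usize::MAX })
        .map(func)
        .collect();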


My aim with these flags is really just to help control where threads are created. That is, an expensive function might be called from several places, some of which already have threads (or, e.g., call some C function that uses OpenMP); in those places, I think I should probably avoid creating another layer of threads, lest I end up with an exponential number of threads all fighting each other.


#2

Hmm, that’s an interesting dilemma.

This will mostly work, and the unused threads should go completely idle. There are exceptions, though: Chain will unconditionally join its two sides in parallel, and for FlatMap you’d have to apply with_min_len to the mapped iterator too.
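
For example, a FlatMap needs the with_min_len applied on both levels, something like this sketch:

    use rayon::prelude::*;

    let nested: Vec<Vec<u64>> = vec![vec![1, 2], vec![3, 4]];
    let total: u64 = nested
        .par_iter()
        .with_min_len(usize::MAX) // keep the outer iterator in one piece
        .flat_map(|inner| inner.par_iter().with_min_len(usize::MAX)) // and each inner one too
        .sum();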

The surest way to serialize it is to create a private single-thread “pool”. This will also have the effect of serializing any other rayon access within your iterator, for example calling rayon::join directly or nested parallel iterators. Something like:

fn maybe_parallel<T, F>(use_threads: bool, f: F) -> T
    where T: Send,
        F: FnOnce() -> T + Send,
{
    if use_threads {
        // running directly will let it use the current thread pool
        f()
    } else {
        // run it in a single-threaded "pool"
        let pool = rayon::ThreadPoolBuilder::new().num_threads(1).build().unwrap();
        pool.install(f)
    }
}
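
Then your first example can write the par_iter unconditionally and wrap the whole thing; a sketch, assuming the captured values are Send/Sync as needed:

    let q_kets_by_sc: Vec<_> = maybe_parallel(use_rayon, || {
        quotient_vecs.par_iter().map(func).collect()
    });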

Or if you don’t mind having a global effect on the program, you could just call this once:

rayon::ThreadPoolBuilder::new().num_threads(1).build_global().unwrap();

#3

I thought rayon just creates a number of threads depending on your CPU, and then executes its work in a work-stealing fashion using threads from the global pool? So deciding between non-parallel and parallel iterators shouldn’t be required.


#4

Oh, for this, Rayon’s global thread pool will only be created once for the life of the process, the first time it’s used. So there aren’t really any thread “layers” as far as that goes. And if you’re already in a Rayon threadpool, global or not, then all Rayon calls will continue using that, which is how the single-threaded hack I gave would work.
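
You can see the “current pool” behavior directly, for instance (a quick sketch):

    let pool = rayon::ThreadPoolBuilder::new().num_threads(1).build().unwrap();
    // rayon calls made inside install() see the installed pool, not the global one
    assert_eq!(pool.install(|| rayon::current_num_threads()), 1);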

This doesn’t help when competing with other thread pools like OpenMP, though.


#5

Ah, of course! I vaguely remembered having read about something like that before, but when it comes to my memory, my preconceptions seem to have a tendency to win out in the long run.

That is nice because, in general, most opportunities for parallelism in my code are of the “just toss in rayon” sort. And for instance, I can be certain (at least for now) that the particular example I posted above does not have to compete with OpenMP, because the control flow is “self-contained” (in that it takes no callbacks, and does the whole computation strictly to completion), and because I know my program only ever calls the C library from single-threaded contexts. (The library is not thread-safe!)

So it would seem that technically, in this circumstance, I could probably just write par_iter, forget about configurability, and call it a day.

…but I don’t know. Something feels unclean about having parallelism hidden behind a function call. I feel like there’s something valuable about being able to explicitly see a “threading” argument in high-level code like

let splines = reticulate_splines(
    &frogs,
    &knobs,
    Threading::Parallel,
)?;

But if explicitness is all I want in such cases, then… I think I could probably just pass around a threadpool. Which makes this trick:

The surest way to serialize it is to create a private single-thread “pool”. […]

awesomely convenient.


(or on second thought, combining the solutions might result in situations where I am both installing a threadpool globally and passing it down to other functions for them to install, which is weird. So I think I’ll drop the “pass around a threadpool” bit and just use your initial suggestion in earnest)
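
(Concretely, I picture the high-level flag dispatching to that trick, something like this sketch — Threading and its install method are names I’m making up, not rayon API:)

    pub enum Threading { Serial, Parallel }

    impl Threading {
        pub fn install<T, F>(&self, f: F) -> T
        where
            T: Send,
            F: FnOnce() -> T + Send,
        {
            match self {
                // use whatever pool we're already in
                Threading::Parallel => f(),
                // serialize by installing a private single-thread pool
                Threading::Serial => rayon::ThreadPoolBuilder::new()
                    .num_threads(1)
                    .build()
                    .unwrap()
                    .install(f),
            }
        }
    }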


#6

I’m still wondering whether there’s any way for Rayon to provide an adapter like this for an arbitrary ParallelIterator.

In the meantime, here’s an experimental wrapper that lets you be conditionally serial or parallel up front, then provides an API of some methods that are common between Iterator and ParallelIterator. It feels like this might be overkill, and I don’t even know if it works, but it at least builds… :laughing:
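
Roughly, it’s an enum that commits to serial or parallel up front and forwards a few adapters that both flavors share; a simplified sketch of the shape (not the actual experimental code):

    use rayon::prelude::*;

    // either a plain Iterator or a ParallelIterator, chosen up front
    enum MaybeParIter<S, P> {
        Serial(S),
        Parallel(P),
    }

    impl<T, S, P> MaybeParIter<S, P>
    where
        T: Send,
        S: Iterator<Item = T>,
        P: ParallelIterator<Item = T>,
    {
        // forward map to whichever flavor is inside
        fn map<U, F>(self, f: F) -> MaybeParIter<std::iter::Map<S, F>, rayon::iter::Map<P, F>>
        where
            U: Send,
            F: Fn(T) -> U + Sync + Send,
        {
            match self {
                MaybeParIter::Serial(it) => MaybeParIter::Serial(it.map(f)),
                MaybeParIter::Parallel(it) => MaybeParIter::Parallel(it.map(f)),
            }
        }

        // one terminal operation; others would follow the same pattern
        fn collect_vec(self) -> Vec<T> {
            match self {
                MaybeParIter::Serial(it) => it.collect(),
                MaybeParIter::Parallel(it) => it.collect(),
            }
        }
    }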


#7

That’s pretty clever! I didn’t think it would be possible to reduce it down to O(1) overhead for a serial iterator like that. (though in my case, I think the work items are always large enough that I am not too concerned; and besides, rayon has adaptive chunk sizes).

The level of necessary boilerplate is also surprisingly small for an abstraction over different breeds of iterators!