Multiprocess parallelism


#1

There is a very heavy C API I need to call on a lot of inputs, and I’d like to do it in parallel.

Just one thing: Internally, this API uses the non-reentrant strtok function. A lot. And as a result, if I try to call it from multiple threads, it segfaults. A lot. This tosses rayon right out of the window. And I don’t see very many other nice options.

The thing is, I have no idea where to begin.

  • I don’t see any crates for doing this.
  • I wouldn’t know how to create a process whose main is defined in my project. Would I need a separate crate just for this? (Yikes)
  • For IPC I see the serde-based ipc_channel crate… though, um, I feel like there’s the chicken-and-egg problem of how do I send the other end of the channel itself?

Any pointers?

// (these stand in for data usually on the order of
// a couple dozen thousand floating point numbers)
#[derive(Debug, Clone, PartialEq)] pub struct Input;
#[derive(Debug, Clone, PartialEq)] pub struct Output;

// (Consider this code to be a fixed part of the problem.
//  It's standing in for a very heavy C library that I
//  have no hope of replacing.)
mod ffi {
    use super::*;

    // (representative example of code using the C standard library)
    pub fn do_expensive_thing(_i: &Input) -> Output
    {
        let start = ::std::time::Instant::now();
        while start.elapsed().as_secs() < 10 {
            _parse_a_string();
        }
        Output
    }

    // (representative example of the C standard library)
    fn _parse_a_string()
    {
        static mut ONE: u32 = 0;
        unsafe {
            ONE += 1;
            assert_eq!(ONE, 1, "Core dumped.");
            ONE -= 1;
        }
    }
}

// ------------------------------------

extern crate rayon;
use rayon::prelude::*;

// FIXME
// This will segfault, so multithreading is basically out.
// How could I similarly encapsulate the use of worker *processes*?
pub fn do_all_expensive_things(inputs: &[Input]) -> Vec<Output>
{
    inputs.par_iter().map(ffi::do_expensive_thing).collect()
}

#2

You didn’t say what platform you’re using, and whether the program has to be cross-platform. If it’s Unixish, you can use nix and the usual Unix arsenal of pipe, fork et al. It shouldn’t be too difficult to combine it with an async main process driver. Chunking and marshalling input to subprocesses would have to be manual, though; rayon can’t help here.


#3

Yep, it’s unix. I totally forgot about fork. Since fork also duplicates the address space, it seems this will take care of a large amount of the IPC as well (by virtue of simply not needing it)

I am a little surprised to see that it is a safe function. It seems to open up new possibilities which simply would not otherwise be possible in safe code (but maybe I am wrong), and I imagine that there is probably unsafe code out there whose correctness is predicated on the lack of it existing.


#4

fork is especially dangerous for multithreaded programs. If some other thread is in the middle of a transaction, perhaps holding a lock, then that will never be completed in the forked process. As far as libc is concerned, you should limit yourself to async-signal-safe functions, which notably does not include malloc!


#5

You could try out timely dataflow. It is meant more for data-parallel computation, but it works just fine with multi-process single-machine execution as well, and handles IPC as long as you are sending types that implement abomonation.

To give a sense for how this looks, you write a program that creates work items that you want to process as a data stream, use the exchange method to distribute the work items, and then FFI into your C code on each worker. Then you spin up multiple copies of each of these programs on the same machine (or multiple machines, if you have them) with enough information for them to rendezvous with each other (local machine: just the number of processes and they’ll use localhost, multiple machines: need to supply a hosts.txt with hosts and ports). The doing more things example in the timely readme.md should be pretty close to what you want, I think.


#6

As @cuviper said, fork is a minefield with multithreaded programs; in my experience, the best remedy is to make sure that the program is single-threaded, and use only classic Unix multiprocess and IPC facilities.

In this case, the largest stumbling block is child process handling, since Rust has no single-threaded async signal handling facilities (that I know of), and you need to catch SIGCHLD and wait() to reap exited children. You can sidestep the problem by explicitly reparenting each forked process to PID 1, or tolerating zombie processes until the parent exits.


#7

As @cuviper said, fork is a minefield with multithreaded programs; […]

Yikes; maybe I should reserve all usage of fork for the very beginning of main to guarantee no other threads exist. The alternative of guaranteeing that the program is single-threaded seems far too difficult.

Looks like this might be a lot safer, and probably even faster since it relies on Abomonation. However, it looks like there’s a number of concepts I would need to learn, and it may also change how my program needs to be run.


Unfortunately, it looks like either way I do it, this is probably too big of a yak for me to shave right now. I think I’ll need to come back to this later when I have more liberty to worry about performance.