Borrow checker and Rayon parallel iteration

I'm having some trouble with the borrow checker while trying to parallelize a nested for loop without extra allocations. Here's (essentially) the serial code I'm trying to parallelize:

fn merge(first: Vec<String>, second: Vec<String>) -> Vec<String> {
    let mut out = Vec::new();
    for f in &first {
        for s in &second {
            if should_merge(f, s) {
                out.push(f.to_owned() + s);
            }
        }
    }
    out
}

should_merge() is relatively expensive, so I want to run that bit of code in parallel. Here's the code I want to write:

fn par_merge(first: Vec<String>, second: Vec<String>) -> Vec<String> {
    first
        .par_iter()
        .map(|f| second.par_iter().map(|s| (f, s)))
        .flatten()
        .filter(|(f, s)| should_merge(f, s))
        .map(|(f, s)| f.to_owned() + s)
        .collect()
}

The borrow checker complains that the inner closure may outlive the outer closure that creates it while still borrowing f, which is a local of that outer closure. I haven't figured out how (or whether) I can convince the borrow checker that all of the references are collected before the function returns. Here's my current workaround:

fn par_merge_workaround(first: Vec<String>, second: Vec<String>) -> Vec<String> {
    first
        .par_iter()
        .map(|f| {
            second
                .par_iter()
                .filter(|s| should_merge(f, s))
                .map(|s| f.to_owned() + s)
                .collect()
        })
        .reduce(
            Vec::new,
            |mut l, r| {
                l.extend(r);
                l
            },
        )
}

Even the workaround has much lower latency than the serial version, but it builds an intermediate Vec for every element of first and then copies those into the result, and I can't figure out how to avoid these unnecessary allocations in safe Rust. This is an inner loop in our system, which might justify unsafe, but I'd prefer not to go down that road if at all possible.

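EDIT: With @naim's help, all I needed was to make my inner closure a move closure, so that it captures the reference f by value (a cheap copy of the &String) instead of borrowing the outer closure's local variable: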

fn par_merge(first: Vec<String>, second: Vec<String>) -> Vec<String> {
    first
        .par_iter()
        .map(|f| second.par_iter().map(move |s| (f, s)))
        .flatten()
        .filter(|(f, s)| should_merge(f, s))
        .map(|(f, s)| f.to_owned() + s)
        .collect()
}
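
For anyone who wants to reproduce this without pulling in Rayon: the same capture problem shows up with plain std iterators, so here is a minimal serial sketch of the fix. It swaps par_iter for iter and flat_map, and it assumes a made-up should_merge (equal lengths) purely as a stand-in, since the real predicate isn't shown in this thread. merge_pairs is a hypothetical name as well.

```rust
// Hypothetical stand-in for the thread's should_merge(); the real one is not shown.
fn should_merge(f: &str, s: &str) -> bool {
    f.len() == s.len()
}

// Serial analogue of par_merge, using std iterators instead of Rayon.
fn merge_pairs(first: &[String], second: &[String]) -> Vec<String> {
    first
        .iter()
        // `f` is a `&String` that lives only inside this outer closure.
        // Without `move`, the inner closure would borrow the local `f`,
        // and the iterator returned here would outlive it (error E0373).
        // With `move`, the inner closure copies the reference instead.
        .flat_map(|f| second.iter().map(move |s| (f, s)))
        .filter(|(f, s)| should_merge(f, s))
        .map(|(f, s)| f.to_owned() + s)
        .collect()
}
```

Removing the move keyword from this sketch should produce the same "closure may outlive the current function" error as the Rayon version, which is a quick way to see that the issue is about closure captures rather than anything Rayon-specific.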

Instead of moving strings around, try working with their references:

fn par_merge(first: Vec<String>, second: Vec<String>) -> Vec<String> {
    first
        .par_iter()
        .map(|f| f.as_str())
        .map(|f: &str| second.par_iter().map(|s| s.as_str()).map(move |s: &str| (f, s)))
        .flatten()
        .filter(|(f, s)| should_merge(f, s))
        .map(|(f, s): (&str, &str)| f.to_owned() + s)
        .collect()
}

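In this snippet, as_str turns each &String into a &str that borrows directly from the vector, so the references carry the lifetime of first and second rather than that of a closure's local variable. The inner closure still needs move so that it copies f instead of borrowing it.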
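
To make the lifetime point concrete, here is a small std-only sketch (again serial, not Rayon) in which the pairs are returned from the function as borrowed &str values. The function name pair_up and the explicit lifetime 'a are illustrative assumptions, not something from the code above.

```rust
// Returns every (f, s) pair as string slices that borrow from the inputs.
// The signature spells out that the slices live as long as the input data,
// not as long as any closure inside the function.
fn pair_up<'a>(first: &'a [String], second: &'a [String]) -> Vec<(&'a str, &'a str)> {
    first
        .iter()
        // &'a String -> &'a str: still a borrow of the input slice's data.
        .map(|f| f.as_str())
        // `move` copies the `&str` into the inner closure; without it the
        // closure would borrow the outer closure's local `f`.
        .flat_map(|f| second.iter().map(move |s| (f, s.as_str())))
        .collect()
}
```

Because the returned tuples only borrow from first and second, this compiles without cloning any string data; an owned String is needed only at the point where the two halves are concatenated.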
