I would like to parallelise a program that can be approximated by the following toy example (playground). It splits a text into phrases, then splits each phrase into words, and outputs the first n words of the n-th phrase:
```rust
use rayon::prelude::*;

fn main() {
    let mut len = 0;
    let i = String::from("To be. Or not to be. That is the question");
    let phrases = i.split(".");
    let iter = phrases
        //.par_bridge()
        .map(|p: &str| p.split_whitespace().collect::<Vec<_>>());
    let iter = iter.map(|p: Vec<&str>| {
        len += 1;
        p.into_iter().take(len).collect::<Vec<_>>()
    });
    iter.for_each(|p| println!("{:?}", p))
}
```
This works fine and prints:

```text
["To"]
["Or", "not"]
["That", "is", "the"]
```
Now I would like to parallelise the splitting into words using Rayon. However, when I uncomment the call to `par_bridge`, I get the error "cannot assign to `len`, as it is a captured variable in a `Fn` closure". Fair enough: Rayon's `map` takes an `Fn` closure, and the second closure mutates `len`.
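Since `len` only counts how many phrases have been seen so far, the mutable counter can be replaced by the phrase's index, which leaves the mapping closure without any mutable state. A minimal sequential sketch using only the standard library (`first_words` is a hypothetical name, not part of the original program):

```rust
/// For the n-th phrase (0-based index `idx`), keep its first `idx + 1` words.
/// The closure passed to `map` captures nothing mutable, so it is a plain `Fn`.
fn first_words(text: &str) -> Vec<Vec<&str>> {
    text.split('.')
        .enumerate()
        .map(|(idx, phrase)| phrase.split_whitespace().take(idx + 1).collect::<Vec<_>>())
        .collect()
}

fn main() {
    let text = String::from("To be. Or not to be. That is the question");
    for words in first_words(&text) {
        println!("{:?}", words);
    }
}
```

With Rayon, `enumerate()` would have to be called before `par_bridge()`, which does not preserve the original order. Collecting the phrases into a `Vec` first and using `par_iter().enumerate()` instead gives an indexed parallel iterator whose `collect` keeps the original order.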
My go-to solution in such scenarios is to create a channel and send the split phrases through it (playground):
```rust
use std::sync::mpsc::channel;
use std::thread;

fn main() {
    let (tx, rx) = channel();
    thread::spawn(move || {
        let i = String::from("To be. Or not to be. That is the question");
        // error: `i` does not live long enough
        let phrases = i.split(".");
        let iter = phrases.map(|p: &str| p.split_whitespace().collect::<Vec<_>>());
        iter.for_each(|i: Vec<&str>| tx.send(i).unwrap())
    });
}
```
However, here I get the error that `i` does not live long enough. That makes sense: the `&str`s in each `Vec<&str>` borrow from `i`, which is dropped when the spawned closure finishes, and since `thread::spawn` requires a `'static` closure, the captured `tx` could only send `Vec<&'static str>`. Nothing ensures that the `&str` references live long enough. (In this toy example, I could circumvent the error by omitting `String::from` and thus making `i: &'static str`, but in my actual application, I cannot.)
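One way to let the receiver keep borrowed `&str`s is to make the thread's lifetime provably shorter than the text's. A minimal sketch, assuming Rust 1.63 or later for `std::thread::scope` (`split_on_worker` is a hypothetical name): the text is owned outside the scope, and the scope joins the worker before returning, so the `Vec<&str>` values can cross the channel without any `String` allocations.

```rust
use std::sync::mpsc::channel;
use std::thread;

/// Splits `text` into phrases on a scoped worker thread and sends each
/// phrase's words back as borrowed `&str` slices (no per-word allocation).
fn split_on_worker(text: &str) -> Vec<Vec<&str>> {
    let (tx, rx) = channel::<Vec<&str>>();
    thread::scope(|s| {
        // `move` hands `tx` and the `&str` reference `text` to the worker; `tx` is
        // dropped when the worker finishes, which ends the `rx.iter()` loop below.
        s.spawn(move || {
            for phrase in text.split('.') {
                tx.send(phrase.split_whitespace().collect()).unwrap();
            }
        });
        // Receive on the current thread while the worker may still be sending.
        rx.iter().collect()
    })
}

fn main() {
    let text = String::from("To be. Or not to be. That is the question");
    for words in split_on_worker(&text) {
        println!("{:?}", words);
    }
}
```

`rayon::scope` provides a similar guarantee for Rayon tasks: it blocks until every task spawned into the scope has completed, so those tasks may borrow data from outside it.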
I thought about solving this problem in two ways, both of which have significant downsides:
The first solution would be to collect into `Vec<String>`; however, converting every `&str` to `String` significantly increases runtime (I have a very large number of `&str`s in my actual application).
The second solution would be to move every `&str` of the `Vec<&str>` into an arena to ensure that the references live long enough; however, this significantly increases memory consumption.
Do you know of other ways to solve this problem?
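A further option, not among the two listed above, would be to send plain byte ranges instead of `&str`s: integer ranges are `'static`, so they cross an ordinary `thread::spawn` channel, and the receiver rebuilds the slices from a shared copy of the text. A minimal sketch, assuming the text can be put into an `Arc<str>` (converting a `String` this way copies the text once, not once per word); `spawn_splitter` is a hypothetical name:

```rust
use std::ops::Range;
use std::sync::mpsc::{channel, Receiver};
use std::sync::Arc;
use std::thread;

/// Splits the shared text on a worker thread and sends, for each phrase,
/// the byte ranges of its words relative to the start of `text`.
fn spawn_splitter(text: Arc<str>) -> Receiver<Vec<Range<usize>>> {
    let (tx, rx) = channel();
    thread::spawn(move || {
        // Address of the first byte of the shared text.
        let base = text.as_ptr() as usize;
        for phrase in text.split('.') {
            let ranges: Vec<Range<usize>> = phrase
                .split_whitespace()
                .map(|w| {
                    // `w` is a subslice of `text`, so its address minus `base` is its byte offset.
                    let start = w.as_ptr() as usize - base;
                    start..start + w.len()
                })
                .collect();
            tx.send(ranges).unwrap();
        }
    });
    rx
}

fn main() {
    let text: Arc<str> = Arc::from("To be. Or not to be. That is the question");
    for ranges in spawn_splitter(Arc::clone(&text)) {
        // Rebuild borrowed `&str`s from the text this thread still holds.
        let words: Vec<&str> = ranges.into_iter().map(|r| &text[r]).collect();
        println!("{:?}", words);
    }
}
```

Each range takes the same 16 bytes as a `&str` on 64-bit targets, so this trades per-word allocation for one shared copy of the text.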