Rayon parallelism on the lines of a text file


#1

To parallelize the first Rust solution here, I’d like to use Rayon to read the lines of the file lazily and convert them to Vec<f64> in parallel. Do you know how to perform such a map-reduce over the lines of a text file with Rayon?


#2

It’s a common request, e.g. rayon#46 and rayon#297. We don’t have a good answer for that yet.

If you can tolerate reading or mapping the whole file into memory (their big-data.csv is 315MB), then you could use par_lines to do the parsing.


#3

Thank you. It’s indeed probably a common need. I’ll try to map the file then.


#4

Not really Rust-related, but if you intend to read the whole file into memory: under Linux, sequentially read()ing in a loop is faster than mmap()ing (because of predictable readahead and all that).


#5

Not necessarily.

(Blanket claims about the performance of memory maps are almost certainly not the whole story. :-))


#6

Agreed. In all cases: if perf is important, benchmark.


#7

In general I can’t read the whole file into memory, because it’s large. And even if I do so in this case, the performance of Rayon’s par_lines on the memory-mapped file is higher, up to 16 hardware (non-hyperthreaded) threads.

Now I’m thinking about how to use Rayon on a solution that’s less functional and faster… :slight_smile:


#8

For really huge files, you could also try batching it. Read or map a large chunk into memory, process with par_lines, repeat – but you’ll have to deal with lines that straddle chunks. Reading is probably easier than mapping in that respect, as you can just retain the incomplete-line suffix as a prefix to the next read.
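The read-and-carry-over variant might look roughly like the sketch below. It uses only the standard library; the name for_each_line_batch, the chunk_size parameter, and the process callback are hypothetical, chosen for illustration. Each call to process receives a string made up of complete lines only, and the incomplete tail of the buffer is kept as the prefix for the next read.

```rust
use std::io::{self, Read};

/// Read `reader` in chunks of up to `chunk_size` bytes and pass each batch of
/// complete lines to `process`. A partial line at the end of a chunk is carried
/// over and completed by the next read.
fn for_each_line_batch<R: Read>(
    mut reader: R,
    chunk_size: usize,
    mut process: impl FnMut(&str),
) -> io::Result<()> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    // At the top of each iteration, `buf` holds only the carried-over partial line.
    let mut buf: Vec<u8> = Vec::new();
    loop {
        let carried = buf.len();
        buf.resize(carried + chunk_size, 0);
        let n = match reader.read(&mut buf[carried..]) {
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => {
                buf.truncate(carried);
                continue;
            }
            Err(e) => return Err(e),
        };
        buf.truncate(carried + n);
        if n == 0 {
            // End of input: what remains is a final line without a newline.
            if !buf.is_empty() {
                process(to_str(&buf)?);
            }
            return Ok(());
        }
        // Everything up to and including the last newline is complete lines.
        if let Some(pos) = buf.iter().rposition(|&b| b == b'\n') {
            process(to_str(&buf[..=pos])?);
            // Keep the incomplete suffix as the prefix of the next read.
            buf.drain(..=pos);
        }
        // If there was no newline, the line is longer than the chunk: keep reading.
    }
}

/// View bytes as UTF-8 text, reporting invalid data as an I/O error.
fn to_str(bytes: &[u8]) -> io::Result<&str> {
    std::str::from_utf8(bytes).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Cutting at a newline byte is safe for UTF-8 input, since that byte never occurs inside a multi-byte character. Inside process, the batch could then be handed to par_lines; a chunk size of many megabytes is probably needed for the parallel work to outweigh the per-batch overhead, which is worth benchmarking.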