To parallelize the first Rust solution here, I'd like to use Rayon to read the lines of the file lazily and convert them in parallel to Vec<f64>. Do you know how to perform such a map-reduce over the lines of a text file with Rayon?
It's a common request, e.g. rayon#46 and rayon#297. We don't have a good answer for that yet.
If you can tolerate reading or mapping the whole file into memory (their big-data.csv is 315MB), then you could use par_lines to do the parsing.
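For example, a minimal sketch of that approach, assuming the file is a comma-separated list of numbers (the field-splitting and the path are placeholders, not something prescribed by Rayon):

```rust
use rayon::prelude::*;
use std::fs;

fn main() -> std::io::Result<()> {
    // Read the whole file into memory first...
    let contents = fs::read_to_string("big-data.csv")?;

    // ...then let Rayon split the line parsing across threads.
    let rows: Vec<Vec<f64>> = contents
        .par_lines()
        .map(|line| {
            line.split(',')
                .filter_map(|field| field.trim().parse::<f64>().ok())
                .collect()
        })
        .collect();

    println!("parsed {} rows", rows.len());
    Ok(())
}
```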
Thank you. It's indeed probably a common need. I'll try to map the file then.
Not really Rust related, but if you intend to read the whole file into memory, under Linux sequentially read()ing in a loop is faster than mmap()ing (because of predictable readahead and all that).
(Blanket claims about the performance of memory maps are almost certainly not the whole story. :-))
Agreed. In all cases: if perf is important, benchmark.
In general I can't read the whole file into memory because it's large, and even when I do in this case, the performance of Rayon's par_lines on the memory-mapped file keeps improving up to 16 hardware (non-hyperthreaded) threads.
Now I'm thinking about how to use Rayon in a solution that's less functional and faster...
For really huge files, you could also try batching it: read or map a large chunk into memory, process it with par_lines, repeat -- but you'll have to deal with lines that straddle chunk boundaries. Reading is probably easier than mapping in that respect, as you can just retain the incomplete-line suffix as a prefix to the next read.
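A rough sketch of that batched approach, under the assumptions of UTF-8 input, the same comma-separated-numbers format as above, an arbitrary 64 MiB chunk size, and a placeholder path:

```rust
use rayon::prelude::*;
use std::fs::File;
use std::io::Read;

const CHUNK_SIZE: usize = 64 * 1024 * 1024; // hypothetical 64 MiB chunk

fn main() -> std::io::Result<()> {
    let mut file = File::open("big-data.csv")?;
    let mut carry: Vec<u8> = Vec::new(); // incomplete line left over from the previous chunk
    let mut total_rows = 0usize;

    loop {
        // Start the buffer with whatever partial line was carried over, then read the next chunk.
        let mut buf = std::mem::take(&mut carry);
        let old_len = buf.len();
        buf.resize(old_len + CHUNK_SIZE, 0);
        let n = file.read(&mut buf[old_len..])?;
        buf.truncate(old_len + n);
        if buf.is_empty() {
            break; // nothing carried over and nothing left to read
        }

        // Split at the last newline; keep the incomplete suffix for the next iteration.
        let split_at = match buf.iter().rposition(|&b| b == b'\n') {
            Some(pos) => pos + 1,
            None if n == 0 => buf.len(), // final chunk without a trailing newline
            None => {
                carry = buf; // no complete line yet; read more
                continue;
            }
        };
        carry = buf.split_off(split_at);

        // Parse the complete lines of this chunk in parallel.
        let text = std::str::from_utf8(&buf).expect("valid UTF-8");
        let rows: Vec<Vec<f64>> = text
            .par_lines()
            .map(|line| {
                line.split(',')
                    .filter_map(|field| field.trim().parse::<f64>().ok())
                    .collect()
            })
            .collect();
        total_rows += rows.len();

        if n == 0 {
            break; // we just processed the final partial line
        }
    }

    println!("parsed {} rows", total_rows);
    Ok(())
}
```

Here each chunk is parsed in parallel while the reads themselves stay sequential, and the carried suffix guarantees no line is ever split across two par_lines calls.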