Help with multithreading


I am new to Rust, and I got a little stuck while planning my first project, a tool for word-frequency analysis of a text file.

The basic feature was implemented last week ( Update · freeze-dolphin/rfreq@1680e60 · GitHub ).
But there is one problem: it takes too much time when run on a big file.

So I decided to use multiple threads, but I am really confused about how to do it.


You can check the source code at the link below:
(note that this commit contains errors)

My idea is to read the whole file first, then split the content into several parts and distribute them across threads. Each thread analyzes its part separately, and once all the work is done, the results are merged into one HashMap and printed in some format.
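For reference, the plan above could be sketched roughly like this using only the standard library. This is a minimal sketch, not the code from the repo: the function names (`count_words`, `split_at_whitespace`, `count_words_parallel`) are hypothetical, and it assumes that a "word" is a whitespace-separated token compared case-insensitively. It relies on `std::thread::scope`, which is available from Rust 1.63.

```rust
use std::collections::HashMap;
use std::thread;

/// Count word occurrences in one chunk of text.
/// Assumption: words are whitespace-separated and lowercased before counting.
fn count_words(chunk: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for word in chunk.split_whitespace() {
        *counts.entry(word.to_lowercase()).or_insert(0) += 1;
    }
    counts
}

/// Split `text` into roughly `parts` slices, cutting only at whitespace
/// so that no word is torn in half between two threads.
fn split_at_whitespace(text: &str, parts: usize) -> Vec<&str> {
    let target = text.len() / parts.max(1) + 1; // approximate bytes per chunk
    let mut chunks = Vec::new();
    let mut rest = text;
    while rest.len() > target {
        // Move forward to a valid UTF-8 boundary before slicing.
        let mut cut = target;
        while !rest.is_char_boundary(cut) {
            cut += 1;
        }
        // Then extend the chunk to the next whitespace character.
        match rest[cut..].find(char::is_whitespace) {
            Some(offset) => {
                let (head, tail) = rest.split_at(cut + offset);
                chunks.push(head);
                rest = tail;
            }
            None => break, // no more whitespace: the remainder is one chunk
        }
    }
    chunks.push(rest);
    chunks
}

/// Count words in `text` using one scoped thread per chunk,
/// then merge the partial maps into a single HashMap.
fn count_words_parallel(text: &str, threads: usize) -> HashMap<String, usize> {
    let chunks = split_at_whitespace(text, threads);
    let partials: Vec<HashMap<String, usize>> = thread::scope(|s| {
        // Spawn all workers first, then join them.
        let handles: Vec<_> = chunks
            .iter()
            .map(|&chunk| s.spawn(move || count_words(chunk)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    });

    let mut total = HashMap::new();
    for partial in partials {
        for (word, n) in partial {
            *total.entry(word).or_insert(0) += n;
        }
    }
    total
}
```

Scoped threads let each worker borrow its slice of the text directly, so nothing has to be copied or wrapped in an `Arc`. Whether this is actually faster than the single-threaded loop depends on the input size, so it is worth measuring in release mode first.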

I may not be active on this forum, so a PR on my repo would be very convenient for me!

You will need to define what "too much time" or a "large file" means precisely. I have a hunch that reading such large files into memory will not be sustainable (or perhaps even possible). You may want to consider memory mapping in this case.

It also matters how you are running the program. Are you building in release mode? Can you assume reading directly from a file or do you need to support piping (which prevents mmapping)?
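To illustrate the file-versus-pipe question: if the input does not have to be in memory all at once, one option is to stream it line by line through anything that implements `BufRead`. Note that this is streaming, not memory mapping (which would need a third-party crate such as `memmap2`). The function name `count_words_streaming` is hypothetical, and the sketch assumes words are whitespace-separated.

```rust
use std::collections::HashMap;
use std::io::{self, BufRead};

/// Count words from any buffered reader, one line at a time,
/// so that only the current line and the map are held in memory.
/// Assumption: words are whitespace-separated tokens.
fn count_words_streaming<R: BufRead>(reader: R) -> io::Result<HashMap<String, usize>> {
    let mut counts = HashMap::new();
    for line in reader.lines() {
        let line = line?; // propagate I/O and UTF-8 errors to the caller
        for word in line.split_whitespace() {
            *counts.entry(word.to_string()).or_insert(0) += 1;
        }
    }
    Ok(counts)
}
```

The same function then works for both cases: pass `io::stdin().lock()` for piped input, or `BufReader::new(File::open(path)?)` for a file.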


Um... I think I asked a stupid question.
It actually takes very little time to analyze the whole of Harry Potter in release mode, whereas it takes at least 1 min in debug mode.

I don't plan to analyze oversized files (about 1 TB) any time soon, as I have only just started my 'journey of Rust'.