Help for dealing with multi process

freeze-dolphin · March 20, 2022, 6:10am

Background

I am a Rust beginner, and I got a little stuck while planning my first project, a tool for analyzing word frequency in a text file.

The basic feature was implemented last week (Update README.md · freeze-dolphin/rfreq@1680e60 · GitHub).
But there is one problem: it takes too much time when run on a big file.

So I decided to use multiple processes, but I am really confused about how to do it.

Problem

You can check the source code at the link below:
(note that this commit contains errors)

My idea is to read the whole file first, then split the content into several parts and distribute them among threads. The threads will perform the analysis separately, and after all the work is done, the results will be collected into one HashMap and printed in some format.
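The plan described above (split the content, count in separate threads, merge into one HashMap) could be sketched roughly as follows. This is only an illustration, not the code from the rfreq repository: the function names `count_lines` and `count_words_parallel` are hypothetical, words are assumed to be whitespace-separated and compared case-insensitively, and `std::thread::scope` requires Rust 1.63 or later.

```rust
use std::collections::HashMap;
use std::thread;

/// Count word frequencies in a slice of lines.
/// Assumption: a "word" is any whitespace-separated token, lowercased.
fn count_lines(lines: &[&str]) -> HashMap<String, usize> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for line in lines {
        for word in line.split_whitespace() {
            *counts.entry(word.to_lowercase()).or_insert(0) += 1;
        }
    }
    counts
}

/// Split `text` into at most `n_threads` groups of lines, count each group
/// in its own thread, then merge the partial results into one map.
fn count_words_parallel(text: &str, n_threads: usize) -> HashMap<String, usize> {
    let lines: Vec<&str> = text.lines().collect();
    let n_threads = n_threads.max(1);
    // Ceiling division so every line lands in some chunk; `chunks` panics on 0.
    let chunk_size = ((lines.len() + n_threads - 1) / n_threads).max(1);

    let mut total: HashMap<String, usize> = HashMap::new();
    // Scoped threads may borrow `lines` because they are joined before
    // `thread::scope` returns.
    thread::scope(|s| {
        let handles: Vec<_> = lines
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || count_lines(chunk)))
            .collect();
        // Merge each thread's partial map into the combined result.
        for handle in handles {
            let partial = handle.join().expect("worker thread panicked");
            for (word, n) in partial {
                *total.entry(word).or_insert(0) += n;
            }
        }
    });
    total
}
```

Calling `count_words_parallel(&contents, 4)` on the file contents would return the merged counts. Splitting on line boundaries is a deliberate choice here so that no word is cut in half between two chunks.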

I may not be active on this forum, so a PR on my repo would be very convenient for me!

H2CO3 · March 20, 2022, 6:18am

You will need to define what "too much time" or a "large file" means precisely. I have a hunch that reading such large files into memory will not be sustainable (or perhaps even possible). You may want to consider memory mapping in this case.

It also matters how you are running the program. Are you building in release mode? Can you assume reading directly from a file or do you need to support piping (which prevents mmapping)?
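One way to make "too much time" concrete is to measure the analysis directly. A minimal sketch, assuming a hypothetical helper named `timed` (not part of the original project) that wraps any piece of work with `std::time::Instant`:

```rust
use std::time::{Duration, Instant};

/// Run `work` once and return its result together with the elapsed
/// wall-clock time. `timed` is a hypothetical helper for illustration.
fn timed<T>(work: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = work();
    (result, start.elapsed())
}
```

For example, `let (counts, elapsed) = timed(|| analyze(&contents));` followed by `println!("{elapsed:?}")`, where `analyze` stands in for whatever function does the counting. Comparing the number from `cargo run` against `cargo run --release` shows how much of the slowness comes from the debug build alone.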

freeze-dolphin · March 20, 2022, 7:47am

Um... I think I asked a stupid question.
It takes only a little time to analyze the whole of Harry Potter in release mode, whereas it takes at least 1 minute in debug mode.

I don't think analyzing oversized files (about 1 TB) is in my near-term plans, as I have only just started my 'journey of Rust'.

system · June 18, 2022, 7:48am

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.
