How to optimize multi-process performance

Hi everyone, I'm currently writing a piece of software in Rust. This software calls Asciidoctor (a Ruby script) to generate HTML documentation: every time a file is found, Asciidoctor is invoked to generate HTML for it. Because there are so many files, the generation is very slow. Is there a better way?
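
Roughly, what I do today looks like this (a simplified sketch; the function name is mine, error handling is minimal, and asciidoctor is assumed to be on PATH):

```rust
use std::path::Path;
use std::process::Command;

// One asciidoctor invocation -- and thus one Ruby VM startup -- per file,
// waiting for each conversion to finish before moving on.
fn convert(path: &Path) -> std::io::Result<()> {
    let status = Command::new("asciidoctor").arg(path).status()?;
    if !status.success() {
        eprintln!("asciidoctor failed for {}", path.display());
    }
    Ok(())
}
```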

I currently see only two options:

  1. Use magnus to embed Ruby into Rust, then call the Ruby methods from there.
  2. After reading all the files, generate a Ruby script that performs all the generation tasks.

The first method would add many dependencies to my program, and I don't know much about distributing Ruby packages (that is, I don't know where to place asciidoctor :frowning:).
The second approach would result in a massive rewrite of my program structure.

Is there a better way to do this?

Of course, rewriting Asciidoctor in Rust is also an option.

If you are shelling out to an external script, you can start many processes in parallel (potentially as many as there are physical cores in your CPU).
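
For example, something like this (a minimal sketch: it spawns asciidoctor processes in waves capped at the available core count; the file list and error handling are simplified):

```rust
use std::path::PathBuf;
use std::process::{Child, Command};
use std::thread;

fn convert_all(files: &[PathBuf]) -> std::io::Result<()> {
    // Fall back to 4 if the core count can't be determined.
    let parallelism = thread::available_parallelism().map_or(4, |n| n.get());
    // Process the files in waves of `parallelism` concurrent child processes.
    for batch in files.chunks(parallelism) {
        let children: Vec<Child> = batch
            .iter()
            .map(|f| Command::new("asciidoctor").arg(f).spawn())
            .collect::<Result<_, _>>()?;
        for mut child in children {
            child.wait()?; // reap each child; a real version would check the status
        }
    }
    Ok(())
}
```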

Note that the kind of program structure needed for that parallelism is similar to what you would need for the "after reading all the files, generate a ruby script" option: both imply that preparing inputs doesn't wait for previous outputs.

That suggests that you should begin that "massive rewrite" regardless of which solution you pick. It may not be as hard as you think; a pipeline/data-flow approach can often be very straightforward and clean.
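
A sketch of what that data-flow shape might look like with a channel (names and the file-discovery function are placeholders):

```rust
use std::path::PathBuf;
use std::process::Command;
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel::<PathBuf>();
    // std's Receiver isn't Clone, so share one receiver behind a Mutex.
    let rx = Arc::new(Mutex::new(rx));

    // Workers: each pulls the next path as soon as it is free, so no
    // input preparation ever waits on a previous output.
    let workers: Vec<_> = (0..thread::available_parallelism().map_or(4, |n| n.get()))
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || loop {
                let path = match rx.lock().unwrap().recv() {
                    Ok(p) => p,
                    Err(_) => break, // channel closed: no more input
                };
                let _ = Command::new("asciidoctor").arg(&path).status();
            })
        })
        .collect();

    // Producer: discover inputs and feed them into the pipeline immediately.
    for path in discover_files() {
        tx.send(path).unwrap();
    }
    drop(tx); // closing the channel lets the workers drain and exit

    for w in workers {
        w.join().unwrap();
    }
}

// Placeholder for however the program finds its input files.
fn discover_files() -> Vec<PathBuf> {
    vec![PathBuf::from("doc.adoc")]
}
```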

Yes, I already use tokio to run the conversions in parallel, but it still takes very long, so I wanted to find something better.
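
For reference, bounded concurrency with tokio might look roughly like this (a sketch assuming the `process`, `sync`, `macros`, and `rt-multi-thread` features are enabled; the file list is illustrative):

```rust
use std::sync::Arc;
use tokio::process::Command;
use tokio::sync::Semaphore;
use tokio::task::JoinSet;

#[tokio::main]
async fn main() {
    let files = vec!["a.adoc", "b.adoc"]; // placeholder input list
    // Cap concurrent asciidoctor processes at roughly the core count.
    let limit = std::thread::available_parallelism().map_or(4, |n| n.get());
    let semaphore = Arc::new(Semaphore::new(limit));

    let mut tasks = JoinSet::new();
    for file in files {
        // Waiting for a permit here bounds how many children run at once.
        let permit = Arc::clone(&semaphore).acquire_owned().await.unwrap();
        tasks.spawn(async move {
            let _permit = permit; // held until this conversion finishes
            let _ = Command::new("asciidoctor").arg(file).status().await;
        });
    }
    while let Some(res) = tasks.join_next().await {
        res.unwrap();
    }
}
```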

Thanks, I'll think about it carefully. As you said, after some thought the rewrite may not need to change as much as I feared; I will further evaluate the advantages and disadvantages of this approach.

If spawning as many processes in parallel as is physically worthwhile is still not fast enough (i.e., you are hitting the limits of your machine), then there isn't much you can do except buying more silicon.

That is, unless there is some overhead that could be amortized across the files. Did you measure what specifically takes the most time in running that script? If it's the pure computation of transforming the input language to HTML, then you are out of luck. If it's loading all the dependencies and starting up the Ruby VM, then it might be solvable by calling the Asciidoctor API from a single Ruby process instead of shelling out to run separate scripts one by one.
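
If VM startup does dominate, the "generate one Ruby script" option amortizes it: the Rust side emits a single script that loads the gem once and converts every file through `Asciidoctor.convert_file`. A minimal sketch (the script name, output directory, and option values here are assumptions, and the quoting is naive):

```rust
use std::fs::File;
use std::io::Write;
use std::path::PathBuf;

// Emit one Ruby script that starts the VM once and converts every file
// through the asciidoctor gem, instead of one child process per file.
fn write_batch_script(files: &[PathBuf]) -> std::io::Result<()> {
    let mut script = File::create("generate_all.rb")?;
    writeln!(script, "require 'asciidoctor'")?;
    for f in files {
        // NOTE: paths containing a single quote would break this naive quoting.
        writeln!(
            script,
            "Asciidoctor.convert_file '{}', safe: :unsafe, to_dir: 'out', mkdirs: true",
            f.display()
        )?;
    }
    Ok(())
}
```

Running `ruby generate_all.rb` then pays the VM startup cost exactly once.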

I haven't measured the timing carefully; it's just intuition. Since the script is called dozens of times, the Ruby VM startup time likely accounts for a large share of the total.
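
One quick way to check that intuition (a rough sketch; it assumes ruby and asciidoctor are on PATH, and doc.adoc stands in for one representative input file):

```rust
use std::process::Command;
use std::time::{Duration, Instant};

fn time_cmd(program: &str, args: &[&str]) -> Duration {
    let start = Instant::now();
    Command::new(program).args(args).status().expect("failed to run");
    start.elapsed()
}

fn main() {
    // Baseline: how long the Ruby VM takes just to start up and exit.
    let vm = time_cmd("ruby", &["-e", ""]);
    // Full conversion of one representative file.
    let full = time_cmd("asciidoctor", &["doc.adoc"]);
    println!("VM startup: {vm:?}, full conversion: {full:?}");
}
```

If the first number is close to the second, amortizing the startup is where the win is.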
