Hi everyone, I'm currently writing a piece of software in Rust. This software calls asciidoctor (a Ruby script) to generate HTML documentation, invoking it once for every file it finds. Because there are so many files, the generation is very slow. Is there a better way?
I have only two ideas so far:
1. Use magnus to embed Ruby into Rust, then call the Ruby method directly.
2. After reading all the files, generate a single Ruby script that performs all the generation tasks.
The first method would add many dependencies to my program, and I don't know much about how Ruby packages are distributed (that is, I don't know where asciidoctor should live).
The second approach would result in a massive rewrite of my program structure.
Is there a better way to do this?
Of course, rewriting asciidoctor in Rust is also an option.
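For context, my current approach is essentially one process per file, roughly like the sketch below. The file names, the `out` output directory, and the error handling are just illustrative:

```rust
use std::process::Command;

// One asciidoctor process per input file: each call pays the full
// Ruby VM startup cost, which adds up quickly over many files.
fn render_one(path: &str) -> bool {
    match Command::new("asciidoctor")
        .arg("-D")
        .arg("out") // destination directory for the generated HTML
        .arg(path)
        .status()
    {
        Ok(status) => status.success(),
        Err(err) => {
            eprintln!("could not run asciidoctor for {path}: {err}");
            false
        }
    }
}

fn main() {
    let files = ["a.adoc", "b.adoc"]; // placeholder file list
    for f in &files {
        // Serial: the VM startup cost is paid again for every file.
        render_one(f);
    }
}
```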
Note that the kind of program structure needed for that parallelism (spawning several asciidoctor processes at once) is similar to what you would need for the "after reading all the files, generate a ruby script" option: both imply that preparing inputs doesn't wait for previous outputs.
That suggests that you should begin that "massive rewrite" regardless of which solution you pick. It may not be as hard as you think; a pipeline/data-flow approach can often be very straightforward and clean.
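To make the pipeline/fan-out idea concrete, here is a rough sketch using only the standard library. The worker count, the even split of the file list, and the error reporting are assumptions for illustration, not a prescription:

```rust
use std::process::Command;
use std::thread;

// How many files each worker gets when the list is split evenly.
fn per_worker(n_files: usize, workers: usize) -> usize {
    ((n_files + workers - 1) / workers).max(1)
}

// Fan-out stage: split the file list across a fixed number of threads,
// each running its own asciidoctor process for its share of the files.
// This assumes the output files are independent of each other.
fn render_all(files: &[&str], workers: usize) {
    thread::scope(|s| {
        for chunk in files.chunks(per_worker(files.len(), workers)) {
            s.spawn(move || {
                for path in chunk {
                    // Errors are logged rather than aborting the whole run.
                    match Command::new("asciidoctor").arg(path).status() {
                        Ok(st) if !st.success() => eprintln!("asciidoctor failed for {path}"),
                        Err(e) => eprintln!("could not spawn asciidoctor for {path}: {e}"),
                        _ => {}
                    }
                }
            });
        }
    });
}

fn main() {
    render_all(&["a.adoc", "b.adoc", "c.adoc"], 2); // placeholder inputs
}
```

Each worker still pays one VM startup per file here; the win is that several startups and conversions overlap in time.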
Thanks, I'll think about it carefully. As you said, after some thought, the rewrite may not be as big as I feared; I will further evaluate the advantages and disadvantages of this approach.
If spawning as many processes in parallel as is physically worthwhile is still not fast enough (i.e., you are hitting the limits of your machine), then there isn't much you can do except buy more silicon.
That is, unless there is some overhead that could be amortized across the files. Did you measure what specifically takes the most time in running that script? If it's the pure computation of transforming the input language to HTML, then you are out of luck. If it's importing all the dependencies and starting up the Ruby VM, then it might be solvable by starting the VM once and calling the Asciidoctor API from within Ruby, instead of shelling out to run separate scripts one by one.
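One quick way to check is to time a no-op Ruby startup against a full conversion: if the two numbers are close, startup dominates. A rough measurement sketch (assuming `ruby` and `asciidoctor` are on the PATH; `doc.adoc` is a placeholder, and failures are ignored because only the elapsed time matters here):

```rust
use std::process::Command;
use std::time::{Duration, Instant};

// Time one invocation of an external command. Failures are ignored
// because we only care about the elapsed wall-clock time.
fn time_cmd(program: &str, args: &[&str]) -> Duration {
    let start = Instant::now();
    let _ = Command::new(program).args(args).status();
    start.elapsed()
}

fn main() {
    // Pure interpreter startup: runs an empty Ruby program.
    let startup = time_cmd("ruby", &["-e", ""]);
    // Startup plus an actual conversion ("doc.adoc" is a placeholder).
    let full = time_cmd("asciidoctor", &["doc.adoc"]);
    println!("ruby startup: {startup:?}, full conversion: {full:?}");
    // If `startup` is a large fraction of `full`, amortizing the VM
    // startup (batching, or embedding Ruby) should pay off.
}
```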
I haven't measured the timing carefully; this is just intuition. Because the script is called dozens of times, the Ruby VM startup time likely accounts for a large share. I didn't evaluate it in detail at the time.