Rayon, Rust - C bindings and involuntary context switches

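I am calling into the C bindings of the MuPDF C library and trying to parallelize the work using Rayon. The tasks are not small, and there are only 3 of them. The multithreaded run takes about twice as long as the single-threaded one and shows over a million involuntary context switches: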

Multithreaded

Time elapsed in expensive_function() is: 7.649747625s
Command being timed: "./extract"
User time (seconds): 7.60
System time (seconds): 49.60
Percent of CPU this job got: 743%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.69
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 867184
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 154
Minor (reclaiming a frame) page faults: 92168
Voluntary context switches: 0
Involuntary context switches: 1097327
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 16384
Exit status: 0

Single-threaded

Time elapsed in expensive_function() is: 3.570009125s
Command being timed: "./extract"
User time (seconds): 3.37
System time (seconds): 0.17
Percent of CPU this job got: 85%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:04.13
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1052688
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 10
Minor (reclaiming a frame) page faults: 89097
Voluntary context switches: 3
Involuntary context switches: 807
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 16384
Exit status: 0

You haven't provided anywhere near enough information to diagnose this. A minimal code example would be best, but anything short of that would still help. Without it, I can only offer some general suggestions.
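A skeleton for such a reproduction could time the same three tasks sequentially and then on separate threads. In the sketch below, `expensive_function` (a name borrowed from the timing output above) is a hypothetical pure-Rust stand-in for the MuPDF call, since the real code isn't shown, and it uses `std::thread::scope` instead of Rayon only so it has no dependencies. If a stand-in like this speeds up with threads but the real FFI call does not, that points at the C library or how it is being called rather than at the threading itself.

```rust
use std::hint::black_box;
use std::thread;
use std::time::Instant;

// Hypothetical: tune so one call takes a noticeable amount of time.
const ITERS: u64 = 5_000_000;

/// Hypothetical pure-Rust stand-in for the real MuPDF work.
/// It touches no shared state, so it should scale across threads.
fn expensive_function(seed: u64) -> u64 {
    // black_box stops the compiler from treating the seed as a constant.
    let mut x = black_box(seed);
    for _ in 0..ITERS {
        // xorshift64 step: cheap, deterministic CPU-bound work
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
    }
    x
}

/// Run every task one after another on the current thread.
fn run_sequential(seeds: &[u64]) -> Vec<u64> {
    seeds.iter().map(|&s| expensive_function(s)).collect()
}

/// Run each task on its own scoped thread and collect results in input order.
fn run_threaded(seeds: &[u64]) -> Vec<u64> {
    thread::scope(|scope| {
        let handles: Vec<_> = seeds
            .iter()
            .map(|&s| scope.spawn(move || expensive_function(s)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let seeds = [1u64, 2, 3]; // three tasks, as in the question

    let start = Instant::now();
    let sequential = run_sequential(&seeds);
    println!("sequential: {:?}", start.elapsed());

    let start = Instant::now();
    let threaded = run_threaded(&seeds);
    println!("threaded:   {:?}", start.elapsed());

    // Both strategies must produce identical results.
    assert_eq!(sequential, threaded);
}
```

With Rayon, `run_threaded` would collapse to something like `seeds.par_iter().map(|&s| expensive_function(s)).collect::<Vec<_>>()`; swapping the stand-in for the real call is then the interesting experiment. Build with `--release` when comparing timings.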

First up: parallelisation has overhead, so it isn't always faster. In particular, if you have many small tasks, you want to batch them and parallelise the batches instead to counteract this.
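As a rough sketch of batching, the example below splits a slice of many cheap items into a handful of contiguous chunks and gives each chunk one thread, instead of one thread per item. `small_task` and `process_batched` are hypothetical names, and the code uses `std::thread::scope` only so it has no dependencies; in Rayon the same idea is usually expressed with `par_chunks` on a slice, or by setting a minimum split size with `with_min_len`.

```rust
use std::thread;

/// Hypothetical tiny unit of work: far too cheap to justify a thread of its own.
fn small_task(x: u64) -> u64 {
    x.wrapping_mul(x).wrapping_add(1)
}

/// Split `items` into at most `n_batches` contiguous chunks and process each
/// chunk on one scoped thread, so the spawn/join overhead is paid per batch
/// rather than per item.
fn process_batched(items: &[u64], n_batches: usize) -> Vec<u64> {
    let n = n_batches.max(1);
    // Ceiling division; `chunks` panics on 0, hence the final max(1).
    let chunk_len = ((items.len() + n - 1) / n).max(1);
    thread::scope(|scope| {
        let handles: Vec<_> = items
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(|&x| small_task(x)).collect::<Vec<u64>>())
            })
            .collect();
        // Join in spawn order so the output order matches the input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().unwrap())
            .collect()
    })
}

fn main() {
    let items: Vec<u64> = (0..1_000_000).collect();
    let batched = process_batched(&items, 4);
    let expected: Vec<u64> = items.iter().map(|&x| small_task(x)).collect();
    assert_eq!(batched, expected);
    println!("processed {} items in 4 batches", batched.len());
}
```

The batch count is a tuning knob: a few batches per core keeps every core busy while keeping the per-batch overhead small relative to the work inside each batch.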

Then, of course, locking and other forms of synchronisation also have overhead. So if the tasks aren't independent but need to share resources, that can make things much slower. Ideally you want to start the tasks and let them run independently until the end, when they report back their results. It is possible that you don't have a lock (or a shared atomic variable, or whatever it might be) but one of your dependencies does.
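To illustrate the difference, here is a minimal sketch that sums a slice two ways: once with every thread updating a shared `Mutex` for each element, and once with each thread summing its own chunk and reporting a single value at the end. The function names are hypothetical and exact timings will vary by machine, but the locked version is typically much slower because the threads spend their time contending for the lock rather than doing work. The same effect can occur when the lock lives inside a dependency rather than in your own code.

```rust
use std::sync::Mutex;
use std::thread;
use std::time::Instant;

/// Ceiling division so `n_threads` chunks cover the whole slice (never 0).
fn chunk_len(len: usize, n_threads: usize) -> usize {
    let n = n_threads.max(1);
    ((len + n - 1) / n).max(1)
}

/// Contended: every element takes the shared lock, so threads keep
/// waiting on each other instead of doing useful work.
fn sum_with_shared_lock(data: &[u64], n_threads: usize) -> u64 {
    let total = Mutex::new(0u64);
    thread::scope(|scope| {
        for chunk in data.chunks(chunk_len(data.len(), n_threads)) {
            let total = &total;
            scope.spawn(move || {
                for &x in chunk {
                    *total.lock().unwrap() += x;
                }
            });
        }
    }); // all scoped threads are joined here
    total.into_inner().unwrap()
}

/// Independent: each thread sums its own chunk with no sharing and
/// reports a single value back at the end.
fn sum_independent(data: &[u64], n_threads: usize) -> u64 {
    thread::scope(|scope| {
        let handles: Vec<_> = data
            .chunks(chunk_len(data.len(), n_threads))
            .map(|chunk| scope.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let data: Vec<u64> = (0..1_000_000).collect();

    let start = Instant::now();
    let locked = sum_with_shared_lock(&data, 4);
    println!("shared lock: {:?}", start.elapsed());

    let start = Instant::now();
    let independent = sum_independent(&data, 4);
    println!("independent: {:?}", start.elapsed());

    assert_eq!(locked, independent);
}
```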

There are probably other reasons too, but those are the two main ones that come to mind.
