Reading a file 4x faster using 4 threads (Works - threaded is faster!)

Unfortunately I don't know enough about threads in Rust yet to answer that; I'd have to check the API docs to see whether threads support something like interruption, as they do on the JVM.

If you cannot or do not want to interrupt threads (which is essentially asking them to terminate), you could introduce some sort of stop flag that each thread checks every so often; you set the flag to true once a match is found, and all other threads then terminate without an answer upon checking it.
However, that introduces inter-thread communication, which means synchronizing access to do it safely, and that can be a huge bottleneck for performance. So that is not desirable.
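
For what it's worth, such a stop flag doesn't strictly need a lock: an `AtomicBool` behind an `Arc` is enough, and a relaxed load is very cheap. A minimal sketch of the idea (the search itself is simulated; thread and chunk numbers are made up for illustration):

use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

// Each thread scans its own "chunks" and checks a shared atomic flag
// between chunks; no mutex is involved anywhere.
fn search_parallel() -> Option<(u32, u32)> {
    let found = Arc::new(AtomicBool::new(false));
    let handles: Vec<_> = (0..4u32)
        .map(|id| {
            let found = Arc::clone(&found);
            thread::spawn(move || {
                for chunk in 0..1000u32 {
                    // A cheap atomic load, not a lock; Relaxed ordering is
                    // enough for a plain stop flag.
                    if found.load(Ordering::Relaxed) {
                        return None;
                    }
                    // Pretend thread 2 finds the match in chunk 10.
                    if id == 2 && chunk == 10 {
                        found.store(true, Ordering::Relaxed);
                        return Some((id, chunk));
                    }
                }
                None
            })
        })
        .collect();

    handles.into_iter().filter_map(|h| h.join().unwrap()).next()
}

fn main() {
    println!("{:?}", search_parallel());
}

The trade-off is granularity: check the flag too often and you pay the load on every item; too rarely and losing threads keep working after a match is found.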

The question, however, is: do you really need only the first match? In a real-world case, wouldn't you rather want all matches? Or is that something you plan to implement as an option for the user?

Regarding the word-matching algorithm: do you actually want to support partial word matches too, or only full word matches?

A quick-and-dirty solution would be to terminate the whole program by blocking the main thread on a channel that the worker threads signal. Like this:

use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread::{spawn, sleep};
use std::time::Duration;

fn do_work(thread_num: u32, msecs: u32, stop: SyncSender<()>) {
    let mut elapsed = 0;
    loop {
        // Imitating search by sleeping in 1 ms steps for `msecs` milliseconds
        sleep(Duration::from_millis(1));
        elapsed += 1;
        println!("Thread {}, elapsed {}", thread_num, elapsed);
        if elapsed >= msecs {
            // Signal the main thread and stop; ignore the error in case
            // another thread already finished and main is gone.
            let _ = stop.send(());
            return;
        }
    }
}

fn main() {
    let (tx, rx) = sync_channel::<()>(1);
    for i in 1..=4 {
        let sender = tx.clone();
        let msecs = 5 + i;
        spawn(move || do_work(i, msecs, sender));
    }
    // Block until the first worker signals; exiting main then tears down the rest.
    rx.recv().unwrap();
}

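
For completeness, the channel can also carry the match itself instead of `()`, so the main thread learns which thread found something first. A hedged variation on the sketch above; the fast/slow timing is simulated:

use std::sync::mpsc::sync_channel;
use std::thread::{sleep, spawn};
use std::time::Duration;

// Returns the number of the first thread to "find a match".
fn first_match() -> u32 {
    // The channel now carries a result instead of just a stop signal.
    let (tx, rx) = sync_channel::<u32>(1);
    for i in 1..=4u32 {
        let sender = tx.clone();
        spawn(move || {
            // Pretend only thread 3 matches quickly; the others take long.
            if i != 3 {
                sleep(Duration::from_millis(200));
            }
            // Ignore the error in case the receiver is already gone.
            let _ = sender.send(i);
        });
    }
    rx.recv().unwrap()
}

fn main() {
    println!("First match from thread {}", first_match());
}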


FWIW, on recent NVMe SSDs, it can actually make a difference. An example follows. First, a baseline for reference, to show the overhead is not system calls:

$ dd if=/dev/zero of=/dev/null status=progress bs=4k count=1M
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 0.512628 s, 8.4 GB/s

Reading a large file, unbuffered, with a small block size, using two commands in sequence:

$ time (dd if=pack-bbbf522788ea6c2c099a72c4f8e03da44d76a4c3.pack of=/dev/null iflag=direct bs=4k count=339524 ; dd if=pack-bbbf522788ea6c2c099a72c4f8e03da44d76a4c3.pack of=/dev/null iflag=direct bs=4k skip=339524 count=339524)
339524+0 records in
339524+0 records out
1390690304 bytes (1.4 GB, 1.3 GiB) copied, 9.38634 s, 148 MB/s
339524+0 records in
339524+0 records out
1390690304 bytes (1.4 GB, 1.3 GiB) copied, 9.73337 s, 143 MB/s

real	0m19.122s
user	0m0.054s
sys	0m4.570s

The same, but in parallel:

$ time (dd if=pack-bbbf522788ea6c2c099a72c4f8e03da44d76a4c3.pack of=/dev/null iflag=direct bs=4k count=339524 & dd if=pack-bbbf522788ea6c2c099a72c4f8e03da44d76a4c3.pack of=/dev/null iflag=direct bs=4k skip=339524 count=339524 & wait)
339524+0 records in
339524+0 records out
1390690304 bytes (1.4 GB, 1.3 GiB) copied, 9.15265 s, 152 MB/s
339524+0 records in
339524+0 records out
1390690304 bytes (1.4 GB, 1.3 GiB) copied, 9.28513 s, 150 MB/s

real	0m9.287s
user	0m0.099s
sys	0m4.595s

Each individual dd has the same speed as when run sequentially, but overall we're now twice as fast by using two threads/processes.

And in case you'd think the small block size is what makes the difference, here it is with larger blocks:

$ time (dd if=pack-bbbf522788ea6c2c099a72c4f8e03da44d76a4c3.pack of=/dev/null iflag=direct bs=1M count=1326; dd if=pack-bbbf522788ea6c2c099a72c4f8e03da44d76a4c3.pack of=/dev/null iflag=direct bs=1M skip=1326 count=1326)
1326+0 records in
1326+0 records out
1390411776 bytes (1.4 GB, 1.3 GiB) copied, 1.64749 s, 844 MB/s
1326+0 records in
1326+0 records out
1390411776 bytes (1.4 GB, 1.3 GiB) copied, 1.51089 s, 920 MB/s

real	0m3.161s
user	0m0.003s
sys	0m0.194s
$ time (dd if=pack-bbbf522788ea6c2c099a72c4f8e03da44d76a4c3.pack of=/dev/null iflag=direct bs=1M count=1326& dd if=pack-bbbf522788ea6c2c099a72c4f8e03da44d76a4c3.pack of=/dev/null iflag=direct bs=1M skip=1326 count=1326& wait)
1326+0 records in
1326+0 records out
1390411776 bytes (1.4 GB, 1.3 GiB) copied, 1.45684 s, 954 MB/s
1326+0 records in
1326+0 records out
1390411776 bytes (1.4 GB, 1.3 GiB) copied, 1.50515 s, 924 MB/s

real	0m1.507s
user	0m0.001s
sys	0m0.210s
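
The same split can be sketched in Rust: one thread per slice of the file, each with its own file handle and offset. Note this uses ordinary reads, not O_DIRECT like `iflag=direct` above, so the page cache will flatter repeat runs; a minimal sketch under those assumptions, not a benchmark harness (the demo file name is made up):

use std::fs::File;
use std::io::{Read, Seek, SeekFrom, Write};
use std::thread;

// Read a file in `parts` slices, one thread per slice; returns total bytes read.
fn read_in_parallel(path: &str, parts: u64) -> std::io::Result<u64> {
    let len = std::fs::metadata(path)?.len();
    let chunk = (len + parts - 1) / parts; // ceiling division
    let mut handles = Vec::new();
    for i in 0..parts {
        let path = path.to_string();
        handles.push(thread::spawn(move || -> std::io::Result<u64> {
            let mut f = File::open(&path)?;
            // Each thread seeks to the start of its own slice.
            f.seek(SeekFrom::Start(i * chunk))?;
            let mut remaining = chunk.min(len.saturating_sub(i * chunk));
            let mut buf = vec![0u8; 64 * 1024];
            let mut total = 0u64;
            while remaining > 0 {
                let want = buf.len().min(remaining as usize);
                let n = f.read(&mut buf[..want])?;
                if n == 0 {
                    break; // unexpected EOF
                }
                total += n as u64;
                remaining -= n as u64;
            }
            Ok(total)
        }));
    }
    let mut total = 0;
    for h in handles {
        total += h.join().unwrap()?;
    }
    Ok(total)
}

fn main() -> std::io::Result<()> {
    // Demo with a small temp file instead of a multi-GB pack file.
    let path = std::env::temp_dir().join("parallel_read_demo.bin");
    File::create(&path)?.write_all(&vec![7u8; 1_000_003])?;
    println!("read {} bytes", read_in_parallel(path.to_str().unwrap(), 2)?);
    Ok(())
}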

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.