Issues using multi-threading when using two wordlists

Hello Rust Forum!

I'm currently building a fuzzing application and want to implement multi-threading as well as support for up to two wordlists, to create a viable fuzzing tool.

Unfortunately, I consistently run into errors passing values into the pool.execute closure when I use two wordlists. Because of the nested loops and Rust's ownership rules, the code results in the following error:

error[E0507]: cannot move out of `word2`, a captured variable in an `Fn` closure
   --> src/utils/fuzzer.rs:170:37
    |
167 |                 let Ok(word2) = line2 else { return };
    |                        ----- captured outer variable
168 |                 let word = word.clone();
169 |                 pool.execute(move || {
    |                              ------- captured by this `Fn` closure
170 |                     self.get(&word, word2);
    |                                     ^^^^^ move occurs because `word2` has type `std::string::String`, which does not implement the `Copy` trait

Essentially I can get the threads to println! the two words in a nice iteration, and currently the pool.execute() call takes FnOnce and returns the error above. I have tried the different options (FnOnce, FnMut, Fn), but they either have their own issues or don't produce the desired end result.

I am assuming that the way I have the code set up is probably the issue, but I have tried refactoring it several times and it always ends in this same result. Cloning doesn't seem to help either. Feeling stuck at this point =(. Any help greatly appreciated!

Here is the Rust playground gist of the full code at this point.

Note: Really just trying to learn Rust and get better with the language using real-world projects.

Me again: that function should definitely take FnOnce, but that doesn't fix self, which you're trying to move into a closure on every iteration of the loop. In most cases, when you want to move some things into a closure and have other things borrowed, you can explicitly borrow outside the closure:

let this = &self;
pool.execute(move || {
    this.get(&word, word2);
});

But since you need 'static, you can't use references. So you can fix that by cloning self.

let this = self.clone();
pool.execute(move || {
    this.get(&word, word2);
});

Side notes:

if self.properties._word_list.len() > 1 && word2 != String::from("none") {

Don't use _name as a field name. Just compile with RUSTFLAGS='-A unused' or the like to silence warnings temporarily.

Maybe word2 should be an Option<String> or a newtype around that.
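A minimal sketch of that idea; the `SecondWord` newtype and the `from_raw` helper are hypothetical names, assuming the existing code uses the string `"none"` as a sentinel for "no second wordlist":

```rust
// Hypothetical newtype: an absent second wordlist becomes a real state
// in the type system instead of the sentinel string "none".
struct SecondWord(Option<String>);

impl SecondWord {
    // Translate the old sentinel convention into the new type.
    fn from_raw(raw: &str) -> Self {
        if raw == "none" {
            SecondWord(None)
        } else {
            SecondWord(Some(raw.to_string()))
        }
    }

    fn is_present(&self) -> bool {
        self.0.is_some()
    }
}

fn main() {
    assert!(!SecondWord::from_raw("none").is_present());
    assert!(SecondWord::from_raw("passwords.txt").is_present());
    println!("ok");
}
```

This way the `word2 != String::from("none")` comparison becomes a `match` or `if let`, and the compiler forces every caller to handle the "no second wordlist" case.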


Oh also, if you want to use tokio inside some other threading architecture, you should use the single-threaded runtime so that it doesn't spawn more threads.

#[tokio::main(flavor = "current_thread")]

I attempted this implementation. It does seem to reduce the number of errors from cargo build, but it still produces a move error:

    fn multi_list(self) {
        let pool = ThreadPool::new(10);
        let Ok(lines) = Fuzzer::read_lines(self.properties._word_list[0].as_str()) else {
            return;
        };
        for line in lines {
            let Ok(word) = line else { return };
            let Ok(lines2) = Fuzzer::read_lines(self.properties._word_list[1].as_str()) else {
                return;
            };
            let word: Arc<str> = word.into();
            for (counter, line2) in lines2.enumerate() {
                let Ok(word2) = line2 else { return };
                let this = self.clone();
                pool.execute(move || {
                    this.get(word, word2);
                    // println! works without issue.
                    //println!("{:?}{:?}{:?}", word, word2, counter + 1);
                });
            }
        }
    }

Produces:

error[E0382]: use of moved value: `word`
   --> src/utils/fuzzer.rs:152:30
    |
148 |             let word: Arc<str> = word.clone().into();
    |                 ----             ------------------- this reinitialization might get skipped
    |                 |
    |                 move occurs because `word` has type `Arc<str>`, which does not implement the `Copy` trait
...
152 |                 pool.execute(move || {
    |                              ^^^^^^^ value moved into closure here, in previous iteration of loop
153 |                     this.get(word, word2);
    |                              ---- use occurs due to use in closure

For more information about this error, try `rustc --explain E0382`.

Thank you for pointing this out. I originally was using the workers flavor in an attempt to multi-thread the application, but the performance was no better than running it single-threaded, which was the whole reason I got into this mess XD. But thank you, I have made this change.

You still need word.clone(). The loop should look like this

for (counter, line2) in lines2.enumerate() {
    let Ok(word2) = line2 else { return };
    let word = word.clone();
    let this = self.clone();
    pool.execute(move || {
        this.get(&word, word2);
    });
}

This worked, fantastic, thank you! Any idea why it's still just as slow making requests as it was without multi-threading?

When dealing with concurrent performance, you should insert the concurrency as high up as possible, so the whole inner for loop should be inside the closure.

for line in lines {
    let this = self.clone();
    pool.execute(move || {
        let Ok(word) = line else { return };
        let Ok(lines2) = Fuzzer::read_lines(this.properties._word_list[1].as_str()) else {
            return;
        };
        for (counter, line2) in lines2.enumerate() {
            let Ok(word2) = line2 else { return };
            this.get(&word, word2);
        }
    })
}

This also lets you avoid cloning as much.

Then, BufRead::lines is not fast, since it allocates an individual string for every line. It's better to use BufRead::read_line so that you can reuse the string allocation. This does mean you can't use an iterator, but you can use a while loop instead.
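A sketch of that while-loop pattern, generic over any `BufRead` source; the `for_each_line` helper name is made up for illustration:

```rust
use std::io::BufRead;

// Reuse one String allocation for every line instead of allocating a
// fresh String per line as BufRead::lines does.
fn for_each_line<R: BufRead>(mut reader: R, mut f: impl FnMut(&str)) -> std::io::Result<()> {
    let mut buf = String::new();
    // read_line returns Ok(0) at end of file.
    while reader.read_line(&mut buf)? != 0 {
        f(buf.trim_end()); // strip the trailing newline (and any '\r')
        buf.clear(); // keep the capacity, drop the contents
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    // A Cursor stands in for BufReader::new(File::open(path)?).
    let data = std::io::Cursor::new("alpha\nbeta\ngamma\n");
    let mut words = Vec::new();
    for_each_line(data, |w| words.push(w.to_string()))?;
    assert_eq!(words, ["alpha", "beta", "gamma"]);
    println!("ok");
    Ok(())
}
```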

If your files are small enough to fit in memory, then you can avoid BufRead by reading the files into two Strings (or Arc<str>) at the beginning of the function, and using str::lines, which doesn't need to allocate.
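For instance, something like the following, where the inline strings stand in for `fs::read_to_string` on the two wordlist paths:

```rust
// str::lines yields &str slices into the buffer that is already in
// memory, so iterating allocates nothing per line.
fn cross_product(list1: &str, list2: &str) -> Vec<(String, String)> {
    let mut pairs = Vec::new();
    for w1 in list1.lines() {
        for w2 in list2.lines() {
            pairs.push((w1.to_string(), w2.to_string()));
        }
    }
    pairs
}

fn main() {
    // Stand-ins for fs::read_to_string(&self.properties._word_list[0]) etc.
    let list1 = "admin\nlogin\n";
    let list2 = "123\nhunter2\n";
    let pairs = cross_product(list1, list2);
    assert_eq!(pairs.len(), 4);
    assert_eq!(pairs[0], ("admin".to_string(), "123".to_string()));
    println!("ok");
}
```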

Also consider using thread::scope if you have the entire scope of your task within the function. This lets you avoid cloning anything.
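A minimal sketch of that, assuming the whole wordlist fits in one `String`; the `Mutex<Vec<_>>` is just a stand-in for whatever result-collection your real code does:

```rust
use std::sync::Mutex;
use std::thread;

fn main() {
    let words = String::from("admin\nlogin\nuser\n"); // stand-in for file contents
    let results = Mutex::new(Vec::new());
    // thread::scope guarantees every spawned thread finishes before the
    // scope returns, so the closures may borrow `words` and `results`
    // directly: no 'static bound, no Arc, no clones.
    thread::scope(|s| {
        let results = &results;
        for word in words.lines() {
            s.spawn(move || {
                // `word` is a &str borrowed straight from `words`.
                results.lock().unwrap().push(word.len());
            });
        }
    });
    let mut lens = results.into_inner().unwrap();
    lens.sort();
    assert_eq!(lens, [4, 5, 5]);
    println!("ok");
}
```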


A few extras:

  • You want to divide your work into at least as many chunks as you have CPU threads, so that every CPU thread has work to do. Once you've ensured that, any more division just adds overhead.
  • You may already be processing words as fast as you can read the words from the files. A workload that does more computation per word will benefit more from multithreading.
  • For network requests, the ideal quantity of concurrent requests is usually larger and mostly independent from the number of CPU threads, since most of the time will be spent waiting for the server. For such workloads, using one tokio runtime with a limited number of tasks at a time is ideal, and even a single thread can be just as fast as multiple.

Thank you for this great information!

Yeah, it seems I get the same performance even if I move the closure up. Not sure if it's just how my code is laid out at the moment or if I'm getting bottlenecked by the requests. I also noticed that a single-wordlist iteration performs just as fast.

Would there be any value in ingesting the words from both files into a vector of tuples, then passing those vectors to a tokio task? Like in this Playground

This is only true if you are able to divide the work into chunks which are perfectly equally sized. If the chunks turn out to be unequal amounts of work, then you can spend a lot of time on a single processor finishing the longest chunk.

For a toy example, suppose you have 50 sequential seconds of work, and 4 cores. Unluckily, you end up dividing the work into chunks that take {10, 10, 10, 20} seconds to process. Then you spend 20 seconds total on the job, with only one core working on the last 10 seconds, whereas using more chunks could have finished in 13 seconds total.
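The toy example above can be checked with a small greedy list-scheduling sketch; the `makespan` helper is my own illustration, not code from the thread:

```rust
// Greedy list scheduling: each chunk goes to the currently least-loaded
// worker; the makespan is the largest final load.
fn makespan(chunks: &[u32], workers: usize) -> u32 {
    let mut loads = vec![0u32; workers];
    for &c in chunks {
        let min = loads.iter_mut().min().unwrap(); // least-loaded worker
        *min += c;
    }
    *loads.iter().max().unwrap()
}

fn main() {
    // The unlucky split from the text: {10, 10, 10, 20} on 4 cores.
    assert_eq!(makespan(&[10, 10, 10, 20], 4), 20);
    // Fifty 1-second chunks balance out to ceil(50 / 4) = 13 seconds.
    assert_eq!(makespan(&[1; 50], 4), 13);
    println!("ok");
}
```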

Therefore, for maximum throughput, you want to do at least one of these:

  • have somewhat more chunks than parallelism (and put the extra chunks in a queue to be picked up when threads finish previous work),
  • be able to split chunks while they're being processed (as rayon offers), or
  • have something else expensive to do that is necessarily serial (e.g. writing compressed results to disk) as the work is finished.

Otherwise you're likely to leave many cores idle for significant amounts of time.


Thank you for your insights!

I was looking into Rayon and rayon::scope. This drastically improves the speed and ergonomics of the code. Thanks for that!


This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.