Unexpected performance results comparing async vs sync

Hello,

I was curious how sync and async filesystem operations (using tokio::fs) compare in performance, so I implemented two simple directory crawlers: one using sync fs and the other using async fs.

The results I got are a little bit weird:

Sync: 8 ms
Async: 33 ms

I expected async version to be faster, whereas it took much more time to do the same thing.

Here is the code:

use std::future::Future;
use std::path::Path;
use std::pin::Pin;
use std::{fs, io};
use std::time::{Instant};

struct DirCrawlerAsync  {
    ignored_path: Vec<String>
}

impl DirCrawlerAsync {
    pub fn visit_dirs<'a>(
        &'a self,
        dir: &'a Path,
        entries_buffer: &'a mut Vec<String>,
    ) -> Pin<Box<dyn Future<Output = io::Result<()>> + 'a>> {
        Box::pin(async move {
            println!("Entering dir {:?}", dir);

            if dir.is_dir() && !self.ignored_path.contains(&dir.to_str().unwrap().to_string()) {
                let mut entries = tokio::fs::read_dir(dir).await?;

                while let Some(entry) = entries.next_entry().await? {
                    let path = entry.path();
                    if path.is_dir() {
                        self.visit_dirs(&path, entries_buffer).await?;
                    } else {
                        if let Ok(path_as_str) = path.into_os_string().into_string() {
                            entries_buffer.push(path_as_str);
                        }
                    }
                }
            }

            Ok(())
        })
    }
}

struct DirCrawlerSync  {
    ignored_path: Vec<String>
}

impl DirCrawlerSync {
    pub fn visit_dirs(
        &self,
        dir: &Path,
        entries_buffer: &mut Vec<String>,
    ) -> io::Result<()> {
        println!("Entering dir {:?}", dir);

        if dir.is_dir() && !self.ignored_path.contains(&dir.to_str().unwrap().to_string()) {
            let entries = fs::read_dir(dir)?;

            for entry in entries {
                let entry = entry?;
                let path = entry.path();
                if path.is_dir() {
                    self.visit_dirs(&path, entries_buffer)?;
                } else {
                    if let Ok(path_as_str) = path.into_os_string().into_string() {
                        entries_buffer.push(path_as_str);
                    }
                }
            }
        }

        Ok(())
    }
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut entries: Vec<String> = vec![];


    let now = Instant::now();
    let crawler_sync = DirCrawlerSync { ignored_path: vec![
        // String::from("./target")
    ] };
    crawler_sync.visit_dirs(Path::new("."), &mut entries)?;
    println!("Sync: {}", now.elapsed().as_millis());


    let now = Instant::now();
    let crawler_async = DirCrawlerAsync { ignored_path: vec![
        // String::from("./target")
    ] };
    crawler_async.visit_dirs(Path::new("."), &mut entries).await?;
    println!("Async: {}", now.elapsed().as_millis());

    Ok(())
}

Do you guys know the reason? Is it because of 'Box' usage and hence heap allocation which does not occur that often in sync version, or it's just because of slow IO, or maybe I made some stupid mistake somewhere?

Thanks

Async file system operations are usually implemented with synchronous operations run on the threadpool. So async fs cost = sync cost + threadpool cost + async runtime cost.
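
To make that cost breakdown concrete, here is a minimal sketch using only the standard library. It is not tokio's actual code; `read_dir_offloaded` is a hypothetical helper, and a plain thread plus a channel stands in for the runtime's blocking thread pool. The point is only that the real work is still the ordinary synchronous call, with a hand-off before it and a hand-back after it.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper: runs the blocking `fs::read_dir` on another thread
/// and returns a receiver that yields the result once it is ready.
fn read_dir_offloaded(dir: &Path) -> mpsc::Receiver<io::Result<Vec<PathBuf>>> {
    let dir = dir.to_path_buf();
    let (tx, rx) = mpsc::channel();
    // Extra cost 1: handing the job over to another thread.
    thread::spawn(move || {
        // The actual work is still the ordinary synchronous call.
        let result: io::Result<Vec<PathBuf>> = fs::read_dir(&dir).and_then(|entries| {
            entries
                .map(|entry| entry.map(|e| e.path()))
                .collect::<io::Result<Vec<PathBuf>>>()
        });
        // Extra cost 2: sending the result back to the caller.
        // The send only fails if the caller dropped the receiver.
        let _ = tx.send(result);
    });
    rx
}
```

Calling `read_dir_offloaded(Path::new(".")).recv()` blocks until the listing arrives, so a single call like this can never beat calling `fs::read_dir` directly. It only pays off if the caller has something else to do while waiting.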

Windows supports true async fs ops, but other major OSes tend not to. Linux recently added the io_uring API, which can be used to implement true async fs ops, but it's not widely adopted yet. For BSD, I have no idea.


No runtime currently supports true async file IO on any OS.

If you used network IO, things would be different. A single individual operation would not become faster, but performing many network operations at the same time would be improved.


In addition to the other answers, here's a writeup on when and why async becomes sync on Windows: https://docs.microsoft.com/en-us/troubleshoot/windows/win32/asynchronous-disk-io-synchronous


Thanks for your replies. I kept testing it and added an HTTP request for each crawled directory, blocking for sync and non-blocking for async, and the results are still strange:

Sync: 140158
Async: 139794

Here is the code:

use std::future::Future;
use std::path::Path;
use std::pin::Pin;
use std::{fs, io};
use std::time::{Instant, Duration};
use std::{thread, time};
extern crate reqwest;

struct DirCrawlerAsync  {
    ignored_path: Vec<String>
}

impl DirCrawlerAsync {
    pub fn visit_dirs<'a>(
        &'a self,
        dir: &'a Path,
        entries_buffer: &'a mut Vec<String>,
    ) -> Pin<Box<dyn Future<Output = io::Result<()>> + Send + 'a>> {
        Box::pin(async move {
            // tokio::time::delay_for(time::Duration::from_millis(10)).await;
            let body = reqwest::get("https://www.rust-lang.org")
                .await
                .unwrap()
                .text()
                .await;

            println!("Entering dir {:?}", dir);

            if dir.is_dir() && !self.ignored_path.contains(&dir.to_str().unwrap().to_string()) {
                let mut entries = tokio::fs::read_dir(dir).await?;

                while let Some(entry) = entries.next_entry().await? {
                    let path = entry.path();
                    if path.is_dir() {
                        self.visit_dirs(&path, entries_buffer).await?;
                    } else {
                        if let Ok(path_as_str) = path.into_os_string().into_string() {
                            entries_buffer.push(path_as_str);
                        }
                    }
                }
            }

            Ok(())
        })
    }
}

struct DirCrawlerSync  {
    ignored_path: Vec<String>
}

impl DirCrawlerSync {
    pub fn visit_dirs(
        &self,
        dir: &Path,
        entries_buffer: &mut Vec<String>,
    ) -> io::Result<()> {
        // thread::sleep(time::Duration::from_millis(10));
        let body = reqwest::blocking::get("https://www.rust-lang.org").unwrap().text().unwrap();

        println!("Entering dir {:?}", dir);

        if dir.is_dir() && !self.ignored_path.contains(&dir.to_str().unwrap().to_string()) {
            let entries = fs::read_dir(dir)?;

            for entry in entries {
                let entry = entry?;
                let path = entry.path();
                if path.is_dir() {
                    self.visit_dirs(&path, entries_buffer)?;
                } else {
                    if let Ok(path_as_str) = path.into_os_string().into_string() {
                        entries_buffer.push(path_as_str);
                    }
                }
            }
        }

        Ok(())
    }
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {

    let now = Instant::now();
    let res = tokio::task::spawn_blocking(move || {
        let mut entries: Vec<String> = vec![];
        let crawler_sync = DirCrawlerSync { ignored_path: vec![
            // String::from("./target")
        ] };
        crawler_sync.visit_dirs(Path::new("."), &mut entries)
    }).await?;
    println!("Sync: {}", now.elapsed().as_millis());

    let now = Instant::now();
    tokio::spawn(async {
        let mut entries: Vec<String> = vec![];
        let crawler_async = DirCrawlerAsync { ignored_path: vec![
            // String::from("./target")
        ] };
        crawler_async.visit_dirs(Path::new("."), &mut entries).await
    }).await?;
    println!("Async: {}", now.elapsed().as_millis());

    Ok(())
}

I still don't see much improvement in async version...

The main issue here is that you're always immediately awaiting a Future. It is far more efficient to spawn them and wait on them together, which allows the runtime to schedule them smartly.

The rt-threaded feature of tokio will also likely provide a slight performance boost once the previous change is made.

On a side note, you can also use an async fn on inherent implementations.


Looks like you're calling self.visit_dirs(&path, entries_buffer).await?; in sequence, so you won't get any benefit from the async approach. You might be better off creating e.g. a FuturesUnordered, pushing a future for each entry into it, and then iterating over that to run them all concurrently.

Probably do all the fs stuff in one blocking block: get your dirs, initialise the FuturesUnordered, and after the blocking block, iterate over the FuturesUnordered.
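
A rough sketch of that two-step shape, with loud caveats: it uses only the standard library, so scoped OS threads stand in for the futures that FuturesUnordered (or tokio::spawn) would drive, and `fake_request` is a hypothetical placeholder that sleeps instead of doing HTTP. What carries over to the async version is the structure: crawl first, start every request, and only then wait for the results.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::thread;
use std::time::Duration;

/// Step 1: plain synchronous crawl that only collects directory paths.
fn collect_dirs(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    out.push(dir.to_path_buf());
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            collect_dirs(&path, out)?;
        }
    }
    Ok(())
}

/// Hypothetical placeholder for the per-directory HTTP request:
/// it only sleeps to imitate network latency.
fn fake_request(dir: &Path) -> String {
    thread::sleep(Duration::from_millis(50));
    format!("done: {}", dir.display())
}

/// Step 2: start every "request" first, then wait for all of them.
fn request_all(dirs: &[PathBuf]) -> Vec<String> {
    thread::scope(|scope| {
        // Collecting the handles forces every thread to start
        // before the first join blocks.
        let handles: Vec<_> = dirs
            .iter()
            .map(|dir| scope.spawn(move || fake_request(dir)))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().unwrap())
            .collect()
    })
}
```

Because the waits overlap, the total time should be close to one request's latency rather than the sum of all of them. One thread per directory does not scale to large trees, which is where futures on a runtime become the better fit.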


The place where async/await gives you an advantage is when you do many things at the same time. If you only do one thing at a time, then it won't become faster.


Asynchronous programming is the most popular answer to the C10K problem, which means its advantage shows up when you have over 10K concurrent IO-bound threads of execution. Since you don't perform any concurrent processing in your code, the async version will hardly give any advantage.


Ok, so now that you know what I am trying to achieve (dir crawling + an HTTP request for each entry), what are your suggestions for performance optimizations in my scenario? I now know that using async by itself does not give any advantage, but maybe there is some other solution that would help in my case?

Well if you make the while loop parallel, it should become significantly faster.

Hmm, info taken from https://docs.rs/rayon/1.3.1/rayon/fn.join.html, which is supposed to perform tasks in parallel:

The assumption is that the closures given to join() are CPU-bound tasks that do not perform I/O or other blocking operations. If you do perform I/O, and that I/O should block (e.g., waiting for a network request), the overall performance may be poor.

So is it really a good path? And by the way, I can't avoid async calls in parallel computations anyway, am I right? Do you have a good example for a similar case? I am really stuck right now ...

What is it that you actually need to do?

Somebody well known, wish I could remember who, summed up the situation re: real threads and async threads as:

Sync threads are for doing work, async threads are for waiting.

That is to say, if you have a lot of computation to do, performance can be gained by distributing the work over multiple cores with sync threads. On the other hand, if you have lots of waiting to do, like handling a million concurrent HTTP requests, waiting on their db query responses etc., then performance can be gained, or at least resources saved, by doing as much waiting on one core as possible with async threads.
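
For the "threads are for doing work" half, here is a small sketch under stated assumptions: `parallel_sum_of_squares` and its `workers` parameter are made up for illustration, and the sum of squares is just a stand-in for any CPU-bound job. It uses only std::thread::scope from the standard library.

```rust
use std::thread;

/// Splits CPU-bound work (here a sum of squares) across up to `workers` OS threads.
fn parallel_sum_of_squares(data: &[u64], workers: usize) -> u64 {
    let workers = workers.max(1);
    // Round up so every element lands in some chunk; chunk size must not be zero.
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        // One thread per chunk, each computing its partial sum independently.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().map(|x| x * x).sum::<u64>()))
            .collect();
        // Combine the partial sums once every thread has finished.
        handles
            .into_iter()
            .map(|handle| handle.join().unwrap())
            .sum::<u64>()
    })
}
```

A reasonable value for `workers` is the core count reported by std::thread::available_parallelism(). More threads than cores does not help here, because nothing in this job ever waits.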

It's not clear to me how either of these helps with the problem of directory crawling. I see three possible scenarios:

  1. You have a good old fashioned spinning hard drive. It can only read or write one thing at a time. In this case it does not matter what kind of threads you have or even if you have threads at all. That disk is the bottleneck and it is serializing all your steps.

  2. Your directory structure is spread over thousands of disks, perhaps on multiple machines accessed over the network. Now potentially many disk accesses can be in progress at the same time. There is a lot of waiting for responses going on. Using async threads to parallelize the job should go faster.

  3. It could happen that your entire directory tree is already buffered in a huge amount of RAM. In this case you might as well only have one real sync thread per core and let them rip through it as fast as possible, there is no waiting going on.


Who says you have to parallelize it with rayon? Tokio is certainly a good tool for parallelizing lots of network requests.
