Do I need async? Or should I use the blocking version of the library?

I am using roux, a Rust library for Reddit API access, and want to download all my content (saved items, submissions and comments).

The library is async by default, so I have functions like this:

// TODO Check order of saved items
pub fn get_saved_item_stream(account: &Me) -> impl Stream<Item = Content> + '_ {
    let mut after_id: Option<String> = None;
    stream! {
        loop {
            let saved_items = match account.saved(after_id.as_ref().map(|aid| FeedOption::new().after(aid))).await {
                Ok(saved) => {
                    after_id = saved.data.after;
                    saved.data.children
                },
                // Stop on error: continuing with an unchanged `after_id` would retry the same page forever.
                Err(_) => break,
            };
            for saved_item in saved_items {
                yield Content::try_from(saved_item.data).unwrap();
            }
            if after_id.is_none() {
                break;
            }
        }
    }
}

This gives me a stream of my saved items, which I would ultimately write to a file in a serialized format.

Similarly, I have two more streams (one for submissions, the other for comments), and I would write those into their own files as well (submissions.json, comments.json maybe?).

pub fn get_submission_stream(account: &Me) -> impl Stream<Item = Content> + '_ {
// .....
}
pub fn get_comment_stream(account: &Me) -> impl Stream<Item = Content> + '_ {
// .....
}

My issue is that I don't see how I am benefiting from async here. The only parallelization I can think of is writing to the 3 files in parallel from the 3 async streams. But how would I do that? File I/O is synchronous in the background, right? Is it even useful/possible?

Alternatively, the roux library has a blocking feature. Could I convert those streams to iterators and just use blocking functions for file writing?
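To make the iterator idea concrete, here is a minimal sketch of a blocking, paginated iterator using only the standard library. `Page`, `Paginated` and the `fetch` closure are hypothetical names made up for illustration, not part of roux; `fetch` stands in for whatever blocking call returns one page plus the `after` cursor.

```rust
/// One page of results plus the cursor for the next page.
/// (Hypothetical type; it only mirrors the `after` cursor used above.)
pub struct Page<T> {
    pub items: Vec<T>,
    pub after: Option<String>,
}

/// Blocking iterator over every item of a paginated listing.
/// `fetch` is a stand-in for a blocking API call that takes the current cursor.
pub struct Paginated<T, F>
where
    F: FnMut(Option<&str>) -> Result<Page<T>, String>,
{
    fetch: F,
    after: Option<String>,
    buffer: std::vec::IntoIter<T>,
    done: bool,
}

impl<T, F> Paginated<T, F>
where
    F: FnMut(Option<&str>) -> Result<Page<T>, String>,
{
    pub fn new(fetch: F) -> Self {
        Paginated { fetch, after: None, buffer: Vec::new().into_iter(), done: false }
    }
}

impl<T, F> Iterator for Paginated<T, F>
where
    F: FnMut(Option<&str>) -> Result<Page<T>, String>,
{
    type Item = T;

    fn next(&mut self) -> Option<T> {
        loop {
            // Hand out items from the page we already have.
            if let Some(item) = self.buffer.next() {
                return Some(item);
            }
            if self.done {
                return None;
            }
            // Buffer is empty: block on the next page.
            match (self.fetch)(self.after.as_deref()) {
                Ok(page) => {
                    self.after = page.after;
                    self.done = self.after.is_none();
                    self.buffer = page.items.into_iter();
                }
                // This sketch simply stops on the first error.
                Err(_) => self.done = true,
            }
        }
    }
}
```

With something like this, writing to a file is an ordinary `for item in Paginated::new(...)` loop with blocking writes, and no runtime is involved.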

I am not sure what the right approach is here, probably because I still have a hard time wrapping my head around how async works and what its benefits are. I would appreciate any insights.

In what context does this code run? A server? A CLI tool? A GUI application? Does it need to be non-blocking? Async isn't about parallelization, it is about not blocking. If blocking is fine (e.g. in a CLI application), then you don't need async.

I am going to use a CLI tool to simply download all my content from those 3 endpoints and save them to 3 files for now.

At some point in the future I want to write a GUI. But even then, the idea is that there will be a synchronize button which can probably trigger the CLI in the background and it will still want to write all of this info to some files and when it's done the GUI might refresh.

Also when you say

Async isn't about parallelization, it is about not blocking.

What does that mean? If I have 3 async streams, can't I write them to 3 different files in parallel? Wouldn't that be faster than having 3 iterators and waiting for the files to be written one by one after hitting each endpoint?

You can, but async isn't doing the parallelization. Async runtimes can be single-threaded. They can also be multi-threaded. You also don't need async to parallelize (you could manually spawn 3 threads for the 3 endpoints). You don't even need threads to have several requests in flight at once these days (e.g. HTTP/2 multiplexes requests over a single connection).

It sounds to me like you don't yet need async, overall.
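For the "spawn 3 threads" route, a minimal sketch with only the standard library could look like the following. `write_lines` and `download_all` are hypothetical helper names, and the `String` items stand in for already-serialized content; in the real program each iterator would come from a blocking API call.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::PathBuf;
use std::thread;

/// Writes each item on its own line to `path` using ordinary blocking I/O.
/// Returns how many items were written.
fn write_lines(path: PathBuf, items: impl Iterator<Item = String>) -> io::Result<usize> {
    let mut out = BufWriter::new(File::create(path)?);
    let mut count = 0;
    for item in items {
        writeln!(out, "{item}")?;
        count += 1;
    }
    out.flush()?;
    Ok(count)
}

/// Runs one thread per (path, source) pair and waits for all of them.
/// The result holds the per-file item counts in the same order as `jobs`.
pub fn download_all<I>(jobs: Vec<(PathBuf, I)>) -> io::Result<Vec<usize>>
where
    I: Iterator<Item = String> + Send + 'static,
{
    // Spawn all threads first so they run side by side.
    let handles: Vec<_> = jobs
        .into_iter()
        .map(|(path, items)| thread::spawn(move || write_lines(path, items)))
        .collect();

    // Then join them; the first I/O error (if any) is returned.
    handles
        .into_iter()
        .map(|h| h.join().expect("writer thread panicked"))
        .collect()
}
```

Each thread blocks independently on its own endpoint and file, so the three downloads overlap without any async code.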

There is an important distinction between concurrency and parallelism. Async I/O can make progress on many streams concurrently without parallelization.

If you need an analogy, imagine juggling; you can concurrently juggle three pins, but only touch up to two in parallel. The third pin must be in flight (literally).
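To see concurrency without parallelism in code, here is a toy sketch that drives three futures on a single thread with a hand-rolled polling loop. Everything in it (`NoopWaker`, `YieldNow`, `download`, `run_all`, `interleaved_log`) is made up for illustration; a real program would use a runtime such as tokio rather than busy-polling, and `YieldNow` only stands in for "waiting on the network".

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; enough for a busy-polling toy executor.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// A future that returns `Pending` exactly once, then `Ready`.
struct YieldNow(bool);
impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

/// Hypothetical "download": logs each step, then pauses as if waiting on I/O.
async fn download(name: &'static str, steps: usize, log: &RefCell<Vec<String>>) {
    for i in 0..steps {
        log.borrow_mut().push(format!("{name}:{i}"));
        YieldNow(false).await;
    }
}

type Task<'a> = Pin<Box<dyn Future<Output = ()> + 'a>>;

/// Polls every task in turn on the current thread until all have finished.
fn run_all(mut tasks: Vec<Task<'_>>) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    while !tasks.is_empty() {
        // Keep only the tasks that are still pending after this poll.
        tasks.retain_mut(|task| task.as_mut().poll(&mut cx).is_pending());
    }
}

/// Runs three "downloads" concurrently on one thread and returns the step log.
pub fn interleaved_log() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    let mut tasks: Vec<Task<'_>> = Vec::new();
    tasks.push(Box::pin(download("saved", 2, &log)));
    tasks.push(Box::pin(download("submission", 2, &log)));
    tasks.push(Box::pin(download("comment", 1, &log)));
    run_all(tasks);
    log.into_inner()
}
```

The log comes back as `saved:0, submission:0, comment:0, saved:1, submission:1`: all three tasks make progress in turns even though only one thread ever runs, which is the "pins in flight" part of the analogy.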
