What am I doing wrong? Go program is 12X faster than Rust

Rust newb here. I'm trying to get the top stories from Hacker News using hyper. Per the HN API, I issue an initial request to get the list of IDs, and then for each ID I issue a subsequent request to get the metadata for that story. Obviously sequential synchronous requests will be slow, so I'm spinning up 8 threads (this seemed to be the sweet spot), each of which takes the next ID from the vector of IDs and dispatches a request. However, the Rust implementation underperforms the reference Go implementation by a factor of 12 (30s for Rust vs 2.5s for Go). Could someone please point out what is wrong with my Rust implementation?

Did you compile with optimisation? Pass --release to cargo (or -C opt-level=3 if you're using rustc directly).

I would advise against spinning up threads by hand: it may be difficult to bring synchronization overhead down and avoid false sharing.

I think that a simpler solution is to use some library for parallel processing.

simple_parallel by @huon should work well in this case.

That said, the problem at hand is IO bound, so I wouldn't expect Rust to be faster than Go (but it would be an interesting thing to check!).

Quick experimentation shows that hyper's Client doesn't like to be called from multiple threads. Create one client per thread and it should be a lot faster.
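A minimal sketch of the one-client-per-thread arrangement, assuming a current toolchain with `std::thread::scope` (Rust 1.63 or later). `StubClient`, `fetch`, and `fetch_all` are hypothetical names invented for illustration; the stub does no networking and only stands in for wherever a real HTTP client would be constructed.

```rust
use std::thread;

// Hypothetical stand-in for an HTTP client; it only records which worker owns it.
struct StubClient {
    worker: usize,
}

impl StubClient {
    fn new(worker: usize) -> StubClient {
        StubClient { worker }
    }

    // Pretend to fetch a story; a real client would issue the HTTP request here.
    fn fetch(&self, id: i32) -> String {
        format!("worker {} fetched item {}", self.worker, id)
    }
}

// Split `ids` into roughly equal chunks and give each worker thread its own client.
fn fetch_all(ids: &[i32], workers: usize) -> Vec<String> {
    let workers = workers.max(1);
    // Ceiling division so every ID lands in a chunk; `max(1)` avoids a zero chunk size.
    let chunk_len = ((ids.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        // Spawn all workers first, then join them in order.
        let handles: Vec<_> = ids
            .chunks(chunk_len)
            .enumerate()
            .map(|(worker, chunk)| {
                scope.spawn(move || {
                    // Constructed once per thread and never shared.
                    let client = StubClient::new(worker);
                    chunk.iter().map(|&id| client.fetch(id)).collect::<Vec<String>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|handle| handle.join().unwrap())
            .collect()
    })
}

fn main() {
    let ids: Vec<i32> = (1..=10).collect();
    for line in fetch_all(&ids, 4) {
        println!("{}", line);
    }
}
```

Static chunking like this is the simplest split; if some requests are much slower than others, a shared queue balances the load better.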

My other comments concern the Rust code itself, and I don't think they'll have much impact on performance. For example, instead of storing a vector index inside the mutex, store the vector itself and pop items off of it:

use std::sync::{Arc, Mutex};

// Locking only needs a shared reference, so `&mut` is unnecessary here.
fn next(stack: &Arc<Mutex<Vec<i32>>>) -> Option<i32> {
    stack.lock().unwrap().pop()
}
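To show how worker threads might drain such a shared stack, here is a rough sketch using only the standard library. `drain_with_workers` is a hypothetical helper, and pushing the ID into `seen` stands in for the real request. The variant of `next` here takes a shared reference, since locking a mutex does not require `&mut`.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Pop the next ID. The lock guard is a temporary, so it is released
// as soon as this function returns.
fn next(stack: &Arc<Mutex<Vec<i32>>>) -> Option<i32> {
    stack.lock().unwrap().pop()
}

// Spawn `workers` threads that drain the shared stack, then return
// every ID that was processed, sorted for easy comparison.
fn drain_with_workers(ids: Vec<i32>, workers: usize) -> Vec<i32> {
    let stack = Arc::new(Mutex::new(ids));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let stack = Arc::clone(&stack);
            thread::spawn(move || {
                let mut seen = Vec::new();
                // Each iteration re-locks inside `next`, so the lock is
                // not held while the "work" for an ID is being done.
                while let Some(id) = next(&stack) {
                    seen.push(id); // a real worker would fetch the story here
                }
                seen
            })
        })
        .collect();

    let mut all: Vec<i32> = handles
        .into_iter()
        .flat_map(|handle| handle.join().unwrap())
        .collect();
    all.sort();
    all
}

fn main() {
    let processed = drain_with_workers((1..=20).collect(), 8);
    println!("processed {} ids", processed.len());
}
```

Going through the helper matters: writing `while let Some(id) = stack.lock().unwrap().pop()` directly would keep the guard alive for the whole loop body and serialize the workers.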

Even more idiomatic would probably be to use a concurrency library that offers a worker pool, and a "real" queue, using channels to transfer requests and results.
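As a rough illustration of the channel-based approach, here is a sketch built on `std::sync::mpsc` rather than a dedicated concurrency crate. `process_with_pool` is a hypothetical name, and the `format!` call is a placeholder for the actual HTTP request.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

// Send every ID through a job channel to `workers` threads and collect
// (id, title) pairs from a result channel.
fn process_with_pool(ids: Vec<i32>, workers: usize) -> Vec<(i32, String)> {
    let (job_tx, job_rx) = mpsc::channel::<i32>();
    let (result_tx, result_rx) = mpsc::channel::<(i32, String)>();
    // std's Receiver is single-consumer, so the workers share it behind a mutex.
    let job_rx = Arc::new(Mutex::new(job_rx));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let result_tx = result_tx.clone();
            thread::spawn(move || loop {
                // The guard is dropped at the end of this statement, so the
                // lock is held only while receiving, not while working.
                let job = job_rx.lock().unwrap().recv();
                match job {
                    Ok(id) => {
                        let title = format!("story {}", id); // placeholder for the request
                        result_tx.send((id, title)).unwrap();
                    }
                    Err(_) => break, // job channel closed and drained
                }
            })
        })
        .collect();
    drop(result_tx); // from here on only the workers hold result senders

    for id in ids {
        job_tx.send(id).unwrap();
    }
    drop(job_tx); // closing the job channel lets the workers exit

    // Ends once every worker has finished and dropped its sender.
    let mut results: Vec<(i32, String)> = result_rx.iter().collect();
    for handle in handles {
        handle.join().unwrap();
    }
    results.sort();
    results
}

fn main() {
    for (id, title) in process_with_pool(vec![3, 1, 2], 2) {
        println!("{}: {}", id, title);
    }
}
```

A crate with a multi-consumer channel would remove the mutex around the receiver; this version just keeps the example dependency-free.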

I did compile with --release, but this task isn't CPU bound so it shouldn't matter anyway.

Hyper's documentation recommended sharing the client with Arc, so it seems odd that it would be the cause of the poor performance. I'll give it a try anyway. I'll also check out concurrency libraries.

I'm not expecting Rust to be faster than Go, but 12X slower seemed dramatic. Do you have any suggestions for a parallelism library?

I agree, I'm just reporting my results. The Hyper developers might be interested in a bug report :)

I would use rayon's parallel iterators or simple_parallel. I'm not sure which one is a better fit for the job here, probably rayon. Another useful crate for more fine-grained parallelism (scoped threads and lock-free data structures) is crossbeam.

If you want to investigate async io support in Rust, look at mio.

Interesting. I tested with one client per connection (rather than a shared client) and it made a ~4X improvement. I'll file a bug with Hyper. The Rust version is now only ~3X slower than the Go version.

I'd take this up with the Hyper developers. As others have said, there's no reason for Rust to be slower than Go here (and I don't think the threading or JSON decoding would make a difference). I guess they'll be interested!

Rust relies heavily on optimizations; even simple code can get 20x faster when built with --release.

The original version took 2 minutes on my machine; this version takes 20s:

extern crate hyper;
extern crate rustc_serialize;
extern crate scoped_threadpool;

use std::io::Read;

use hyper::Client;
use hyper::header::Connection;
use rustc_serialize::json;
use scoped_threadpool::Pool;

#[derive(RustcDecodable, RustcEncodable)]
struct Story {
    by: String,
    id: i32,
    score: i32,
    time: i32,
    title: String,
}

fn main() {
    let client = Client::new();
    let url = "https://hacker-news.firebaseio.com/v0/topstories.json";
    let mut res = client.get(url).header(Connection::close()).send().unwrap();
    let mut body = String::new();
    res.read_to_string(&mut body).unwrap();
    let vec: Vec<i32> = json::decode(body.as_str()).unwrap();

    let mut pool = Pool::new(8);

    pool.scoped(|scope| {
        for id in vec {
            let client = Client::new();
            scope.execute(move || {
                let url = format!(
                    "https://hacker-news.firebaseio.com/v0/item/{}.json",
                    id,
                );

                let mut res = client.get(url.as_str())
                    .header(Connection::close())
                    .send()
                    .unwrap();

                let mut body = String::new();
                res.read_to_string(&mut body).unwrap();
                let story: Story = json::decode(body.as_str()).unwrap();
                println!("{}", story.title);
            });
        }
    });
}

When I ran your Rust code it downloaded 500 messages. With 8 threads, that means each thread is downloading 62.5 messages. With your Go timings, that means each message takes 40 milliseconds. Pinging Hacker News from my computer, I'm looking at 36 to 44 milliseconds for each ping. So your numbers suggest that Go is running at about network latency speeds.
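The arithmetic above can be written out as a small calculation. This is only a back-of-the-envelope sketch: it assumes the requests are split evenly across threads and that each thread issues them back to back. `per_request_ms` is a hypothetical helper name.

```rust
// Estimate the time spent per request from the total wall-clock time,
// assuming requests are spread evenly over the threads and each thread
// runs its share sequentially.
fn per_request_ms(total_secs: f64, requests: u32, threads: u32) -> f64 {
    let requests_per_thread = requests as f64 / threads as f64;
    total_secs * 1000.0 / requests_per_thread
}

fn main() {
    // 500 stories over 8 threads is 62.5 requests per thread.
    println!("Go:   {:.0} ms per request", per_request_ms(2.5, 500, 8));
    println!("Rust: {:.0} ms per request", per_request_ms(30.0, 500, 8));
}
```

With the timings from the original post (2.5s and 30s), this works out to 40 ms and 480 ms per request respectively.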

Running the Rust version (with each thread having its own hyper client), I'm seeing 12 to 13 seconds to download everything. I'm also not seeing a big difference between debug and release...

Anyway, I tried timing just the download part, i.e.:

    let start = time::precise_time_s();

    let mut res = client.get(url.as_str())
        .header(Connection::close())
        .send()
        .unwrap();

    res.read_to_string(&mut body).unwrap();

    let duration = time::precise_time_s() - start;

I'm seeing an average of 180 to 200 milliseconds per request, so that is where the time is being spent. That would suggest the slowdown is somewhere in hyper.
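For anyone wanting to repeat this kind of measurement without the `time` crate, here is a sketch that swaps in `std::time::Instant` from the standard library. `timed` and `average` are hypothetical helpers, and the `sleep` call merely stands in for the request being measured.

```rust
use std::thread;
use std::time::{Duration, Instant};

// Run `f` once and return its result together with the elapsed wall-clock time.
fn timed<T, F: FnOnce() -> T>(f: F) -> (T, Duration) {
    let start = Instant::now();
    let value = f();
    (value, start.elapsed())
}

// Mean of the collected samples, or None if there are no samples.
fn average(samples: &[Duration]) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    let total: Duration = samples.iter().sum();
    Some(total / samples.len() as u32)
}

fn main() {
    let mut samples = Vec::new();
    for _ in 0..3 {
        // Stand-in for `client.get(...).send()` plus reading the body.
        let ((), elapsed) = timed(|| thread::sleep(Duration::from_millis(20)));
        samples.push(elapsed);
    }
    println!("average: {:?}", average(&samples));
}
```

`Instant` is monotonic, so the measurement is not disturbed by system clock adjustments.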

However when I run:

time wget -q -a /dev/null https://hacker-news.firebaseio.com/v0/item/11606658.json

I'm seeing times from 190 to 200 milliseconds, which is comparable to the Rust timings.

Running the request in Chrome gives times between 80 and 140 milliseconds. However, the HTTP response does specify keep-alive, so I think the 140 is when the connection needs to be created and the 80 is when it's already open.

So in the end I'm not sure I trust the Go numbers. Maybe the Go connection is being reused and the hyper one is not?
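To get a feel for how much connection reuse could matter, here is a toy cost model. It is purely illustrative: splitting the Chrome figures into roughly 60 ms of connection setup plus 80 ms of transfer is an assumption, `estimated_total_ms` is a hypothetical helper, and the model ignores everything except those two costs.

```rust
// Toy model of the time one thread spends on its share of requests.
// With reuse, the connection is set up once; without it, every request
// pays the setup cost again.
fn estimated_total_ms(
    requests_per_thread: f64,
    setup_ms: f64,
    transfer_ms: f64,
    reuse_connection: bool,
) -> f64 {
    if reuse_connection {
        setup_ms + requests_per_thread * transfer_ms
    } else {
        requests_per_thread * (setup_ms + transfer_ms)
    }
}

fn main() {
    // 500 requests over 8 threads; the 60/80 ms split is an assumed breakdown.
    let per_thread = 500.0 / 8.0;
    let fresh = estimated_total_ms(per_thread, 60.0, 80.0, false);
    let reused = estimated_total_ms(per_thread, 60.0, 80.0, true);
    println!("new connection per request: {:.0} ms", fresh);
    println!("reused connection:          {:.0} ms", reused);
}
```

Under these assumed figures the per-request setup cost alone accounts for several seconds of difference, which is the kind of gap a keep-alive connection would hide.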


On my machine, I'm noticing no difference in performance between this version and my second Rust version (the second Rust version is not posted in this thread; it's the same as the first version except each thread gets its own client). It still takes 6.5 seconds to run compared to Go's 1.5 seconds.

I filed a bug against Hyper, and the devs pointed out that the .header(Connection::close()) accounted for the remainder of the performance gap. I cargo-culted that from the Hyper documentation. After recalibrating the thread count, I'm able to get the Rust version to perform on par with the Go version.

Thanks to everyone who offered suggestions.

EDIT: For posterity, the GitHub issue is "Slow parallel client performance" (Issue #777 in hyperium/hyper). In it, one of the Hyper devs mentions that the shared client issue is known, and a fix should ship today.
