Why is my HTTP request so slow?

I'm using Tokio, Serde, and Reqwest to do HTTP requests. What I see is that request execution is extremely slow compared to Swift or Kotlin. (There I also do the request + serialization, all on the same laptop, at the same time.)

Average out of 10:

  • Rust: 1309 ms
  • Kotlin: 335 ms
  • Swift: 265 ms

Is there anything wrong with my code?

My Rust code:

use serde::{Deserialize, Serialize};

//URLs
const SWAPI_BASE_URL: &str = "https://swapi.dev/api/";
const PEOPLE: &str = "people";

//Threaded runtime
lazy_static! {
    static ref RUN_TIME: tokio::runtime::Runtime = tokio::runtime::Builder::new()
        .threaded_scheduler()
        .enable_all()
        .build()
        .unwrap();
    }

//DTO
#[repr(C)]
#[derive(Default, Debug, Clone, PartialEq, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct ResponsePeople {
    pub count: i64,
    pub next: String,
    pub results: Vec<People>,
}

#[repr(C)]
#[derive(Default, Debug, Clone, PartialEq, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct People {
    pub name: String,
    pub height: String,
    pub mass: String,
    #[serde(rename = "hair_color")]
    pub hair_color: String,
    #[serde(rename = "skin_color")]
    pub skin_color: String,
    #[serde(rename = "eye_color")]
    pub eye_color: String,
    #[serde(rename = "birth_year")]
    pub birth_year: String,
    pub gender: String,
    pub homeworld: String,
    pub films: Vec<String>,
    pub species: Vec<String>,
    pub vehicles: Vec<String>,
    pub starships: Vec<String>,
    pub created: String,
    pub edited: String,
    pub url: String,
}

//Callback
#[allow(non_snake_case)]
pub trait SwapiCallback {
    fn onLoad(&mut self, s: Vec<People>);

    fn onError(&mut self, s: &str);
}

//SwapiClient
#[repr(C)]
pub struct SwapiClient();

#[allow(non_snake_case)]
impl SwapiClient {

    pub fn new() -> SwapiClient {
        SwapiClient()
    }

    pub fn loadAllPeople(&self, mut callback: Box<dyn SwapiCallback + Send>) {
        (*RUN_TIME).spawn(async move {
            let res = load_all_people().await;
            match res {
                Ok(root) => {
                    //print response
                    //println!("Response: {:#?}", root.results);
                    callback.onLoad(root.results);
                }
                Err(err) => {
                    let error = format!("Failed to load people {}", err);
                    println!("Error: {}", error);
                    callback.onError(error.as_str())
                }
            }
        });
    }
}

pub async fn load_all_people() -> Result<ResponsePeople, Box<dyn std::error::Error>> {
    let url = format!("{}{}", SWAPI_BASE_URL, PEOPLE);
    let people: ResponsePeople = reqwest::get(url.as_str())
        .await?
        .json()
        .await?;
    Ok(people)
}

Main:

//swapi
pub mod swapi;

//used in swapi client
#[macro_use]
extern crate lazy_static;

//thread sleep
use std::{thread, time};

//measure time
use std::time::Instant;
//DTO
use swapi::{SwapiCallback, People};

fn main() {
    println!("Main started");
    swapi_call_with_thread_sleep();
    println!("Main finished");
}

fn swapi_call_with_thread_sleep() {
    let client = swapi::SwapiClient::new();
    for _i in 0..10 {
        thread_sleep(500);
        let callback = Callback::new();
        client.loadAllPeople(Box::new(callback));
    }
    thread_sleep(10000);
}

//Thread sleep
fn thread_sleep(millis: u64) {
    let sleep = time::Duration::from_millis(millis);
    thread::sleep(sleep);
}

//Create callback
struct Callback {
    start: Instant,
}

impl Callback {
    fn new() -> Callback {
        Callback {
            start: Instant::now(),
        }
    }
}

//Send - types that can be transferred across thread boundaries.
unsafe impl Send for Callback {}

//require to share it between threads
impl SwapiCallback for Callback {
    #[allow(non_snake_case)]
    fn onLoad(&mut self, s: Vec<People>) {
        let diff = self.start.elapsed().as_millis();
        println!("Request: count {}; duration: {}", s.len(), diff);
    }

    #[allow(non_snake_case)]
    fn onError(&mut self, s: &str) {
        println!("Error: {:#?}", s);
    }
}

Results are the same when running with:

  • cargo run
  • cargo run --release

I found a similar question:


Did you by any chance run it with cargo run and time the compiler's execution too? It seems unlikely for Rust to be slower than the other two, even for a debug build...


Please try cargo run --release instead and post its result. I've heard that the HTTP stack is quite slow in debug mode due to self sanity checks.


Hi,
For reference I added my main as well; I'm measuring time in the callback.
I run it with cargo run, and for Swift and Kotlin I also use debug builds.

Results with cargo run --release are almost the same:

Request: count 10; duration: 1697
Request: count 10; duration: 1323
Request: count 10; duration: 1390
Request: count 10; duration: 1313
Request: count 10; duration: 1318
Request: count 10; duration: 1402
Request: count 10; duration: 1370
Request: count 10; duration: 1314
Request: count 10; duration: 1380
Request: count 10; duration: 1315
Request: count 10; duration: 1329

To make sure the problem is in the GET call itself, I added the measurement directly in the HTTP call method.
Results are the same, an average of 1300 ms.

pub async fn load_all_people() -> Result<ResponsePeople, Box<dyn std::error::Error>> {
    let start = Instant::now();
    let url = format!("{}{}", SWAPI_BASE_URL, PEOPLE);
    let people: ResponsePeople = reqwest::get(url.as_str())
        .await?
        .json()
        .await?;
    let diff = start.elapsed().as_millis();
    println!("load_all_people: duration: {}", diff);
    Ok(people)
}

You shouldn't call std::thread::sleep in an async program. It will block the thread, preventing any work from happening on it for the duration. Use the tokio::time module instead. (This turns out not to be relevant here; see below.)


You're cloning / allocating new strings for all the string data in these structs, which can be a lot slower in Rust than in a language with a GC that is tuned for small allocations like that.

One option to speed this up would be to use zero-copy deserialization, which lets all the data structures simply borrow from the main loaded string rather than being allocated individually! It'll be slightly more complicated code and might not work for your use case, but you could try this:

use std::borrow::Cow;
use serde::{Deserialize, Serialize};
use lazy_static::lazy_static;
//URLs
const SWAPI_BASE_URL: &str = "https://swapi.dev/api/";
const PEOPLE: &str = "people";

//Threaded runtime
lazy_static! {
    static ref RUN_TIME: tokio::runtime::Runtime = tokio::runtime::Builder::new()
        .threaded_scheduler()
        .enable_all()
        .build()
        .unwrap();
    }

//DTO
#[repr(C)]
#[derive(Default, Debug, Clone, PartialEq, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct ResponsePeople<'a> {
    pub count: i64,
    #[serde(borrow)]
    pub next: Cow<'a, str>,
    #[serde(borrow)]
    pub results: Vec<People<'a>>,
}

#[repr(C)]
#[derive(Default, Debug, Clone, PartialEq, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct People<'a> {
    #[serde(borrow)]
    pub name: Cow<'a, str>,
    #[serde(borrow)]
    pub height: Cow<'a, str>,
    #[serde(borrow)]
    pub mass: Cow<'a, str>,
    #[serde(borrow, rename = "hair_color")]
    pub hair_color: Cow<'a, str>,
    #[serde(borrow, rename = "skin_color")]
    pub skin_color: Cow<'a, str>,
    #[serde(borrow, rename = "eye_color")]
    pub eye_color: Cow<'a, str>,
    #[serde(borrow, rename = "birth_year")]
    pub birth_year: Cow<'a, str>,
    #[serde(borrow)]
    pub gender: Cow<'a, str>,
    #[serde(borrow)]
    pub homeworld: Cow<'a, str>,
    #[serde(borrow)]
    pub films: Vec<Cow<'a, str>>,
    #[serde(borrow)]
    pub species: Vec<Cow<'a, str>>,
    #[serde(borrow)]
    pub vehicles: Vec<Cow<'a, str>>,
    #[serde(borrow)]
    pub starships: Vec<Cow<'a, str>>,
    #[serde(borrow)]
    pub created: Cow<'a, str>,
    #[serde(borrow)]
    pub edited: Cow<'a, str>,
    #[serde(borrow)]
    pub url: Cow<'a, str>,
}

//Callback
#[allow(non_snake_case)]
pub trait SwapiCallback {
    fn onLoad(&mut self, s: Vec<People>);

    fn onError(&mut self, s: &str);
}

//SwapiClient
#[repr(C)]
pub struct SwapiClient();

#[allow(non_snake_case)]
impl SwapiClient {

    pub fn new() -> SwapiClient {
        SwapiClient()
    }

    pub fn loadAllPeople(&self, mut callback: Box<dyn SwapiCallback + Send>) {
        (*RUN_TIME).spawn(async move {
            let people_string;
            let res = load_all_people().await;
            let res = match res {
                Ok(s) => {
                    // need to explicitly keep the string alive since the other
                    // structures are now borrowing from it
                    people_string = s;
                    parse_all_people(&people_string)
                }
                Err(e) => Err(e),
            };
            match res {
                Ok(root) => {
                    //print response
                    //println!("Response: {:#?}", root.results);
                    callback.onLoad(root.results);
                }
                Err(err) => {
                    let error = format!("Failed to load people {}", err);
                    println!("Error: {}", error);
                    callback.onError(error.as_str())
                }
            }
        });
    }
}

pub async fn load_all_people() -> Result<String, Box<dyn std::error::Error>> {
    let url = format!("{}{}", SWAPI_BASE_URL, PEOPLE);
    let people = reqwest::get(url.as_str())
        .await?
        .text()
        .await?;
    Ok(people)
}


pub fn parse_all_people(s: &str) -> Result<ResponsePeople, Box<dyn std::error::Error>> {
    let people = serde_json::from_str(s)?;
    Ok(people)
}

That should speed things up a bit, at least.
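To make the zero-copy idea above concrete, here is a minimal std-only illustration (not from the thread's code) of why Cow<'a, str> avoids the per-field allocations: the Borrowed variant points into an existing buffer, and an owned String is only created when a copy is unavoidable. The first_word helper is purely hypothetical.

```rust
use std::borrow::Cow;

// A Cow<str> can borrow from the input instead of allocating a new String.
fn first_word(s: &str) -> Cow<'_, str> {
    match s.split_whitespace().next() {
        Some(w) => Cow::Borrowed(w), // no allocation: borrows from `s`
        None => Cow::Owned(String::new()),
    }
}

fn main() {
    let w = first_word("Luke Skywalker");
    // The result borrows from the original buffer; nothing was copied.
    assert!(matches!(w, Cow::Borrowed(_)));
    assert_eq!(w, "Luke");
}
```

serde's `#[serde(borrow)]` applies the same trick at deserialization time: fields borrow from the JSON source string, which is why that string must be kept alive alongside the structs.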


The fact that debug and release runtimes are the same for Rust is suspicious, though. Usually release speeds up code at least 5x, if not more than that, because most rust code relies on optimizations to be at all fast. Are you sure that it is running the release mode binary?


If I'm reading the code correctly, it only sleeps in main, and that's outside of tokio's managed runtime. Each task is independently spawned into the runtime.

It'd be less confusing if all the code ran inside tokio, but I think this should be fine as written.

Ah, yeah, you're right.

When I run with:

  • cargo run
  • cargo run --release

I get the same time results.

As for processing the response, I tried removing serialization entirely, and surprisingly the result is the same:
an average of 1300 ms.

    let url = format!("{}{}", SWAPI_BASE_URL, PEOPLE);
    let start = Instant::now();
    let _text = reqwest::get(url.as_str()).await;
    let diff = start.elapsed().as_millis();
    println!("simple load_all_people: duration: {}", diff);

Maybe there is something like thread priority?

  • The first run is slower because the tokio runtime is lazily created on the first call; maybe initialize tokio in main?
  • Try to maintain a single reqwest::Client instead of calling reqwest::get multiple times; each call to reqwest::get initializes another Client.

Also, I compiled this library for iOS and built an FFI to call it from iOS to test in another environment;
results are the same, around 1400 ms.

cargo lipo --release

Thanks for the suggestion! I created a lazy client and reuse it for requests.
Average out of 10:

  • 662 ms

Faster, but still twice as slow as Swift.

Create client:

//Lazy static
lazy_static! {
    //runtime with threaded pool
    static ref RUN_TIME: tokio::runtime::Runtime = tokio::runtime::Builder::new()
        .threaded_scheduler()
        .enable_all()
        .build()
        .unwrap();
    // HTTP client to share
    static ref HTTP_CLIENT: reqwest::Client = reqwest::Client::new();
    }

Call:

pub async fn load_all_people() -> Result<ResponsePeople, Box<dyn std::error::Error>> {
    let url = format!("{}{}", SWAPI_BASE_URL, PEOPLE);

    let start = Instant::now();

    let people: ResponsePeople = HTTP_CLIENT.get(url.as_str())
        .send()
        .await?
        .json()
        .await?;

    let diff = start.elapsed().as_millis();
    println!("simple load_all_people: duration: {}", diff);

    Ok(people)
}

This is weird! I ran it locally and it happened for me too!

I got the same thing after splitting up the downloading & decoding:

pub async fn load_all_people() -> Result<ResponsePeople, Box<dyn std::error::Error>> {
    let url = format!("{}{}", SWAPI_BASE_URL, PEOPLE);

    let start = Instant::now();

    let people_text = HTTP_CLIENT.get(url.as_str())
        .send()
        .await?
        .text()
        .await?;

    let diff = start.elapsed().as_millis();
    println!("downloading data: duration: {}", diff);
    let start = Instant::now();

    let people_decoded: ResponsePeople = serde_json::from_str(&people_text)?;

    let diff = start.elapsed().as_millis();
    println!("decoding data: duration: {}", diff);

    Ok(people_decoded)
}

gives values like

Request: count 10; duration: 510
downloading data: duration: 526
decoding data: duration: 1

Could it be entirely the speed of downloading from the network? Or could reqwest be making a different request, like not requesting compression, which results in more data?

I think we might find some difference in the requests if we use wireshark or a similar networking tool to analyze the queries. Could you share one of the other language implementations, so we can run both and capture the network requests they're making?

Here is Swift version:

import Foundation

private let SwapiURL = "https://swapi.dev/api/people/";

class NativeSwapiClient {

  func loadPeople(onSuccess: @escaping (([People]?) -> Void)) {
    guard let url = URL(string: SwapiURL) else {return}
    let task = URLSession.shared.dataTask(with: url) { (data, response, error) in
      guard let dataResponse = data,
        error == nil else {
          print(error?.localizedDescription ?? "Response Error")
          return }
      if let result = try? PeopleRespose.decode(data: dataResponse) {
        onSuccess(result.results);
      }
    }
    task.resume()
  }

}

struct PeopleRespose : Codable {
  var count: Int? = nil
  var next: String? = nil
  var results: [People]? = nil
}

extension PeopleRespose {

  static func decode(data: Data) throws -> PeopleRespose? {
    do {
      let decoder = JSONDecoder()
      let user = try decoder.decode(PeopleRespose.self, from: data)
      return user
    } catch let error {
      print(error)
      return nil
    }
  }

}

struct People: Codable {
  //Data fields
  var name: String?
  var gender: String?
  var mass: String?
}

extension People {

  static func decode(data: Data) throws -> People? {
    do {
      let decoder = JSONDecoder()
      let user = try decoder.decode(People.self, from: data)
      return user
    } catch let error {
      print(error)
      return nil
    }
  }

}

For Kotlin, I used coroutines, so it will be harder to set up the project:

class SwapiClient : CoroutineScope {

    companion object {
        const val TAG = "NativeSwapiClient"
        const val SWAPI_PEOPLE = "https://swapi.dev/api/people/"
    }

    private var job: Job = Job()

    override val coroutineContext: CoroutineContext
        get() = Dispatchers.Main + job

    @UnstableDefault
    fun loadPeople() {
        val before = System.currentTimeMillis()
        launch {
            val results = withContext(Dispatchers.IO) { load() }
            val after = System.currentTimeMillis()
            val diff = after - before
            Log.e(TAG, "For ${results.size}; elapsed Time: $diff")
        }
    }

    @UnstableDefault
    private fun load(): List<People> {
        val text = URL(SWAPI_PEOPLE).readText()
        val json = Json(JsonConfiguration(strictMode = false))
        val data: PeopleWrapper = json.parse(PeopleWrapper.serializer(), text)
        return data.results ?: emptyList()
    }

}

@Serializable
private data class PeopleWrapper(
    val count: Int?,
    val results: List<People>?
)

@Serializable
private data class People
    (
    val name: String,
    val mass: String,
    val gender: String
)

The timings are too long to be CPU bound; they are network bound. It might be compression; maybe try adding the gzip and brotli features to the reqwest crate.


I added both features and don't see any significant difference; the average time is ~590 ms.
Am I doing this right?
toml:

reqwest = { version = "*", features = ["blocking", "json", "gzip", "brotli"] }

And I tested two options, both on and off.
ON:

    static ref HTTP_CLIENT: reqwest::Client = reqwest::Client::builder()
                                                .gzip(true)
                                                .brotli(true)
                                                .timeout(Duration::from_secs(10))
                                                .build()
                                                .unwrap();

OFF:

    static ref HTTP_CLIENT: reqwest::Client = reqwest::Client::builder()
                                                .gzip(false)
                                                .no_brotli()
                                                .timeout(Duration::from_secs(10))
                                                .build()
                                                .unwrap();

I did some more investigating, looked at logs, and compared to just wget-ing the page, and reqwest is doing something odd.

Each time it gets a page, it first requests https://swapi.dev/api/people, then gets a 301 REDIRECT response to http://swapi.dev/api/people/. When it requests http://swapi.dev/api/people/, it gets another 301 REDIRECT to https://swapi.dev/api/people/. Only on this third request does it succeed in getting the data...

wget https://swapi.dev/api/people doesn't do this. I haven't tested the Swift/Kotlin clients to see what they do.

I also don't know enough about either wireshark or reqwest/hyper to figure out why this is happening.

One way to observe this is to add a dependency on env_logger, add env_logger::init(); to main, and then run it like RUST_LOG=debug cargo run. My output doing that is:

[2020-05-08T05:48:24Z DEBUG reqwest::connect] starting new connection: https://swapi.dev/
[2020-05-08T05:48:24Z DEBUG hyper::client::connect::dns] resolving host="swapi.dev"
[2020-05-08T05:48:24Z DEBUG hyper::client::connect::http] connecting to 18.197.142.196:443
[2020-05-08T05:48:25Z DEBUG hyper::client::connect::http] connected to 18.197.142.196:443
[2020-05-08T05:48:25Z DEBUG hyper::proto::h1::io] flushed 58 bytes
[2020-05-08T05:48:25Z DEBUG hyper::proto::h1::io] read 346 bytes
[2020-05-08T05:48:25Z DEBUG hyper::proto::h1::io] parsed 9 headers
[2020-05-08T05:48:25Z DEBUG hyper::proto::h1::conn] incoming body is chunked encoding
[2020-05-08T05:48:25Z DEBUG hyper::proto::h1::conn] incoming body completed
[2020-05-08T05:48:25Z DEBUG hyper::client::pool] pooling idle connection for ("https", swapi.dev)
[2020-05-08T05:48:25Z DEBUG reqwest::async_impl::client] redirecting 'https://swapi.dev/api/people' to 'http://swapi.dev/api/people/'
[2020-05-08T05:48:25Z DEBUG reqwest::connect] starting new connection: http://swapi.dev/
[2020-05-08T05:48:25Z DEBUG hyper::client::connect::dns] resolving host="swapi.dev"
[2020-05-08T05:48:25Z DEBUG hyper::client::connect::http] connecting to 18.197.142.196:80
[2020-05-08T05:48:25Z DEBUG hyper::client::connect::http] connected to 18.197.142.196:80
[2020-05-08T05:48:25Z DEBUG hyper::proto::h1::io] flushed 59 bytes
[2020-05-08T05:48:25Z DEBUG hyper::proto::h1::io] read 373 bytes
[2020-05-08T05:48:25Z DEBUG hyper::proto::h1::io] parsed 6 headers
[2020-05-08T05:48:25Z DEBUG hyper::proto::h1::conn] incoming body is content-length (169 bytes)
[2020-05-08T05:48:25Z DEBUG hyper::proto::h1::conn] incoming body completed
[2020-05-08T05:48:25Z DEBUG hyper::client::pool] pooling idle connection for ("http", swapi.dev)
[2020-05-08T05:48:25Z DEBUG reqwest::async_impl::client] redirecting 'http://swapi.dev/api/people/' to 'https://swapi.dev/api/people/'
[2020-05-08T05:48:25Z DEBUG hyper::client::pool] reuse idle connection for ("https", swapi.dev)
[2020-05-08T05:48:25Z DEBUG hyper::proto::h1::io] flushed 98 bytes
[2020-05-08T05:48:25Z DEBUG hyper::proto::h1::io] read 5887 bytes
[2020-05-08T05:48:25Z DEBUG hyper::proto::h1::io] parsed 10 headers
[2020-05-08T05:48:25Z DEBUG hyper::proto::h1::conn] incoming body is chunked encoding
[2020-05-08T05:48:25Z DEBUG hyper::proto::h1::decode] incoming chunked header: 0xF0A (3850 bytes)
[2020-05-08T05:48:25Z DEBUG reqwest::async_impl::client] response '200 OK' for https://swapi.dev/api/people/
[2020-05-08T05:48:25Z DEBUG hyper::proto::h1::decode] incoming chunked header: 0x69B (1691 bytes)
[2020-05-08T05:48:25Z DEBUG hyper::proto::h1::conn] incoming body completed
[2020-05-08T05:48:25Z DEBUG hyper::client::pool] pooling idle connection for ("https", swapi.dev)

The first request is in fact handled by the Django REST framework's redirect (note the absence of a trailing slash). In this case the browser gets the 301 REDIRECT, too. Moreover, wget gets it too:

cerberus@cerberus-X756UQK:~$ wget https://swapi.dev/api/people
--2020-05-08 13:19:20--  https://swapi.dev/api/people
Resolving swapi.dev (swapi.dev)... 18.197.142.196
Connecting to swapi.dev (swapi.dev)|18.197.142.196|:443... connected.
HTTP request sent, awaiting response... 301 MOVED PERMANENTLY
Location: http://swapi.dev/api/people/ [following]
URL transformed to HTTPS due to an HSTS policy
--2020-05-08 13:19:21--  https://swapi.dev/api/people/
Reusing existing connection to swapi.dev:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/json]
Saving to: ‘people’

people                                     [ <=>                                                                         ]   5,41K  --.-KB/s    in 0s      

2020-05-08 13:19:21 (22,0 MB/s) - ‘people’ saved [5541]

And with the trailing slash, there's no redirection:

cerberus@cerberus-X756UQK:~$ wget https://swapi.dev/api/people/
--2020-05-08 13:20:41--  https://swapi.dev/api/people/
Resolving swapi.dev (swapi.dev)... 18.197.142.196
Connecting to swapi.dev (swapi.dev)|18.197.142.196|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/json]
Saving to: ‘index.html’

index.html                                 [ <=>                                                                         ]   5,41K  --.-KB/s    in 0s      

2020-05-08 13:20:42 (22,0 MB/s) - ‘index.html’ saved [5541]
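Putting the redirect analysis together, the implied fix (a sketch, not code posted in the thread) is simply to include the trailing slash in the constant from the original post, so the server answers 200 OK directly instead of bouncing through two 301 redirects; the people_url helper is hypothetical:

```rust
// Constants as in the original post, but with a trailing slash on PEOPLE
// so each request skips the https -> http -> https redirect chain.
const SWAPI_BASE_URL: &str = "https://swapi.dev/api/";
const PEOPLE: &str = "people/"; // note the trailing slash

// Builds the final request URL the same way load_all_people() does.
fn people_url() -> String {
    format!("{}{}", SWAPI_BASE_URL, PEOPLE)
}

fn main() {
    assert_eq!(people_url(), "https://swapi.dev/api/people/");
}
```

Combined with the shared reqwest::Client from earlier in the thread, this removes two of the three round trips each call was paying for.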