As I understand it, Reqwest sets up connection pooling automatically, but for whatever reason it doesn't seem to be working for me. In the simplified example below I create exactly one client, then perform a number of GET requests against the same base URL. I would have thought that if connection pooling were working correctly there would be ONE network connection, but if I run netstat -ban I see a large number of them being created. Am I doing something wrong? I got the keep_alive tip from this post, which is technically about plain Hyper, but I would think the same would apply to Reqwest since it's just a wrapper around Hyper.
extern crate reqwest;

use reqwest::Client;
use reqwest::header::Connection;

const MAX_PAGE_COUNT: u32 = 1000;

fn main() {
    // One client for the whole program, so every request should go
    // through the same connection pool.
    let client = Client::new().unwrap();
    let mut all_ids = Vec::new();

    // First pass: page through the listing until an empty page comes back.
    for page in 1..MAX_PAGE_COUNT {
        let url = format!("https://esi.tech.ccp.is/latest/universe/types/?page={}", page);
        let mut resp: Vec<u32> = client
            .get(&url)
            .header(Connection::keep_alive())
            .send()
            .unwrap()
            .json()
            .unwrap();
        if resp.is_empty() {
            break;
        }
        all_ids.append(&mut resp);
        println!("Got page {}", page);
    }
    println!("counted {} ids", all_ids.len());

    // Second pass: one request per ID.
    for id in &all_ids {
        let url = format!("https://esi.tech.ccp.is/latest/universe/types/{}/", id);
        client.get(&url).header(Connection::keep_alive()).send().unwrap();
        println!("got response for id {}", id);
    }
}
So I tried with plain Hyper and got the same result. I'm guessing the problem has less to do with connection pooling than with keeping the connection open / persistent, but I have no idea why the connections would be closing when I'm explicitly sending a keep-alive header. Does anybody have a clue? @seanmonstar?
Let me preface this by saying that I'm not an expert on all things HTTP. If I say something incorrect, I hope someone will (politely) correct me; everything I say here should be taken with a grain of salt and investigated until you're satisfied. With that out of the way, I'll do my best to answer your questions.
First, a clarification. There are two distinct things you're talking about in your first post: connection pooling and connection persistence. Connection pooling makes requests more efficient by keeping a handful of connections open so that they remain ready for use; this is why you are seeing more than one connection being made. Connection persistence is a rather different kind of optimization which, in HTTP 1.x, keeps a given connection active so that it can be reused for multiple requests.
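To make the distinction concrete, here's a small sketch using the same reqwest 0.x API as your snippet (the URL is just your first page; this is only an illustration of what pooling does and doesn't buy you, not a recommendation):

extern crate reqwest;
use reqwest::Client;

fn main() {
    let url = "https://esi.tech.ccp.is/latest/universe/types/?page=1";

    // Pooling: one Client owns a pool, so consecutive requests can
    // reuse an already-open socket (if the server keeps it alive).
    let client = Client::new().unwrap();
    client.get(url).send().unwrap();
    client.get(url).send().unwrap(); // may reuse the pooled socket

    // No pooling benefit: a fresh Client per request means a fresh
    // pool, and therefore a fresh connection, every time.
    Client::new().unwrap().get(url).send().unwrap();
}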
With regard to connection persistence: if the server hosting esi.tech.ccp.is does not support HTTP 2.0 (which includes facilities for multiplexing requests over a single TCP connection), then you are unfortunately stuck with HTTP 1.x's Connection: keep-alive header. This is supposed to allow the kind of behavior you are hoping for; however, in HTTP 1.x the time a connection is allowed to stay open can be cut short by the server. Some servers will not allow connections to be kept alive at all, or will only keep them alive for a relatively brief period, as a denial-of-service protection.
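One way to check whether the server is cutting keep-alive short is to look at the response headers. A diagnostic sketch, again against the reqwest 0.x API from your snippet (what the server actually sends back is unknown; some servers also advertise a timeout via a Keep-Alive header):

extern crate reqwest;
use reqwest::Client;
use reqwest::header::Connection;

fn main() {
    let client = Client::new().unwrap();
    let resp = client
        .get("https://esi.tech.ccp.is/latest/universe/types/?page=1")
        .send()
        .unwrap();
    // If the server replied with `Connection: close`, it will drop the
    // socket after this exchange no matter what the request asked for.
    println!("{:?}", resp.headers().get::<Connection>());
}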
In this case, I suspect you have already done everything that can be done at this level to use a persistent connection. If the server you are communicating with decides to terminate one of your connections, Hyper will have to establish a new one, but that should be seen as a convenience. The sub-optimal behavior here is a result of how HTTP 1.x is designed. If you do not wish to see multiple connections being made, you would have to instruct Hyper not to use connection pooling; however, that too, I think, should probably be seen as a convenience.
Wow, thanks for the great reply! I'm pretty sure that it does support HTTP 2.0. Do I have to do anything special to enable it in Hyper? The biggest problem I'm trying to overcome is that my little program exhausts all of the ports it can use because it makes so many (thousands of) connections. So if I disable connection pooling like you mentioned in your last paragraph, it should only use one connection and I should be in better shape, correct? And do you know how to disable it?
No, you don't want to disable connection pooling in reqwest. That would mean that a brand new connection is used for every request, always. Also, you don't need the Connection header, as keep-alive is the default with HTTP/1.1.
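So the requests can simply be (same reqwest 0.x API as above):

// With HTTP/1.1 the connection is kept alive by default, so no
// explicit Connection header is needed:
let resp = client.get(&url).send().unwrap();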
Many times, the thing preventing a connection from being reused is that its contents were not completely read. I see you make a request for each ID, but don't actually do anything with the bytes of the response body. Reqwest can't use a connection for a new request if there are still unread bytes from a previous response. So what should it do? Automatically read to the end? What if the response were gigabytes long? Instead, hyper (and reqwest) don't read until you ask for it.
That means all those bytes would be in the way before a new Response could even determine whether it was a success or not. So that socket is tossed, since it appears you don't want the bytes on it.
To allow the socket to be reused, you can read the response to the end. If you don't want the full body, you could make a HEAD request to only get the headers.
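For example, your second loop could drain each body before moving on; a sketch against the reqwest 0.x API above, where Response implements std::io::Read:

use std::io::Read;

for id in &all_ids {
    let url = format!("https://esi.tech.ccp.is/latest/universe/types/{}/", id);
    let mut resp = client.get(&url).send().unwrap();
    // Drain the body so no leftover bytes block reuse of the pooled
    // connection.
    let mut buf = Vec::new();
    resp.read_to_end(&mut buf).unwrap();
    println!("got response for id {}", id);
}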
Hmm, I didn't know that about emptying out the entire response first. Still, I added a dummy read_to_end() just to consume all of it, and I'm still seeing the same behavior. It's also creating a bunch of connections in the first part, which builds the ID list.