When I use reqwest with a proxy to scrape data, I cannot guarantee that the proxy is still valid, so I need to check its validity every time a request is made. These proxies may not survive for more than five minutes, so I frequently have to switch to a new working proxy.
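As a rough illustration of the validity check, here is a minimal sketch that treats a proxy as alive if a TCP connection to it can be opened within a timeout. The function names `proxy_is_reachable` and `live_proxies` are made up for this example, and the check only shows that the port accepts connections, not that the proxy will actually relay traffic.

```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Returns true if a TCP connection to `host_port` (for example "203.0.113.7:8080")
/// can be opened within `timeout`. Reachability is only a cheap approximation
/// of "this proxy works".
fn proxy_is_reachable(host_port: &str, timeout: Duration) -> bool {
    // A proxy whose address cannot be parsed or resolved counts as dead.
    let addrs = match host_port.to_socket_addrs() {
        Ok(addrs) => addrs,
        Err(_) => return false,
    };
    // Try every resolved address until one accepts the connection.
    for addr in addrs {
        if TcpStream::connect_timeout(&addr, timeout).is_ok() {
            return true;
        }
    }
    false
}

/// Keeps only the candidates that currently accept connections.
fn live_proxies(candidates: &[String], timeout: Duration) -> Vec<String> {
    candidates
        .iter()
        .filter(|p| proxy_is_reachable(p.as_str(), timeout))
        .cloned()
        .collect()
}
```

Running `live_proxies` periodically (say, once a minute) rather than before every request would keep the check from adding a connection round trip to each request.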
In Rust's reqwest library, a Client is typically built with specific configurations such as proxy settings, timeouts, and connection limits. These configurations are set when the Client is created and are immutable thereafter. This means that if I need to change any of these configurations, like switching to a different proxy, I need to create a new Client instance with the new settings.
But rebuilding the client on every switch seems costly or inconvenient.
The author seems to be discussing this issue, but there is currently no corresponding implementation.
I asked GPT: why not rebuild? Its answer was:
Rebuilding the client might seem inefficient, but in most cases, it is not a significant performance hit, especially if proxies are changed infrequently. The benefits of maintaining immutability and thread safety generally outweigh the costs of occasionally rebuilding the client.
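One way to avoid rebuilding at all, when the set of proxies is small and fixed, is to build one client per proxy up front and rotate between them. The sketch below shows only the rotation logic with the standard library; `ClientPool` is a hypothetical name, and the type parameter `C` stands in for a `reqwest::Client` that was built once with `Client::builder().proxy(...)`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// A fixed set of pre-built clients, one per proxy, handed out round-robin.
/// `C` is a placeholder for the real client type.
struct ClientPool<C> {
    clients: Vec<C>,
    next: AtomicUsize,
}

impl<C> ClientPool<C> {
    /// Returns `None` for an empty list so `pick` never indexes an empty pool.
    fn new(clients: Vec<C>) -> Option<Self> {
        if clients.is_empty() {
            return None;
        }
        Some(Self { clients, next: AtomicUsize::new(0) })
    }

    /// Picks the next client in rotation; usable from many threads at once.
    fn pick(&self) -> &C {
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.clients.len();
        &self.clients[i]
    }
}
```

With ten or so static proxies this costs ten client builds at startup. It does not by itself handle proxies that die, so a dead entry would still have to be replaced by building one new client.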
Because commercial rotating proxies are expensive, I currently use a little over ten static proxies instead. I then came up with another approach: build a dedicated proxy server that automatically switches the upstream proxy for each request.
In this way, the proxy server handles the complexity of proxy switching, and my client can simply send requests to the proxy server without worrying about proxy management in reqwest.
How It Works:
- Proxy Server: The server receives incoming HTTP requests from my client.
- Proxy Switching: For each incoming request, the proxy server selects a different upstream proxy from a pool of available proxies.
- Forwarding: The proxy server forwards the request to the selected upstream proxy and then returns the response to the client.
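As a rough illustration of these three steps, the following standard-library sketch relays raw bytes between the client and an upstream HTTP proxy chosen in rotation. It rests on several assumptions: the upstream proxies are plain HTTP proxies that need no per-proxy credentials, their addresses are given as `host:port` strings, and rotation happens per TCP connection rather than per HTTP request (with keep-alive, several requests can share one upstream). The names `relay` and `serve` are made up for this example.

```rust
use std::io;
use std::net::{Shutdown, TcpListener, TcpStream};
use std::sync::Arc;
use std::thread;

/// Copies bytes in both directions between the client and the chosen upstream proxy.
fn relay(client: TcpStream, upstream: TcpStream) -> io::Result<()> {
    let mut client_rx = client.try_clone()?;
    let mut upstream_tx = upstream.try_clone()?;
    // client -> upstream runs on its own thread.
    let up = thread::spawn(move || {
        let _ = io::copy(&mut client_rx, &mut upstream_tx);
        let _ = upstream_tx.shutdown(Shutdown::Write);
    });
    // upstream -> client runs on the current thread.
    let (mut upstream_rx, mut client_tx) = (upstream, client);
    let _ = io::copy(&mut upstream_rx, &mut client_tx);
    let _ = client_tx.shutdown(Shutdown::Write);
    let _ = up.join();
    Ok(())
}

/// Accepts connections and hands each one to the next upstream in rotation.
fn serve(listener: TcpListener, upstreams: Arc<Vec<String>>) -> io::Result<()> {
    if upstreams.is_empty() {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "no upstream proxies"));
    }
    let mut next = 0usize;
    for conn in listener.incoming() {
        let client = conn?;
        // Proxy switching: a different upstream for every accepted connection.
        let i = next % upstreams.len();
        next = next.wrapping_add(1);
        let upstreams = Arc::clone(&upstreams);
        thread::spawn(move || match TcpStream::connect(upstreams[i].as_str()) {
            // Forwarding: the client's proxy-style request is passed through untouched.
            Ok(upstream) => {
                let _ = relay(client, upstream);
            }
            Err(e) => eprintln!("upstream {} unreachable: {}", upstreams[i], e),
        });
    }
    Ok(())
}
```

To try it, one would bind a listener with `TcpListener::bind("127.0.0.1:3000")`, pass it to `serve` together with the list of upstream addresses, and point the reqwest client at `http://127.0.0.1:3000` through `reqwest::Proxy::all`. Because the relay never parses HTTP, it cannot retry a request on a different proxy when one upstream fails mid-request.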
My question is: how do I build such a proxy server? The following is the answer GPT generated for me, but it does not work.
use hyper::{Client, Request, Response, Body, Server, Uri};
use hyper::service::{make_service_fn, service_fn};
use tokio::sync::Mutex;
use std::sync::Arc;
use rand::seq::SliceRandom;
#[derive(Clone)]
struct ProxyPool {
    proxies: Arc<Mutex<Vec<String>>>,
}

impl ProxyPool {
    fn new(proxies: Vec<String>) -> Self {
        Self {
            proxies: Arc::new(Mutex::new(proxies)),
        }
    }

    async fn get_random_proxy(&self) -> Option<String> {
        let proxies = self.proxies.lock().await;
        proxies.choose(&mut rand::thread_rng()).cloned()
    }
}
async fn handle_request(req: Request<Body>, proxy_pool: ProxyPool) -> Result<Response<Body>, hyper::Error> {
    // Get a random proxy from the pool
    if let Some(proxy) = proxy_pool.get_random_proxy().await {
        println!("Using proxy: {}", proxy);
        // Create the URI by replacing the scheme and authority with the proxy's
        let mut uri_parts = req.uri().clone().into_parts();
        uri_parts.scheme = Some("http".parse().unwrap());
        uri_parts.authority = Some(proxy.parse().unwrap());
        let new_uri = Uri::from_parts(uri_parts).unwrap();
        // Create a new request with the new URI
        let proxy_request = Request::builder()
            .method(req.method())
            .uri(new_uri)
            .body(req.into_body())
            .unwrap();
        // Create an HTTP client and forward the request
        let client = Client::new();
        client.request(proxy_request).await
    } else {
        Ok(Response::builder()
            .status(502)
            .body(Body::from("No available proxy"))
            .unwrap())
    }
}
#[tokio::main]
async fn main() {
    let proxy_pool = ProxyPool::new(vec![
        "http://proxy1.example.com:8080".to_string(),
        "http://proxy2.example.com:8080".to_string(),
        "http://proxy3.example.com:8080".to_string(),
    ]);
    let make_svc = make_service_fn(|_| {
        let proxy_pool = proxy_pool.clone();
        async move {
            Ok::<_, hyper::Error>(service_fn(move |req| {
                handle_request(req, proxy_pool.clone())
            }))
        }
    });
    let addr = ([127, 0, 0, 1], 3000).into();
    let server = Server::bind(&addr).serve(make_svc);
    println!("Proxy server running on http://{}", addr);
    if let Err(e) = server.await {
        eprintln!("Server error: {}", e);
    }
}
I then searched GitHub for open source projects using the keyword 'proxy server', but most of the results are ordinary forward or reverse proxy servers, and very few deal with forwarding traffic from a client to an upstream proxy.
Does anyone know of a proxy server that matches my description? Thanks.