I'm implementing a Postfix transport, and I'm looking for advice on milking some more performance out of it. I have a client and a daemon: Postfix executes the client for every incoming email that matches a transport rule, and the client reads the email from stdin and forwards it over a Unix domain socket (UDS) where my daemon is listening. This can happen concurrently, with an upper limit of about 1,000 clients at a time. I've included simplified versions of both programs below:
Client
use std::os::unix::net::UnixStream;
use std::io::prelude::*;
use std::io::stdin;
use std::path::Path;
use std::env;
use std::process::exit;

fn main() {
    let mut args = env::args().skip(1);
    let socket_file_path = args.next().expect(
        "Could not parse socket file path from command args",
    );
    let mut email_body: Vec<u8> = Vec::with_capacity(1024);
    stdin().read_to_end(&mut email_body).expect(
        "Could not read email from stdin",
    );
    match UnixStream::connect(Path::new(&socket_file_path)) {
        Ok(mut stream) => {
            if let Err(e) = stream.write_all(email_body.as_ref()) {
                eprintln!("Writing to socket failed due to {}", e);
                exit(1);
            }
        }
        Err(e) => {
            eprintln!("Connecting to socket failed due to {}", e);
            exit(1);
        }
    }
}
Daemon
use std::env;
use std::io::prelude::*;
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::Path;
use std::process::exit;
use std::thread;

fn handle_stream(mut stream: UnixStream) {
    let mut buf = Vec::with_capacity(1024);
    stream.read_to_end(&mut buf).expect("Could not read to end");
}

pub fn main() {
    let mut args = env::args().skip(1);
    let socket_file_path = args.next().expect(
        "Could not parse socket file path from command args",
    );
    match UnixListener::bind(Path::new(&socket_file_path)) {
        Err(e) => {
            eprintln!("Could not bind listener due to {}", e);
            exit(1);
        }
        Ok(listener) => {
            for conn in listener.incoming() {
                match conn {
                    Err(e) => {
                        eprintln!("Could not accept connection due to {}", e);
                        exit(1)
                    }
                    Ok(stream) => {
                        thread::spawn(|| handle_stream(stream));
                    }
                }
            }
        }
    }
}
Timing
This is how I'm testing it:
test.sh
#!/bin/bash
for ((i=0;i<100000;i++))
do
    echo "this is an email" | ./client /tmp/socket.s &
done
wait
I run it like this:
./daemon /tmp/socket.s &
time ./test.sh
It takes a little over a minute on my Mac and a little under a minute on my work computer:
# Mac
real 1m19.161s
user 3m45.578s
sys 3m6.898s
# Work CentOS
real 0m58.010s
user 3m27.182s
sys 2m57.073s
I've tried a couple of different approaches to make it more efficient, such as using a threadpool, but none of them made a noticeable difference. I also tried tokio-uds:
extern crate futures;
extern crate num_cpus;
extern crate tokio_core;
extern crate tokio_io;
extern crate tokio_uds;

use std::env;
use std::path::{Path, PathBuf};

use futures::Future;
use futures::stream::Stream;
use tokio_io::io::read_to_end;
use tokio_uds::UnixListener;
use tokio_core::reactor::Core;

fn main() {
    let addr = env::args().nth(1).expect("Could not parse socket path address");
    let path = PathBuf::from(addr);
    read(&path);
}

fn read(addr: &Path) {
    let mut core = Core::new().unwrap();
    let handle = core.handle();
    let listener = UnixListener::bind(addr, &handle).unwrap();
    let server = listener.incoming().for_each(move |(socket, _)| {
        let buf = Vec::new();
        let msg = read_to_end(socket, buf)
            .map(|_| ())
            .map_err(|e| panic!("{}", e));
        handle.spawn(msg);
        Ok(())
    });
    core.run(server).unwrap();
}
But the results weren't very promising:
# Mac
real 1m58.518s
user 5m42.730s
sys 4m36.260s

# Work CentOS
real 0m59.991s
user 3m46.163s
sys 2m54.927s
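For reference, the threadpool variant was along these lines: a fixed set of worker threads pulling jobs off a shared channel instead of spawning one thread per connection. This is a std-only sketch, not my exact code; the `Job` alias and `spawn_pool` name are just for illustration:

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

// A queued unit of work: in the daemon, handling one accepted connection.
type Job = Box<dyn FnOnce() + Send + 'static>;

// Spawn `n` workers that pull jobs off a shared channel until every
// sender has been dropped.
fn spawn_pool(n: usize) -> (Sender<Job>, Vec<JoinHandle<()>>) {
    let (tx, rx) = channel::<Job>();
    let rx = Arc::new(Mutex::new(rx));
    let workers: Vec<JoinHandle<()>> = (0..n)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || loop {
                // Hold the lock only long enough to take one job off the queue.
                let job = match rx.lock().unwrap().recv() {
                    Ok(job) => job,
                    Err(_) => break, // all senders dropped: shut down
                };
                job();
            })
        })
        .collect();
    (tx, workers)
}
```

In the daemon's accept loop, `thread::spawn(|| handle_stream(stream))` becomes `tx.send(Box::new(move || handle_stream(stream))).unwrap()`. As noted above, it didn't measurably change the numbers.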
I feel like sending a single line over 100,000 sockets shouldn't take a full minute; is there anything I'm missing?
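For comparison, a single-process baseline like the following sketch would isolate the raw UDS connect/write cost from the cost of spawning 100,000 `./client` processes (`bench_connects` is just an illustrative name, not part of my code):

```rust
use std::io::Write;
use std::os::unix::net::UnixStream;
use std::time::{Duration, Instant};

// Open `n` connections to the daemon's socket sequentially, writing one
// message per connection, and return the elapsed wall time.
fn bench_connects(path: &str, n: usize) -> Duration {
    let start = Instant::now();
    for _ in 0..n {
        let mut stream = UnixStream::connect(path).expect("connect failed");
        stream.write_all(b"this is an email\n").expect("write failed");
        // `stream` drops here, closing the connection so the daemon's
        // `read_to_end` sees EOF.
    }
    start.elapsed()
}
```

If that loop finishes in a few seconds, the per-connection socket work isn't the bottleneck and the time is going into process startup.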