If you're sure you need threads for the new-connection handling and I/O part, then you'll want to avoid testing something that will run on Linux with MIO on macOS. MIO abstracts away the underlying system libraries that provide async I/O: on Linux that is epoll, and on macOS it is kqueue. MIO's public API resembles epoll more than anything, so know that what you're testing against will behave very differently from what will actually be used.
Since you're looking at running on Linux, all of the info below is specific to it. Unless you know you need a new-connection receiver and event loop per core, you shouldn't set it up that way. The whole point of asynchronous I/O is that many things can be done on one thread, because most of your operations are wait operations. Until an actual implementation and measurements prove otherwise, you will want your setup to have two threads: one listener, and one for I/O:
fn thread_one() {
    // 1. Wait for new connections
    // 2. Pass to event loop on success
}

fn thread_two() {
    // 1. Wait for event loop
    // 2. Read / Write data
    // 3. Handle I/O errors/disconnects
    // 4. Do stuff with data
    // 5. Re-arm socket
}
The place where threading usually comes into play is step 4. Maybe you are making database calls, or doing heavy computation with the data, etc. That is where you want multiple threads: when they are actually needed. The main reason this should be your default setup on Linux is that your Ethernet interrupts are all assigned to one core by default. You will pay more in context switching and cache misses than you gain from having multiple cores handle events that all originate from interrupts pinned to a single core. You'll need a lot more kernel configuration to support your current architecture plan if that is definitely the path you need to go down.
Your TcpListener shouldn't be set up in either Edge or Level mode; it should be blocking. I'm not sure whether MIO lets you specify a backlog queue for the socket, but if it does, use it. All your accepted sockets should be edge-triggered by default. Edge-triggered mode puts you in control of your performance. It also means your per-socket read routine needs to change to this:
use std::io::{ErrorKind, Read};

use mio::net::TcpStream;

fn read(client: &mut TcpStream) -> Result<Vec<u8>, ()> {
    let mut buf = Vec::<u8>::with_capacity(4096);
    let mut tmp_buf = [0u8; 4096];
    loop {
        match client.read(&mut tmp_buf) {
            // A zero-byte read means the peer closed the connection.
            Ok(0) => return Err(()),
            Ok(n_read) => buf.extend_from_slice(&tmp_buf[0..n_read]),
            // WouldBlock means the socket is drained: we have everything
            // the kernel had for us, so return it and re-arm later.
            Err(ref e) if e.kind() == ErrorKind::WouldBlock => return Ok(buf),
            // Interrupted reads are retryable.
            Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(_) => return Err(()),
        }
    }
}
Try working your setup into that system and see where it gets you. I believe this is also the approach @vitalyd was recommending you move to; I just added some example code to help.