Very mysterious blockage in tokio

I am writing a relaying SOCKS5 proxy. A primary receives the initial SOCKS5 messages and decodes the headers. Multiple workers are connected to the primary over a UDP socket; they receive the already-decoded binary stream plus my own headers, and tunnel the payload to the actual destination.

It works great so far for downloads. However, when I try to upload something, at some point the write to the website gets blocked.
I have debugged the UDP part and nothing is blocked at that end; it only blocks when it is trying to go outside.
Here comes the mysterious part: when I add some delay at some point, or if I reduce the buffer size to something like 5 bytes, everything works perfectly and I get proper upload and download speeds on a speed test. But when I use, say, a 1K buffer, download still works fine and I get a full score, while upload starts at around 0.2 Mbit/s and then slowly drops to zero due to the blocking.
Does anyone have any idea what might be going on?
I just can't understand why download would work fine but upload would fail.
My internal protocol is pretty simple and has just 4 commands:

pub enum Command {
    Connect(Uuid, [u8; 4], [u8; 2], [u8; 4]), //dest_ip(4),dest_port(2),source_ip(4)
    Data(Uuid, Vec<u8>),                      //uuid(16),data(n)
    End(Uuid),                                //uuid(16)
    Error(String),
}
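
To make the wire format concrete, here is a rough sketch of how commands like these could be packed into one UDP datagram each. The one-byte tag values, the `[u8; 16]` stand-in for `Uuid`, and the `encode`/`decode`/`array` helpers are all assumptions made for illustration; they are not the actual protocol code.

```rust
/// Stand-in for `uuid::Uuid` so the sketch builds without external crates (assumption).
type Uuid = [u8; 16];

#[derive(Debug, PartialEq)]
pub enum Command {
    Connect(Uuid, [u8; 4], [u8; 2], [u8; 4]), // dest_ip(4), dest_port(2), source_ip(4)
    Data(Uuid, Vec<u8>),                      // uuid(16), data(n)
    End(Uuid),                                // uuid(16)
    Error(String),
}

/// Copy a slice of exactly `N` bytes into an array (panics on a length mismatch).
fn array<const N: usize>(bytes: &[u8]) -> [u8; N] {
    let mut out = [0u8; N];
    out.copy_from_slice(bytes);
    out
}

impl Command {
    /// Serialize as one hypothetical tag byte (0..=3) followed by the fields.
    pub fn encode(&self) -> Vec<u8> {
        let mut out = Vec::new();
        match self {
            Command::Connect(id, dest_ip, dest_port, source_ip) => {
                out.push(0);
                out.extend_from_slice(id);
                out.extend_from_slice(dest_ip);
                out.extend_from_slice(dest_port);
                out.extend_from_slice(source_ip);
            }
            Command::Data(id, data) => {
                out.push(1);
                out.extend_from_slice(id);
                out.extend_from_slice(data);
            }
            Command::End(id) => {
                out.push(2);
                out.extend_from_slice(id);
            }
            Command::Error(msg) => {
                out.push(3);
                out.extend_from_slice(msg.as_bytes());
            }
        }
        out
    }

    /// Parse one datagram; returns `None` if it is malformed.
    pub fn decode(datagram: &[u8]) -> Option<Command> {
        let (&tag, rest) = datagram.split_first()?;
        match tag {
            0 if rest.len() == 26 => Some(Command::Connect(
                array(&rest[..16]),
                array(&rest[16..20]),
                array(&rest[20..22]),
                array(&rest[22..26]),
            )),
            // The datagram boundary gives the payload length, so no length field is needed.
            1 if rest.len() >= 16 => Some(Command::Data(array(&rest[..16]), rest[16..].to_vec())),
            2 if rest.len() == 16 => Some(Command::End(array(rest))),
            3 => String::from_utf8(rest.to_vec()).ok().map(Command::Error),
            _ => None,
        }
    }
}
```

With a layout like this, every `Data` datagram carries its own connection id, so the receiving side can tell exactly which chunk did or did not arrive for a given connection.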

It's unclear what could be wrong from your description. Does it abruptly stop?

From the looks of things, the UDP transmission is successful (judging by the length of the data sent), but it seems the data never fully goes through the UnboundedChannel.
Say I have a 1K buffer: I see all the 1K chunks written, and at the end there is a buffer of 860-something bytes which never gets sent to the remote client. I can log it on the internal UDP end, but it doesn't come out of the channel for some reason, and then I get a connection reset from the remote party.

I also noticed that when I download a large file, it is nice and fast for the first 99 percent, and during the last 1 percent the connection gets closed.

When I add a 1 nanosecond delay_for before the write, everything seems to be perfect.

Can you show me that write?

I am rewriting the entire thing using sync Rust, but I wonder if this approach is correct?

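use std::io::{self, ErrorKind, Read, Write};
use std::net::{SocketAddr, TcpStream};
use std::thread;
use std::time::Duration;

pub fn local_handler(port: u16, ip: [u8; 4], mut sock: TcpStream) -> io::Result<()> {
    let mut target = TcpStream::connect(SocketAddr::from((ip, port)))?;
    target.set_nonblocking(true)?;
    sock.set_nonblocking(true)?;
    let mut buf = [0u8; 1024];
    loop {
        // Poll each direction once, then sleep so the loop does not spin.
        copy_async(&mut sock, &mut target, &mut buf)?;
        copy_async(&mut target, &mut sock, &mut buf)?;
        thread::sleep(Duration::from_millis(10));
    }
}

fn copy_async(src: &mut TcpStream, target: &mut TcpStream, buf: &mut [u8]) -> io::Result<()> {
    match src.read(buf) {
        // A zero-length read means the peer closed its side.
        Ok(0) => Err(io::Error::new(ErrorKind::UnexpectedEof, "sock closed")),
        // Forward only the bytes that were actually read, not the whole buffer.
        Ok(n) => try_write(&buf[..n], target),
        // Nothing to read right now; not an error in non-blocking mode.
        Err(e) if e.kind() == ErrorKind::WouldBlock => Ok(()),
        Err(e) => Err(e),
    }
}

fn try_write(mut buf: &[u8], sock: &mut TcpStream) -> io::Result<()> {
    // `write` may accept only part of the buffer, so keep going until it is empty.
    while !buf.is_empty() {
        match sock.write(buf) {
            Ok(0) => return Err(io::Error::new(ErrorKind::WriteZero, "failed to write")),
            Ok(n) => buf = &buf[n..],
            Err(e) if e.kind() == ErrorKind::WouldBlock => {
                // The send buffer is full; back off briefly and retry.
                thread::sleep(Duration::from_millis(10));
            }
            Err(e) => return Err(e),
        }
    }
    Ok(())
}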

I went with the approach above because I could not find any way to split the socket into halves that I can pass around individually, and I don't want to pass around a readable and writable clone obtained from try_clone, to prevent possible writes or reads from the wrong places.

PS. Apparently in non-blocking mode the socket gets closed for some reason.

You need: tokio::net::TcpStream::split()

Splits a TcpStream into a read half and a write half,…

There is a whole family of stream splitters:

I have used this to great effect, as described in your other thread, in a sort of proxy-like application.

The TcpStream I couldn't split is the std::net::TcpStream, not the tokio one. As I mentioned, I have already lost hope in the async one and abandoned that codebase.

Hmm... Having never done it I cannot be sure, but can't one clone the socket with try_clone() and pass the two clones to different threads?

I notice that std::net::TcpStream implements Send and Sync so I have a feeling this should work.

Under the covers the OS should sort out the mutual exclusion of the socket access by two threads.
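
A minimal sketch of that idea, using blocking sockets and one thread per direction. The `pump` and `relay` names are made up for illustration, and error handling is kept deliberately simple (if the download direction fails early, the upload thread is just left to finish on its own).

```rust
use std::io::{self, Read, Write};
use std::net::{Shutdown, TcpStream};
use std::thread;

/// Copy bytes from `from` to `to` until EOF, then half-close `to`.
fn pump(mut from: TcpStream, mut to: TcpStream) -> io::Result<u64> {
    let mut buf = [0u8; 1024];
    let mut total = 0u64;
    loop {
        let n = from.read(&mut buf)?;
        if n == 0 {
            // The peer finished sending; signal that no more data is coming.
            let _ = to.shutdown(Shutdown::Write);
            return Ok(total);
        }
        // `write_all` retries partial writes, which is safe on a blocking socket.
        to.write_all(&buf[..n])?;
        total += n as u64;
    }
}

/// Relay in both directions; returns (bytes uploaded, bytes downloaded).
pub fn relay(client: TcpStream, target: TcpStream) -> io::Result<(u64, u64)> {
    // Each clone refers to the same underlying socket.
    let client_reader = client.try_clone()?;
    let target_writer = target.try_clone()?;
    let upload = thread::spawn(move || pump(client_reader, target_writer));
    let down = pump(target, client)?;
    let up = upload
        .join()
        .map_err(|_| io::Error::new(io::ErrorKind::Other, "upload thread panicked"))??;
    Ok((up, down))
}
```

Each thread only ever reads from one socket and writes to the other, so no sleeps or WouldBlock handling are needed.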

I would not be so quick to abandon the async approach. I found tokio makes many things like this quite easy.

Much nicer than messing around with select() and the like when you want to do all this with minimal thread usage.

Well, that is what I am doing at this point, but if I were to read from the socket in another thread by mistake, that would mess up the data in the reader thread and would be extremely difficult to debug.

If you are using the std TcpStream, you could create your own wrapper structs for the read and write half respectively and make sure that each struct only allows one kind of access.
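
Something along these lines might work as a starting point. `ReadHalf`, `WriteHalf` and `split` are hypothetical names chosen for this sketch; std itself has no such types. The split relies on `try_clone`, so both halves still refer to the same socket.

```rust
use std::io::{self, Read, Write};
use std::net::TcpStream;

/// Read-only handle: implements `Read` but offers no way to write.
pub struct ReadHalf(TcpStream);

/// Write-only handle: implements `Write` but offers no way to read.
pub struct WriteHalf(TcpStream);

/// Consume the stream and hand back two restricted handles to the same socket.
pub fn split(stream: TcpStream) -> io::Result<(ReadHalf, WriteHalf)> {
    let reader = stream.try_clone()?;
    Ok((ReadHalf(reader), WriteHalf(stream)))
}

impl Read for ReadHalf {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.0.read(buf)
    }
}

impl Write for WriteHalf {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.0.write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.0.flush()
    }
}
```

If the wrappers live in their own module, the inner `TcpStream` field is private to it, so code holding a `WriteHalf` cannot read from the socket by accident, and the compiler rejects the attempt instead of leaving it to be found at runtime.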