Seems impossible to use read_exact for socks5 domain name

There's a description of the SOCKS5 protocol. The authentication step uses length fields for the username and password, whereas DST.ADDR, the domain name, is variable-length and the SOCKS5 client doesn't seem to send its length. It seems impossible to use the .read_exact() method for DST.ADDR, and the .read() method may read fewer bytes than DST.ADDR contains.

Do you have a link in English?

You always know the length of DST.ADDR in the SOCKS5 protocol. If it's type 1 (IPv4 literal), then it's 4 octets. If it's type 4 (IPv6 literal), it's 16 octets. And if it's type 3 (FQDN), the first octet of DST.ADDR is the length of the FQDN in octets, followed by the octets that make up the FQDN.

So something a bit like:

let atyp = sock.read_byte()?;
match atyp {
    1 => Socks5Addr::ipv4_from_bytes(sock.read_exact(4)?),
    4 => Socks5Addr::ipv6_from_bytes(sock.read_exact(16)?),
    3 => {
        let fqdn_len = sock.read_byte()?;
        let fqdn = sock.read_exact(fqdn_len)?;
        Socks5Addr::fqdn_from_bytes(fqdn)
    }
}

Noting that this is pseudo-Rust.
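
For anyone who wants something that compiles, here is a minimal sketch of the same idea using only std. The `Socks5Addr` enum and the `read_addr` function are names made up for this example, not part of any crate. It works on any `Read`, so a `TcpStream` or an in-memory byte slice both do.

```rust
use std::io::{self, Read};
use std::net::{Ipv4Addr, Ipv6Addr};

/// Hypothetical address type for this sketch.
#[derive(Debug, PartialEq)]
enum Socks5Addr {
    V4(Ipv4Addr),
    V6(Ipv6Addr),
    /// Kept as raw bytes, since the name is not guaranteed to be UTF-8.
    Domain(Vec<u8>),
}

/// Reads ATYP followed by DST.ADDR, using read_exact for every field.
fn read_addr<R: Read>(sock: &mut R) -> io::Result<Socks5Addr> {
    let mut atyp = [0u8; 1];
    sock.read_exact(&mut atyp)?;
    match atyp[0] {
        1 => {
            let mut b = [0u8; 4];
            sock.read_exact(&mut b)?;
            Ok(Socks5Addr::V4(Ipv4Addr::from(b)))
        }
        4 => {
            let mut b = [0u8; 16];
            sock.read_exact(&mut b)?;
            Ok(Socks5Addr::V6(Ipv6Addr::from(b)))
        }
        3 => {
            // The first octet of DST.ADDR is the length of the name.
            let mut len = [0u8; 1];
            sock.read_exact(&mut len)?;
            let mut name = vec![0u8; len[0] as usize];
            sock.read_exact(&mut name)?;
            Ok(Socks5Addr::Domain(name))
        }
        other => Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("unknown ATYP {}", other),
        )),
    }
}
```

The two-octet port follows the address, so the caller would read it with one more `read_exact` on a `[u8; 2]`.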

Did it have type 2 for DST.ADDR?

I can't find a type 2 in the RFC (which defines the SOCKS5 protocol) or through a Google search. Your guess is as good as mine as to what it might have been used for before the protocol was formalized in RFC 1928.

I used this code:

use std::{io::*, net::*, str::*};
fn main() {
    let mut s = TcpStream::connect("127.0.0.1:2000").unwrap();
    let mut a = [0u8; 65535];
    s.write_all(&[5, 1, 0]).unwrap();
    s.read_exact(&mut a[..2]).unwrap();
    s.write_all(&[5, 1, 0, 3]).unwrap();
    let w = "bing.com";
    s.write_all(&[w.len() as u8]).unwrap();
    s.write_all(w.as_bytes()).unwrap();
    s.write_all(&443u16.to_be_bytes()).unwrap();
    s.read_exact(&mut a[..4]).unwrap();
    println!("{:?}", &a[..4]);
    if a[3] == 1 {
        s.read_exact(&mut a[..6]).unwrap();
        println!(
            "1\n{:?}\n{}",
            &a[..4],
            u16::from_be_bytes(unsafe { *(a[4..6].as_ptr() as *const [u8; 2]) })
        )
    } else if a[3] == 3 {
        s.read_exact(&mut a[..1]).unwrap();
        let l = a[0] as usize;
        s.read_exact(&mut a[..l + 2]).unwrap();
        println!(
            "3\n{}\n{}",
            unsafe { from_utf8_unchecked(&a[..l]) },
            u16::from_be_bytes(unsafe { *(a[l..l + 2].as_ptr() as *const [u8; 2]) })
        )
    } else if a[3] == 4 {
        s.read_exact(&mut a[..18]).unwrap();
        println!(
            "4\n{:?}\n{}",
            &a[..16],
            u16::from_be_bytes(unsafe { *(a[16..18].as_ptr() as *const [u8; 2]) })
        )
    }
}

The SOCKS proxy listens on 127.0.0.1:2000, so the program connects to that address. The terminal prints:
1
[127, 0, 0, 1]
2000
Does the proxy always answer with its listening address?

If you want someone to look at your code, then formatting it nicely would give you better chances.

See RFC 1928 - SOCKS Protocol Version 5 - the proxy responds with the address and port you should connect to to make the onwards connection.

That said, you're not checking the reply status field, so you could be getting garbage - if the REP field is not 0, then the request failed and the proxy hasn't given you an address to connect to.
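
A minimal sketch of that check, assuming the reply layout from RFC 1928 (VER, REP, RSV, ATYP, where a REP of X'00' means "succeeded" and any other value is a failure code). The function name `read_reply_header` is made up for this example.

```rust
use std::io::{self, Read};

/// Reads the fixed 4-octet reply header and returns ATYP,
/// but only when the proxy reports success.
fn read_reply_header<R: Read>(sock: &mut R) -> io::Result<u8> {
    let mut head = [0u8; 4];
    sock.read_exact(&mut head)?;
    let [ver, rep, _rsv, atyp] = head;
    if ver != 5 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "not a SOCKS5 reply",
        ));
    }
    if rep != 0 {
        // Non-zero REP: the request failed, so BND.ADDR and BND.PORT
        // should not be trusted.
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("proxy refused the request, REP = {}", rep),
        ));
    }
    Ok(atyp)
}
```

Only after this returns `Ok` does it make sense to go on and read BND.ADDR and BND.PORT according to the returned ATYP.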

Another question, about compile-time optimization:

fn main() {
    let s = String::from("127.0.0.1:1080");
    let s = std::net::TcpStream::connect(&s).unwrap();
    //The String is shadowed
}

The String is shadowed. Does the compiler free its memory immediately after it's shadowed, or is the memory freed only when the String goes out of scope?

String has a non-trivial Drop (std::mem::needs_drop::<String>() returns true), and as such it will not be freed until the end of the scope. You could rewrite your code as:

fn main() {
    let s = {
        let s = String::from("127.0.0.1:1080");
        // No trailing semicolon, so the block evaluates to the TcpStream
        std::net::TcpStream::connect(&s).unwrap()
    };
    // String scope has ended, so it's dropped, TcpStream still in scope
}

And this would get you the early drop of the String.

Location of destructors is not subject to that kind of logic. Destructors always run at the end of the scope.
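
If you want to see this for yourself, here is a small sketch that logs when values are dropped. `Noisy`, `Log`, `shadowed` and `inner_block` are names invented for this example. The first function shadows a binding; the second uses an inner block as suggested above.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<&'static str>>>;

/// Hypothetical helper: pushes its label into the shared log when dropped.
struct Noisy {
    label: &'static str,
    log: Log,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.label);
    }
}

/// Shadowing: the first value stays alive until the end of the scope.
fn shadowed() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    {
        let _s = Noisy { label: "first dropped", log: Rc::clone(&log) };
        let _s = Noisy { label: "second dropped", log: Rc::clone(&log) };
        log.borrow_mut().push("after shadowing");
        // Both values are dropped here, in reverse declaration order.
    }
    let events = log.borrow().clone();
    events
}

/// Inner block: the first value is dropped when the block ends.
fn inner_block() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    {
        let _s = {
            let _first = Noisy { label: "first dropped", log: Rc::clone(&log) };
            Noisy { label: "second dropped", log: Rc::clone(&log) }
        };
        log.borrow_mut().push("after inner block");
    }
    let events = log.borrow().clone();
    events
}
```

In `shadowed` the "after shadowing" entry is logged before either drop, while in `inner_block` the first value is dropped before "after inner block" is logged.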

The code above uses unsafe, although the TryFrom and TryInto traits exist. But these traits return a Result that needs to be unwrapped. Can the compiler optimize away the Result from TryFrom and TryInto if the conversion obviously succeeds?

The best way to answer that question is to use something like cargo show-asm or Godbolt to look at assembly output and see if Rust has elided the error check completely.

The compiler can elide anything that it can prove is always true; if it can't prove that it's always true, then using the unsafe variants pushes responsibility for the error check onto you. Note (as an example) that domain names in SOCKS5 do not have to be UTF-8, so your use of from_utf8_unchecked is dangerous, since passing it bytes that are not valid UTF-8 is undefined behavior.
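
As a sketch of the safe alternatives in std: `std::str::from_utf8` checks the bytes and returns an error on invalid input, and `String::from_utf8_lossy` substitutes U+FFFD for invalid sequences, which is usually fine for printing. The two wrapper function names are made up for this example.

```rust
use std::borrow::Cow;

/// Strict: returns None when the name is not valid UTF-8.
fn domain_strict(raw: &[u8]) -> Option<&str> {
    std::str::from_utf8(raw).ok()
}

/// Lossy: always returns text, replacing invalid bytes with U+FFFD.
fn domain_for_display(raw: &[u8]) -> Cow<str> {
    String::from_utf8_lossy(raw)
}
```

Neither needs unsafe, and the cost of the check is small compared with a network round trip.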

But, for example, this implementation of slice to u16 has the error check elided by the compiler - you can tell, since I've used a specific message in the expect call, and the compiler has removed the message, so it must have removed the expect call as well. If I deliberately break it by giving it 1 byte, not 2, the message is in the output, showing that the compiler doesn't normally elide it.
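
A sketch of that kind of slice-to-u16 conversion without unsafe (the function name `port_at` is made up here, and this is not necessarily the exact code behind the link). Whether the `expect` is actually elided depends on the compiler version and optimization level, so check the assembly for your own build.

```rust
use std::convert::TryInto;

/// Reads a big-endian u16 from the two bytes starting at `start`.
/// Panics if the buffer is too short for that range.
fn port_at(buf: &[u8], start: usize) -> u16 {
    // The sub-slice is always exactly two bytes long, so the conversion
    // to [u8; 2] cannot fail once the indexing has succeeded.
    let bytes: [u8; 2] = buf[start..start + 2]
        .try_into()
        .expect("a 2-byte slice always converts to [u8; 2]");
    u16::from_be_bytes(bytes)
}
```

For the IPv4 case in the program above, this would be called as `port_at(&a, 4)` in place of the pointer cast.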

The DATA field in a UDP association is variable-length. How do I know its length?

It's the last field in the datagram, so it's all the remaining data in the datagram after the SOCKS5 UDP header.
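
A minimal sketch of that, assuming the UDP request header layout from RFC 1928 (RSV, 2 octets; FRAG, 1 octet; ATYP, 1 octet; DST.ADDR; DST.PORT, 2 octets). The function name `udp_payload` is made up for this example.

```rust
/// Returns the DATA part of one SOCKS5 UDP datagram, or None if the
/// header is truncated or the address type is unknown.
fn udp_payload(datagram: &[u8]) -> Option<&[u8]> {
    // Octets 0..3 are RSV and FRAG; octet 3 is ATYP.
    let atyp = *datagram.get(3)?;
    let addr_len = match atyp {
        1 => 4,
        4 => 16,
        // Domain name: one length octet plus that many octets of name.
        3 => 1 + *datagram.get(4)? as usize,
        _ => return None,
    };
    // Header = 4 fixed octets + address + 2-octet port.
    let header_len = 4 + addr_len + 2;
    // Everything after the header is DATA (possibly empty).
    datagram.get(header_len..)
}
```

The slice you pass in would be `&buf[..n]`, where `n` is the byte count returned by `UdpSocket::recv_from`, so the datagram boundary itself tells you where DATA ends.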