BufReader with a TcpStream that is not supposed to close

Hi,

I'm (still) having a little bit of trouble understanding the BufReader. I know that functions like read_line and read_to_string read until an EOF. Yet when I do that, my program just deadlocks (basically it hangs).

I'm running a TCP server, which sends a challenge string to incoming connections. These challenge strings end with '\n'.

What I've tried:

    let addr = "127.0.0.1:50000".parse::<SocketAddr>().unwrap();
    let mut stream = TcpStream::connect(addr).unwrap().take(248);

    let mut line = String::new();
    stream.read_to_string(&mut line);

    print!("{}", line);

I've also tried this:

    let addr = "127.0.0.1:50000".parse::<SocketAddr>().unwrap();
    let mut stream = TcpStream::connect(addr).unwrap();

    let mut reader = BufReader::new(stream);

    let mut line = vec![];
    &reader.read_until(b'\n', &mut line);

Yet that doesn't seem to work either. Can I somehow specify that it should read until a certain delimiter? I just want to manipulate the string that is being returned.

You should be using the read_line method to read until the end of a line.

On what exactly? :sweat_smile:
If I try something like:

    let addr = "127.0.0.1:50000".parse::<SocketAddr>().unwrap();
    let mut stream = TcpStream::connect(addr).unwrap();

    let mut reader = BufReader::new(stream);

    let mut line = String::new();
    reader.read_line(&mut line);

    print!("{}", line);

it deadlocks...

Are you sure that the line is actually being sent? Maybe you need to flush?

Flush? Neither BufReader nor TcpStream seems to implement any sort of flush function?

I'm quite sure the string is being sent from the server. I tried something similar with C# and it works just fine.

I meant flush in the application sending the data. Can you post the C# code?
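To illustrate what flushing on the sending side means, here is a minimal sketch. It assumes the sender is written in Rust and wraps its socket in a BufWriter, which the thread never states; send_challenge is a hypothetical helper name.

```rust
use std::io::{BufWriter, Write};

/// Hypothetical helper: writes one newline-terminated challenge and flushes.
/// `W` would be a `TcpStream` in a real server; any `Write` works here.
pub fn send_challenge<W: Write>(
    writer: &mut BufWriter<W>,
    challenge: &str,
) -> std::io::Result<()> {
    writer.write_all(challenge.as_bytes())?;
    writer.write_all(b"\n")?;
    // Without this flush the bytes may stay in the BufWriter's internal
    // buffer, and a peer blocked in read_line would keep waiting for them.
    writer.flush()
}
```

If the sender writes directly to an unbuffered socket, there is nothing to flush and the problem lies elsewhere.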

    TcpClient client = new("127.0.0.1", 50000);

    NetworkStream stream = client.GetStream();

    // Buffer to store the response bytes.
    Byte[] data = new Byte[256];

    // String to store the response ASCII representation.
    String responseData = String.Empty;

    // Read the first batch of the TcpServer response bytes.
    Int32 bytes = stream.Read(data, 0, data.Length);
    responseData = System.Text.Encoding.ASCII.GetString(data, 0, bytes);
    Console.WriteLine("Received: {0}", responseData);

Okay, but the C# code isn't reading until a newline. Maybe the stuff that is sent doesn't end with a newline? You can try this, which does the same as the C# code:

    let addr = "127.0.0.1:50000".parse::<SocketAddr>().unwrap();
    let mut stream = TcpStream::connect(addr).unwrap();
    let mut reader = BufReader::new(stream);

    let mut data = vec![0; 256];
    let len = reader.read(&mut data).unwrap();
    let data_str = std::str::from_utf8(&data[..len]).unwrap();
    println!("{}", data_str);

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 0, error_len: Some(1) }', src/main.rs:14:54
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Apparently it does not send valid UTF-8 back?

You can also print the raw bytes to inspect their contents. (note that I modified the example shortly after posting)
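For inspecting raw bytes, something like the following sketch may be easier to read than the default debug output. describe_bytes is a hypothetical helper, not a standard library function. It is probably worth passing only the part of the buffer that was actually filled (&data[..len]), since the rest of a zero-initialized buffer is just padding.

```rust
/// Hypothetical helper: renders bytes as hex, followed by the printable
/// ASCII characters (anything else is shown as '.').
pub fn describe_bytes(data: &[u8]) -> String {
    let hex: Vec<String> = data.iter().map(|b| format!("{:02x}", b)).collect();
    let ascii: String = data
        .iter()
        .map(|&b| {
            if b.is_ascii_graphic() || b == b' ' {
                b as char
            } else {
                '.'
            }
        })
        .collect();
    format!("{} |{}|", hex.join(" "), ascii)
}
```

Called as println!("{}", describe_bytes(&data[..len])), it shows both the non-text bytes and any readable text in one line.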


I assume you mean:

println!("{:?}", data);

which does not yield something interesting:

[245, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

according to the documentation, it should send valid UTF-8 encoded messages, with a line feed EOL.

The client communicates with the server by sending and receiving UTF-8 encoded text messages of one or multiple lines. The EOL character is \n (line feed).

EDIT: I don't know if this is useful, but sometimes it does fill a 126-byte buffer:

[247, 0, 74, 122, 114, 55, 80, 105, 112, 98, 69, 58, 109, 115, 101, 114, 118, 101, 114, 58, 57, 58, 73, 78, 86, 65, 76, 73, 68, 44, 82, 73, 80, 69, 77, 68, 49, 54, 48, 44, 83, 72, 65, 53, 49, 50, 44, 83, 72, 65, 51, 56, 52, 44, 83, 72, 65, 50, 53, 54, 44, 83, 72, 65, 50, 50, 52, 44, 83, 72, 65, 49, 44, 67, 79, 77, 80, 82, 69, 83, 83, 73, 79, 78, 95, 83, 78, 65, 80, 80, 89, 44, 67, 79, 77, 80, 82, 69, 83, 83, 73, 79, 78, 95, 76, 90, 52, 58, 76, 73, 84, 58, 83, 72, 65, 53, 49, 50, 58, 115, 113, 108, 61, 54, 58, 0]

Also, if instead of unwrapping the str::from_utf8 I look at the panic, it throws this:

thread 'main' panicked at 'invalid utf-8 sequence of 1 bytes from index 0', src/main.rs:17:19
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

The first byte isn’t valid UTF-8. Any byte higher than 127 must be part of a multi-byte sequence, so 245 immediately followed by 0 is wrong. (In fact, bytes 245 through 255 never appear in valid UTF-8 at all.)
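A small sketch of how to see this from code, using only the standard library. valid_prefix_len is a hypothetical helper name; it reports how many leading bytes decode cleanly, which is the same valid_up_to value that appears in the panic message.

```rust
use std::str;

/// Hypothetical helper: number of leading bytes of `data` that form valid UTF-8.
pub fn valid_prefix_len(data: &[u8]) -> usize {
    match str::from_utf8(data) {
        Ok(s) => s.len(),
        // valid_up_to() is the index of the first byte that failed to decode.
        Err(e) => e.valid_up_to(),
    }
}
```

For a quick look at data that may not be text, String::from_utf8_lossy replaces each invalid sequence with U+FFFD instead of returning an error.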

Right, would it help if I just skipped that byte?
Something like:

data.iter().skip(1).next().unwrap().collect::<Vec<u8>>()

(which is obviously wrong, but I don't know how to do it...)

Where is this documentation? This passage reads to me as an indicator that the data may contain EOLs, but not that they indicate the end of the message. In fact, they can’t: the message may contain multiple lines, so must be terminated some other way.

My gut instinct says that the first 2 bytes are probably something like a length field meant to tell the program how much data to read.

Mmm, that could be. Should I just ignore the first 2 bytes then? What's an elegant way to skip the first 2 bytes of a Vec?

EDIT:

These text messages are transferred in packets. The maximal size of a packet is 8190 bytes. It is not guaranteed that a packet contains a proper UTF-8 encoded text, because it's possible that a multi-byte character is cut in half at the end of its payload.

Every packet starts with a 16 bit (2 byte) integer, called header. The LSB of the header is only 1 for the last packet in the message, and 0 for all the others. You can get the number of bytes in the payload by shifting the header by 1 bit to the right (to remove the LSB).
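The remark about a multi-byte character being cut in half can be illustrated with a small sketch. The two payloads here are made up for the example: the string "café" is split in the middle of the two-byte encoding of "é". Each piece fails to decode on its own, but the joined bytes decode fine, which suggests collecting all payload bytes first and decoding once at the end.

```rust
/// Splits the UTF-8 bytes of "café" between the two bytes of 'é' (0xC3 0xA9),
/// imitating two hypothetical packet payloads.
pub fn split_example() -> (Vec<u8>, Vec<u8>) {
    let bytes = "caf\u{e9}".as_bytes(); // b'c', b'a', b'f', 0xC3, 0xA9
    (bytes[..4].to_vec(), bytes[4..].to_vec())
}

/// Joins all payloads and decodes them as one UTF-8 string.
pub fn join_and_decode(parts: &[Vec<u8>]) -> Result<String, std::string::FromUtf8Error> {
    String::from_utf8(parts.concat())
}
```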

It looks like you're correct.

The simplest way would be to read into a [u8;2] first, and then make a second read call for the vector data. Instead of ignoring them, though, it’s probably better to find the part of the documentation that describes what they mean.


Edit: Now that you’ve found the relevant documentation, I’d read the two-byte header, perform the shift they describe, and then read that many bytes of payload. Repeat until you get a packet that indicates the end of a message and concatenate everything together.

You can use u16::from_le_bytes to reconstruct the integer from a [u8;2].
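A minimal sketch of the header decoding, assuming the header is little-endian (the quoted documentation does not say, but the dumps above start with a large byte followed by 0, which fits). Header and parse_header are illustrative names.

```rust
/// Decoded form of the 2-byte packet header (illustrative type).
#[derive(Debug, PartialEq)]
pub struct Header {
    pub payload_len: usize,
    pub is_last: bool,
}

/// Parses the header, assuming little-endian byte order.
pub fn parse_header(bytes: [u8; 2]) -> Header {
    let raw = u16::from_le_bytes(bytes);
    Header {
        // Shifting right by one drops the "last packet" flag.
        payload_len: (raw >> 1) as usize,
        // The least significant bit is 1 only for the last packet.
        is_last: raw & 1 == 1,
    }
}
```

Under that assumption, the bytes [247, 0] from the second dump decode to a 123-byte payload with the last-packet flag set.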

It's untested, but I'd read these something like this:

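    use std::io::Read;

    pub fn read_message<R: Read>(mut reader: R) -> Result<String, Box<dyn std::error::Error>> {
        let mut buf: Vec<u8> = vec![];
        let mut packet: Vec<u8> = vec![];
        loop {
            // Each packet starts with a 2-byte header.
            let mut header_bytes = [0u8; 2];
            reader.read_exact(&mut header_bytes)?;
            let header = u16::from_le_bytes(header_bytes);
            // The upper 15 bits hold the payload length.
            packet.resize((header >> 1) as usize, 0);
            reader.read_exact(packet.as_mut_slice())?;
            // append moves the bytes into buf and leaves packet empty for reuse.
            buf.append(&mut packet);
            // The lowest bit marks the last packet of the message.
            if header & 1 == 1 {
                return Ok(String::from_utf8(buf)?);
            }
        }
    }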

Edit: Reworked a bit to not reallocate the packet buffer every time through the loop
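To try read_message without a live server, one option is to build the packets in memory. The sketch below is a hypothetical encoder for the format as described in the quoted documentation, again assuming a little-endian header; encode_message and max_payload are names invented for this example.

```rust
/// Hypothetical encoder: splits `text` into packets of at most `max_payload`
/// bytes, each prefixed with a little-endian header of (length << 1) | last.
pub fn encode_message(text: &str, max_payload: usize) -> Vec<u8> {
    // The length must fit in the 15 bits left after the flag bit.
    assert!(max_payload > 0 && max_payload <= (u16::MAX >> 1) as usize);
    let bytes = text.as_bytes();
    let mut out = Vec::new();
    if bytes.is_empty() {
        // An empty message is a single last packet with no payload.
        out.extend_from_slice(&1u16.to_le_bytes());
        return out;
    }
    let chunk_count = (bytes.len() + max_payload - 1) / max_payload;
    for (i, chunk) in bytes.chunks(max_payload).enumerate() {
        let last = (i + 1 == chunk_count) as u16;
        let header = ((chunk.len() as u16) << 1) | last;
        out.extend_from_slice(&header.to_le_bytes());
        out.extend_from_slice(chunk);
    }
    out
}
```

Wrapping the result in std::io::Cursor gives a value that implements Read, so it can be passed to read_message in place of the socket.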

Haha, I guess it probably works like that, but I've done it a little differently (and less elegantly than you).

I did it like this:

    let addr = "127.0.0.1:50000".parse::<SocketAddr>().unwrap();
    let mut stream = TcpStream::connect(addr).unwrap();
    let mut reader = BufReader::new(stream);

    let mut header = vec![0; 1];
    reader.read(&mut header).unwrap();
    let mut is_last = header[0] & 1;
    let byte_count = header[0] >> 1;
    println!("{:?}", is_last);
    println!("{:?}", byte_count);


    let mut data = vec![0; byte_count as usize];

    reader.read(&mut data).unwrap();
    
    let data_str = std::str::from_utf8(&data).unwrap();
    println!("{:?}", data_str);

and it works!

Thanks a ton, all of you! :smiley:


You need to be careful with using the read method. It only guarantees that at least one byte is read. If you wish to make sure it actually reads the full amount of bytes, use read_exact.
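The difference is easy to miss on localhost, where data often arrives in one piece. The sketch below uses a made-up reader, OneByteAtATime, that hands out a single byte per read call to imitate a socket delivering data in small pieces; the two helper functions are only there to show the contrast.

```rust
use std::io::{self, Read};

/// Made-up test reader that returns at most one byte per `read` call.
pub struct OneByteAtATime<'a>(pub &'a [u8]);

impl Read for OneByteAtATime<'_> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.0.is_empty() || buf.is_empty() {
            return Ok(0);
        }
        buf[0] = self.0[0];
        self.0 = &self.0[1..];
        Ok(1)
    }
}

/// A single `read` into a 4-byte buffer: returns how many bytes were filled.
pub fn short_read_len(src: &[u8]) -> usize {
    let mut buf = [0u8; 4];
    OneByteAtATime(src).read(&mut buf).unwrap()
}

/// `read_exact` keeps calling `read` until the buffer is full, or fails
/// with UnexpectedEof if the source runs out first.
pub fn exact_read(src: &[u8]) -> io::Result<[u8; 4]> {
    let mut buf = [0u8; 4];
    OneByteAtATime(src).read_exact(&mut buf)?;
    Ok(buf)
}
```

Note also that the documentation quoted earlier describes a 2-byte header, while the snippet above reads a single byte, so it would presumably mis-size any payload of 128 bytes or more.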

