I am trying to write a simple TCP program to improve my knowledge of Rust, but I have run up against something I can't seem to find an answer to, although it should be simple.
After I make a connection to the TcpStream handler, I have this line of code:
stream.read(&mut buffer).unwrap();
So I have now read into the buffer array [u8;1024] but I want to convert this into a string, so I can do a comparison.
How do I convert it into a string? I tried .to_string(), but it isn't allowed because of traits.
String::from_utf8_lossy converts the bytes to a string, replacing any invalid sequences with U+FFFD (the replacement character) and copying only if necessary.[1]
std::str::from_utf8_unchecked has no checks, but is unsafe and will result in undefined behaviour if you pass anything that's not valid UTF-8.
[1] Note that the function returns Cow, which allows it to return either the original input or a new copy. This means that, on the happy path where the data is valid, you don't allocate a new string.
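For the original question, a minimal sketch of the lossy route might look like the following. The helper names decode_lossy and read_message are made up for illustration, and the reader is generic so that a TcpStream or a plain byte slice can be passed in. One detail that is easy to miss: read returns how many bytes were actually received, and only that part of the buffer should be decoded, otherwise the string ends up padded with NUL characters and the comparison fails.

```rust
use std::borrow::Cow;
use std::io::Read;

// Hypothetical helper: decode bytes as UTF-8, replacing any invalid
// sequences with U+FFFD. Borrows the input when it is already valid.
fn decode_lossy(bytes: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(bytes)
}

// Hypothetical helper: perform a single read and return the received
// bytes as an owned String.
fn read_message<R: Read>(stream: &mut R) -> std::io::Result<String> {
    let mut buffer = [0u8; 1024];
    // `n` is the number of bytes actually read; the rest of the
    // buffer is still zero-filled and must not be decoded.
    let n = stream.read(&mut buffer)?;
    Ok(decode_lossy(&buffer[..n]).into_owned())
}
```

With a connected stream this would be used as something like `if read_message(&mut stream)?.trim_end() == "PING" { ... }`, where "PING" is just a placeholder command.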
There is the possibility that a multi-byte UTF-8 character could straddle two consecutive byte buffers. Your decoder either has to find such a straddler and relocate it to the beginning of the next buffer, or save it and process it in combination with the beginning of the next buffer. Either way, it is more complicated than just calling from_utf8_lossy.
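The "save them up" variant can be sketched roughly as below. Utf8Chunker is a hypothetical name, not something from the standard library; the sketch relies on std::str::from_utf8 and on Utf8Error::error_len returning None when the input simply ended in the middle of a character. Bytes that are genuinely invalid are replaced with U+FFFD here, which is a choice made for this example rather than the only reasonable policy.

```rust
use std::str;

// Hypothetical helper: decodes a byte stream chunk by chunk and carries
// an incomplete trailing character over to the next chunk.
struct Utf8Chunker {
    pending: Vec<u8>,
}

impl Utf8Chunker {
    fn new() -> Self {
        Self { pending: Vec::new() }
    }

    // Appends `chunk` to any saved bytes and returns the text that can
    // be decoded so far.
    fn push(&mut self, chunk: &[u8]) -> String {
        self.pending.extend_from_slice(chunk);
        // Take ownership of the bytes, leaving `pending` empty.
        let bytes = std::mem::take(&mut self.pending);
        let mut rest: &[u8] = &bytes;
        let mut out = String::new();
        loop {
            match str::from_utf8(rest) {
                Ok(valid) => {
                    out.push_str(valid);
                    break;
                }
                Err(err) => {
                    let (valid, after) = rest.split_at(err.valid_up_to());
                    out.push_str(str::from_utf8(valid).expect("prefix was validated"));
                    match err.error_len() {
                        // The input ended mid-character: save the tail
                        // and wait for the next chunk.
                        None => {
                            self.pending.extend_from_slice(after);
                            break;
                        }
                        // Genuinely invalid bytes: substitute U+FFFD
                        // and keep decoding after them.
                        Some(n) => {
                            out.push('\u{FFFD}');
                            rest = &after[n..];
                        }
                    }
                }
            }
        }
        out
    }
}
```

Each buffer read from the socket would be passed to push as `&buffer[..n]`. If the connection closes while bytes are still pending, the caller has to decide whether to report that as an error or discard them; the sketch leaves that out.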
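I think BufReader will hide incomplete UTF-8 fragments from the caller if you read through its line-based methods such as read_line, which collect a whole line before validating it as UTF-8. That is another valid approach.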
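As a rough sketch of the BufReader approach, assuming the protocol is newline-terminated text (the original post doesn't say what the protocol is): read_line appends a whole line to a String and returns an error if the line is not valid UTF-8, so a character split across two underlying reads is never seen half-decoded. The function name read_commands and the "quit" command are invented for the example.

```rust
use std::io::{BufRead, BufReader, Read};

// Hypothetical helper: collects newline-terminated commands from a
// reader until end of input or a "quit" line.
fn read_commands<R: Read>(stream: R) -> std::io::Result<Vec<String>> {
    let mut reader = BufReader::new(stream);
    let mut commands = Vec::new();
    let mut line = String::new();
    loop {
        line.clear();
        // read_line returns Ok(0) at end of input and an error if the
        // line is not valid UTF-8.
        if reader.read_line(&mut line)? == 0 {
            break;
        }
        // Strip the trailing "\n" or "\r\n" before comparing.
        let command = line.trim_end();
        if command == "quit" {
            break;
        }
        commands.push(command.to_string());
    }
    Ok(commands)
}
```

A TcpStream can be passed in directly, or by reference as `&stream` if the stream is still needed for writing afterwards.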
There is a crate called 'nom' that can be used to write a custom parser.