I'm working on a parser for socket packets, where the fields are separated by \r\n bytes.
I have doubts about using windows for this task;
maybe it would be faster to iterate byte by byte and collect the data with a contains or starts_with function?
In short, my input looks like this:
header\r\nsome data 1\r\nsome data 2
In PHP I simply used the explode function; now I'm looking for a fast replacement in Rust.
str::lines() is what you're looking for. It handles both \n and \r\n.
fn main() {
    let s = "header\r\nsome data 1\r\nsome data 2";
    let lines: Vec<_> = s.lines().collect();
    assert_eq!(lines, &["header", "some data 1", "some data 2"]);
}
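Since a socket read usually hands you bytes rather than a &str, here is a minimal sketch of validating the buffer as UTF-8 first and only then calling lines. The helper name text_lines is hypothetical, and the sketch assumes the whole payload is valid UTF-8 and already sits in one buffer.

```rust
use std::str::Utf8Error;

/// Hypothetical helper: check that `buf` is valid UTF-8, then split it into lines.
/// Returns an error instead of panicking if the payload is not valid UTF-8.
fn text_lines(buf: &[u8]) -> Result<Vec<&str>, Utf8Error> {
    let text = std::str::from_utf8(buf)?;
    Ok(text.lines().collect())
}

fn main() {
    let packet = b"header\r\nsome data 1\r\nsome data 2";
    let lines = text_lines(packet).expect("payload should be valid UTF-8");
    assert_eq!(lines, ["header", "some data 1", "some data 2"]);
}
```

If the payload can contain arbitrary binary data, this validation step will fail, and working on the bytes directly is the safer route.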
lines is defined on str, and has no equivalent on [u8]. However, the individual components used by str::lines do.
fn byte_lines(bytes: &[u8]) -> impl Iterator<Item = &[u8]> {
    bytes.split_inclusive(|&b| b == b'\n')
        .map(|line| {
            let Some(line) = line.strip_suffix(b"\n") else { return line };
            let Some(line) = line.strip_suffix(b"\r") else { return line };
            line
        })
}
Edit: I was too focused on following str::lines exactly, so I missed that equivalent code had already been given in this thread.
This code maintains the same behavior as str::lines, namely also splitting on \n.
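As a quick sanity check of that claim, the sketch below compares byte_lines against str::lines on an input that mixes \r\n and lone \n terminators. The function is repeated so the block compiles on its own; the sample input is made up for illustration.

```rust
// Same function as above, repeated so this sketch compiles on its own.
fn byte_lines(bytes: &[u8]) -> impl Iterator<Item = &[u8]> {
    bytes.split_inclusive(|&b| b == b'\n').map(|line| {
        let Some(line) = line.strip_suffix(b"\n") else { return line };
        let Some(line) = line.strip_suffix(b"\r") else { return line };
        line
    })
}

fn main() {
    // Mixed terminators: \r\n, a lone \n, and a trailing \r\n.
    let input = "header\r\nsome data 1\nsome data 2\r\n";

    let from_bytes: Vec<&[u8]> = byte_lines(input.as_bytes()).collect();
    let from_str: Vec<&[u8]> = input.lines().map(str::as_bytes).collect();

    // Both approaches split on every \n and drop a preceding \r.
    assert_eq!(from_bytes, from_str);
    assert_eq!(from_bytes.len(), 3);
}
```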
If you need it to pass through lone newline bytes (not paired with a \r)[1], then you could have a [_]::split closure keep track of the last byte seen and split only when last == \r and current == \n. (Then strip the trailing \r from each segment once the split has been confirmed.)
fn byte_split_crlf(bytes: &[u8]) -> impl Iterator<Item = &[u8]> {
    let mut last = b'\0';
    bytes
        .split(move |&current| {
            // Split on a \n only when the previous byte was \r.
            let is_crlf = last == b'\r' && current == b'\n';
            // Always record the byte just seen, so "\r\n\n" splits only once.
            last = current;
            is_crlf
        })
        // There is no \r suffix with the final segment
        // because the data is not necessarily terminated by \r\n
        .map(|segment| segment.strip_suffix(b"\r").unwrap_or(segment))
}
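To illustrate the difference from byte_lines, here is a small usage sketch in which a lone \n stays inside its segment. The function is restated (updating last on every byte, so that \r\n\n splits only once) so the block compiles on its own; the sample packet is made up.

```rust
fn byte_split_crlf(bytes: &[u8]) -> impl Iterator<Item = &[u8]> {
    let mut last = b'\0';
    bytes
        .split(move |&current| {
            // Split on a \n only when the previous byte was \r.
            let is_crlf = last == b'\r' && current == b'\n';
            last = current;
            is_crlf
        })
        // Drop the \r that preceded each confirmed split point.
        .map(|segment| segment.strip_suffix(b"\r").unwrap_or(segment))
}

fn main() {
    let packet = b"header\r\nline one\nstill line one\r\nlast";
    let parts: Vec<&[u8]> = byte_split_crlf(packet).collect();

    // The lone \n in the middle segment is passed through untouched.
    assert_eq!(
        parts,
        vec![&b"header"[..], &b"line one\nstill line one"[..], &b"last"[..]]
    );
}
```

Two edge cases to keep in mind with this approach: input that ends in \r\n yields a final empty segment (as [_]::split does for any trailing separator), and a final segment that ends in a lone \r has that byte stripped as well.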