Lightweight replacement for PHP explode function

I'm working on a socket packet parser, where some of the data is separated by \r\n bytes.

I have doubts about using windows for this task.
Maybe it's better to iterate byte by byte and collect the data with a contains or starts_with function, for performance reasons?

In short, my input request looks like this:

header\r\nsome data 1\r\nsome data 2

In PHP I simply used the explode function; now I'm looking for a fast replacement in Rust.

Thanks for any ideas!

Is split helpful?

Just FYI, if you're coming from PHP to other languages: the string functions in PHP are named differently from those in most other languages.

split, explode
join, implode

Etc.
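
As a rough illustration of that naming, and assuming the input is already a valid &str, PHP's explode corresponds to str::split and implode to join on a slice. A minimal sketch (the explode and implode helpers below are hypothetical names written for this example, not standard library functions):

```rust
/// Hypothetical helper mirroring PHP's `explode`: splits `input` on every
/// occurrence of `separator`. The pieces borrow from `input`, so no data is copied.
fn explode<'a>(separator: &str, input: &'a str) -> Vec<&'a str> {
    input.split(separator).collect()
}

/// Hypothetical helper mirroring PHP's `implode`: joins the parts with `glue`.
fn implode(glue: &str, parts: &[&str]) -> String {
    parts.join(glue)
}

fn main() {
    let request = "header\r\nsome data 1\r\nsome data 2";

    let parts = explode("\r\n", request);
    assert_eq!(parts, ["header", "some data 1", "some data 2"]);

    // Joining with the same separator restores the original request.
    assert_eq!(implode("\r\n", &parts), request);
}
```

In real code you would probably call split and join directly; the wrappers only make the correspondence with the PHP names visible.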

If you want to split on any newline separator, including the \r\n that Windows uses, I saw someone with the same problem at:
https://www.reddit.com/r/rust/comments/11senta/how_to_split_a_byte_array_by_new_line_characters/

Basically, it splits on '\n' and, where a \r comes right before it, removes that \r by stripping it from the end of the line.

let lines = data
    .split(|&b| b == b'\n')
    .map(|line| line
        .strip_suffix(b"\r")
        .unwrap_or(line)
    );

In that case the byte literal b'\n' is used because the data is a &[u8]; if you have a string, just use a normal char '\n'.
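
For the string case, the same idea might look like the sketch below. It assumes the data is already a &str, and split_lines is a hypothetical name chosen for this example:

```rust
/// Hypothetical helper: splits on '\n' and drops a '\r' that directly
/// precedes it, so both "\n" and "\r\n" act as separators.
fn split_lines(text: &str) -> Vec<&str> {
    text.split('\n')
        .map(|line| line.strip_suffix('\r').unwrap_or(line))
        .collect()
}

fn main() {
    let text = "header\r\nsome data 1\nsome data 2";
    assert_eq!(split_lines(text), ["header", "some data 1", "some data 2"]);
}
```

Note that, because this uses split, input ending in a newline produces a final empty piece.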

Another thing that can simplify your job:

let string_with_sep = "some nice\r\ntext";
let string_without_sep = string_with_sep.replace('\r', "");
for line in string_without_sep.lines() {}

This way you get rid of all the \r chars.

str::lines() is what you're looking for. It handles both \n and \r\n.

fn main() {
    let s = "header\r\nsome data 1\r\nsome data 2";
    
    let lines: Vec<_> = s.lines().collect();
    
    assert_eq!(lines, &["header", "some data 1", "some data 2"]);
}

Wow, so simple, thank you very much, and thanks to the others for the alternative examples!

I haven't tried it yet; I hope it will work with a byte array, to avoid extra conversions.

lines is defined on str, and has no equivalent on [u8]. However, the individual components used by str::lines do.

fn byte_lines(bytes: &[u8]) -> impl Iterator<Item = &[u8]> {
    bytes.split_inclusive(|&b| b == b'\n')
        .map(|line| {
            let Some(line) = line.strip_suffix(b"\n") else { return line };
            let Some(line) = line.strip_suffix(b"\r") else { return line };
            line
        })
}

Edit: I was too focused on following str::lines exactly, so I missed that equivalent code has already been given in this thread.

This code maintains the same behavior as str::lines, namely also splitting on a lone \n.
If you need lone newline bytes (not paired with a \r)[1] to pass through unsplit, you could have a [_]::split closure keep the last byte seen as state and split only when last == \r and cur == \n. (Then strip the trailing \r from each segment, since it was seen before the split was confirmed.)


  1. which would be required in HTTP, for instance ↩︎

Which could be done like this:

fn byte_split_crlf(bytes: &[u8]) -> impl Iterator<Item = &[u8]> {
    let mut last = b'\0';
    bytes
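        .split(move |&current| {
            // Split only on a \n that directly follows a \r.
            let is_crlf = last == b'\r' && current == b'\n';
            // Always record the byte, so that after a \r\n pair `last` is \n
            // and a following lone \n is not treated as another split point.
            last = current;
            is_crlf
        })
        // Each segment before a split still ends with the \r of the pair.
        // The final segment may have no \r suffix, because the data is not
        // necessarily terminated by \r\n, hence the unwrap_or fallback.
        .map(|segment| segment.strip_suffix(b"\r").unwrap_or(segment))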
}
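
Since the original question mentioned windows, here is a sketch of the same CRLF-only splitting built on windows(2) instead of a stateful closure. It assumes the whole packet is already in memory as a byte slice, and split_crlf is a hypothetical name chosen for this example:

```rust
/// Hypothetical helper: splits `bytes` on every "\r\n" pair and leaves
/// lone '\r' or '\n' bytes inside the segments.
fn split_crlf(bytes: &[u8]) -> Vec<&[u8]> {
    let mut segments = Vec::new();
    let mut rest = bytes;
    // `windows(2)` yields every overlapping pair of bytes;
    // `position` returns the index of the first pair equal to "\r\n".
    while let Some(pos) = rest.windows(2).position(|pair| pair == b"\r\n") {
        segments.push(&rest[..pos]);
        // Skip the two separator bytes and continue with the remainder.
        rest = &rest[pos + 2..];
    }
    // Whatever follows the last separator (possibly empty) is the final segment.
    segments.push(rest);
    segments
}

fn main() {
    let data = b"header\r\nsome\ndata 1\r\nsome data 2";
    let segments = split_crlf(data);
    assert_eq!(
        segments,
        vec![&b"header"[..], &b"some\ndata 1"[..], &b"some data 2"[..]]
    );
}
```

Whether this is faster or slower than the closure version would need to be measured on real packets.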
