Unless the file is literally gigabytes in size, it's faster to simply read the whole content into a string and split that. Beyond that size, it gets somewhat tough.
Implementing this as a method currently requires manually implementing Iterator, which is a pain. You'd probably want to wrap the reader in std::io::BufReader so that it handles the buffering, and peek into its buffer to look for the separator, but it would still be a pain to keep all the state straight.
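For the simple case, here is a minimal sketch of the "read everything, then split" approach. The function name `split_whole_file` is made up for illustration, and it assumes the file is valid UTF-8 and fits comfortably in memory.

```rust
use std::{fs, io, path::Path};

/// Reads the whole file into one String and splits it on `delimiter`.
/// Hypothetical helper: only sensible when the file fits in RAM.
fn split_whole_file(path: impl AsRef<Path>, delimiter: &str) -> io::Result<Vec<String>> {
    // read_to_string returns an error if the content is not valid UTF-8.
    let content = fs::read_to_string(path)?;
    // str::split drops the delimiter and keeps empty pieces, e.g. a
    // leading delimiter produces an empty first element.
    Ok(content.split(delimiter).map(str::to_owned).collect())
}
```

Each piece is copied into its own String here, so peak memory is roughly twice the file size. If that matters, you can keep `content` alive and iterate over the borrowed `&str` slices instead.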
The crate you're looking for seems to be:
But it seems unmaintained, so you might want to just use it as a reference.
Since you just collect data until the delimiter is found, you can make use of BufRead to extend one blob at a time until the delimiter is found, as opposed to a fixed-size buffer that may get filled with half of the delimiter.
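A rough sketch of that idea, not a polished implementation: the `Chunks` type and the `find_delim` helper are names invented here. It appends whatever `BufRead::fill_buf` hands over to a growing blob and only searches the newly added tail (backing up `delim.len() - 1` bytes so a delimiter split across two reads is still found). Note that the blob grows until the delimiter shows up, so memory use is bounded by the largest chunk, not by a fixed size.

```rust
use std::io::{self, BufRead};

/// Hypothetical iterator yielding the bytes between occurrences of `delim`.
struct Chunks<R> {
    reader: R,
    delim: Vec<u8>,
    pending: Vec<u8>, // bytes read so far that have not been returned yet
    done: bool,
}

impl<R: BufRead> Chunks<R> {
    fn new(reader: R, delim: &[u8]) -> Self {
        assert!(!delim.is_empty(), "delimiter must not be empty");
        Chunks { reader, delim: delim.to_vec(), pending: Vec::new(), done: false }
    }
}

/// Naive search: index of the first occurrence of `needle` in `haystack`.
fn find_delim(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    haystack.windows(needle.len()).position(|w| w == needle)
}

impl<R: BufRead> Iterator for Chunks<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.done {
            return None;
        }
        let mut search_from = 0;
        loop {
            if let Some(pos) = find_delim(&self.pending[search_from..], &self.delim) {
                let end = search_from + pos;
                // Keep what follows the delimiter for the next call.
                let rest = self.pending.split_off(end + self.delim.len());
                self.pending.truncate(end); // drop the delimiter itself
                return Some(Ok(std::mem::replace(&mut self.pending, rest)));
            }
            // Next time, re-check only the tail that could hold a partial delimiter.
            search_from = self.pending.len().saturating_sub(self.delim.len() - 1);
            let buf = match self.reader.fill_buf() {
                Ok(buf) => buf,
                Err(e) => {
                    self.done = true;
                    return Some(Err(e));
                }
            };
            if buf.is_empty() {
                // End of input: whatever is left is the final chunk.
                self.done = true;
                return Some(Ok(std::mem::take(&mut self.pending)));
            }
            let n = buf.len();
            self.pending.extend_from_slice(buf);
            self.reader.consume(n);
        }
    }
}
```

Like `str::split`, this yields an empty final chunk when the input ends with the delimiter. Chunks come back as raw bytes, so any UTF-8 conversion is left to the caller.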
This will panic if a read happens to end in the middle of a UTF-8 codepoint, even if the input as a whole is valid UTF-8. Dealing with split code points is a similar challenge to dealing with split delimiters (albeit limited to a few bytes in length).
(In contrast the playground I posted defers UTF-8 checks until an entire delimited chunk has been read.)
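To make the difference concrete, here is a small illustration (the function names are invented for this post, and the "reads" are just byte slices standing in for what successive reads might return). Validating each read on its own fails when a code point is split across two reads, whereas collecting the raw bytes first and validating once succeeds.

```rust
use std::str::{self, Utf8Error};

/// Validates every read separately: fails if a read ends inside a code point.
fn decode_eagerly(reads: &[&[u8]]) -> Result<String, Utf8Error> {
    let mut out = String::new();
    for read in reads {
        // An `unwrap()` here instead of `?` is what would panic.
        out.push_str(str::from_utf8(read)?);
    }
    Ok(out)
}

/// Collects the raw bytes first and validates once the chunk is complete.
fn decode_deferred(reads: &[&[u8]]) -> Result<String, Utf8Error> {
    let bytes: Vec<u8> = reads.concat();
    Ok(str::from_utf8(&bytes)?.to_owned())
}
```

For example, "café" ends in the two bytes 0xC3 0xA9. If one read ends after 0xC3, `decode_eagerly` returns an error while `decode_deferred` returns the full string.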
Ha, that's a really good algorithm, and it works perfectly for my example. But my example is not representative of my actual need. Many thanks to you, and sorry for the misunderstanding; it's my fault.
In fact, my input file looks more like this:
// FILE is assumed to be a path constant defined elsewhere.
fn seed() {
    std::fs::write(
        FILE,
        "-- FILE DELIMITER --\r\n
name: MyStream1.bigtext\r\n
Some massive text to treat.....
[...]
-- FILE DELIMITER --\r\n
name: MySTteam2.bigtext\r\n
Some massive text here too .....
[...]
",
    )
    .unwrap();
}
In this latter case:
(1)- The size of the buffer has to be bounded (it's a multithreaded app, so memory use has to be limited).
(2)- The delimiter should not appear in the returned buffer, if possible (so that it isn't matched twice, for performance reasons).
(3)- Because of (2), we need to know whether or not the delimiter was detected in the last buffer read.
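One way these three constraints could fit together, as a sketch only: `BoundedSplitter`, `next_piece` and `find_subslice` are hypothetical names, not taken from any crate. The internal buffer never exceeds `cap` bytes, the delimiter is stripped from what is returned, and each piece comes with a flag saying whether a delimiter immediately followed it. When the buffer is full and no delimiter was found, the last `delim.len() - 1` bytes are held back, since they might be the start of a delimiter that continues in the next read.

```rust
use std::io::{self, Read};

/// Hypothetical reader that hands out pieces of at most `cap` bytes.
struct BoundedSplitter<R> {
    reader: R,
    delim: Vec<u8>,
    buf: Vec<u8>, // never holds more than `cap` bytes
    cap: usize,
    eof: bool,
}

/// Naive search: index of the first occurrence of `needle` in `haystack`.
fn find_subslice(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    haystack.windows(needle.len()).position(|w| w == needle)
}

impl<R: Read> BoundedSplitter<R> {
    fn new(reader: R, delim: &[u8], cap: usize) -> Self {
        assert!(!delim.is_empty() && cap >= delim.len());
        BoundedSplitter { reader, delim: delim.to_vec(), buf: Vec::with_capacity(cap), cap, eof: false }
    }

    /// Returns `Some((data, delimiter_found))`, or `None` at end of input.
    /// `data` never contains the delimiter.
    fn next_piece(&mut self) -> io::Result<Option<(Vec<u8>, bool)>> {
        loop {
            if let Some(pos) = find_subslice(&self.buf, &self.delim) {
                let piece = self.buf[..pos].to_vec();
                self.buf.drain(..pos + self.delim.len()); // drop data + delimiter
                return Ok(Some((piece, true)));
            }
            if self.eof {
                if self.buf.is_empty() {
                    return Ok(None);
                }
                return Ok(Some((std::mem::take(&mut self.buf), false)));
            }
            if self.buf.len() >= self.cap {
                // Buffer is full: emit everything except a possible partial delimiter.
                let safe = self.buf.len() - (self.delim.len() - 1);
                let piece: Vec<u8> = self.buf.drain(..safe).collect();
                return Ok(Some((piece, false)));
            }
            // Top the buffer up to `cap` bytes.
            let old = self.buf.len();
            self.buf.resize(self.cap, 0);
            match self.reader.read(&mut self.buf[old..]) {
                Ok(n) => {
                    self.buf.truncate(old + n);
                    if n == 0 {
                        self.eof = true;
                    }
                }
                Err(e) => {
                    self.buf.truncate(old);
                    return Err(e);
                }
            }
        }
    }
}
```

A caller would append pieces to the current section until the flag is `true`, then start a new section. This sketch rescans the whole buffer on every call, so it does not yet address the "don't match twice" performance concern; a real implementation would remember how far it has already searched.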
From what I've read, mailparse doesn't return a stream over the body; it loads everything directly into memory. But your lead made me have a look at multipart, which returns the body part in a buffer. It contains the mechanism!