After profiling a benchmark with flamegraph, I can see that most of the time is spent in the Split iterator's next method (45% of execution time), and within that method most of the time goes to TwoWaySearcher (35% of execution time).
Using the bstr crate I get better results, since I can work with bytes directly without converting to str, but around 45% of execution time is still spent in the split_str function.
I can also see that a significant portion of the time in both split functions is spent in the swapgs_restore_regs_and_return_to_usermode function.
Do you know what could make string splitting so slow? Should I look at a different crate for faster string splitting?
Besides something silly like accidentally making your algorithm O(n²) and splitting the input more than once, could it just be that splitting takes the most time because your program is quite simple and that's where all the processing happens?
You might also find that writing your own splitting routine with something like the memchr crate will help because you can make assumptions about your input that a general-purpose crate like the standard library can't.
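As a rough sketch of what such a hand-written routine could look like, assuming the input is a byte slice and the delimiter is a single byte: the helper name `split_fields` is made up for illustration, and the search below uses only the standard library so the snippet stands alone. With the memchr crate added as a dependency, the `position` call could be swapped for `memchr::memchr(delim, rest)`, which returns the same `Option<usize>`.

```rust
/// Splits `line` on a single delimiter byte and returns the fields as
/// sub-slices of the input (no allocation per field, no UTF-8 checks).
/// `split_fields` is a hypothetical helper, not part of std or memchr.
fn split_fields(line: &[u8], delim: u8) -> Vec<&[u8]> {
    let mut fields = Vec::new();
    let mut rest = line;
    // Std-only byte search; `memchr::memchr(delim, rest)` is a drop-in
    // replacement here if the memchr crate is available.
    while let Some(pos) = rest.iter().position(|&b| b == delim) {
        fields.push(&rest[..pos]);
        // Skip past the delimiter and continue with the remainder.
        rest = &rest[pos + 1..];
    }
    // Whatever follows the last delimiter is the final field
    // (an empty slice if the input ends with the delimiter).
    fields.push(rest);
    fields
}
```

Calling `split_fields(b"a,b,,c", b',')` yields the four fields `a`, `b`, an empty field, and `c`. Whether this beats the general-purpose splitters depends on your data, so it is worth measuring both in a release build.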
TwoWaySearcher should only be needed for patterns longer than one byte, not for finding a comma. If you're searching for a one-byte pattern in a release build, I would expect the iterator to be inlined to the point that it doesn't even show up in the flamegraph.
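One thing that may be worth checking, as a guess rather than a diagnosis, is how the delimiter is passed to split. The two functions below (hypothetical names, written only for this comparison) produce identical output, but the first uses a `char` pattern and the second a `&str` pattern. My understanding is that a `&str` pattern can go through the standard library's general substring searcher even when it is one byte long, while a `char` pattern takes a simpler single-character search path.

```rust
/// Splits on a `char` pattern (single quotes).
/// Hypothetical helper for comparison purposes.
fn split_by_char(line: &str) -> Vec<&str> {
    line.split(',').collect()
}

/// Splits on a `&str` pattern (double quotes).
/// Same result as `split_by_char`, but a different pattern type.
fn split_by_str(line: &str) -> Vec<&str> {
    line.split(",").collect()
}
```

If TwoWaySearcher shows up in the profile and the code looks like the second function, switching to the first and profiling again in a release build would confirm or rule this out.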