Sub-slice detection optimization

I have a need to check if a sub-slice exists in a bigger slice.

This is what I currently have:

/// Determine whether a slice contains a sub-slice.
fn contains(source: &[Bytes], find: &[Bytes]) -> bool {
    let length = find.len();
    source.par_windows(length).any(|window| window == find)

Originally this was a check on substrings (&str) and I could just: source.contains(find). I am now using slices of Bytes because I've compressed these strings and this is how they are now represented.

The source.contains(find) way of doing this using &str was way more performant. And this has become a major bottleneck in my program.

Is there any guidance on how I might be able to speed this up ? I was thinking possibly checking from both ends, but I'm not certain how I would do that if it would be near as performant as when I was doing source.contains(find).

Well, a very efficient algorithm for this is the KMP Algorithm.

1 Like


Thank you very much! I will try implementing this. And see how that goes.

I guess your Bytes are u8? Check out the memchr crate.

The aho_corasick crate also exists. It handles multiple search strings, but the single string case is the same concept as KMP.


This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.