Sub-slice detection optimization

I have a need to check if a sub-slice exists in a bigger slice.

This is what I currently have:

/// Determine whether a slice contains a sub-slice.
#[inline]
fn contains(source: &[Bytes], find: &[Bytes]) -> bool {
    let length = find.len();
    source.par_windows(length).any(|window| window == find)
}

Originally this was a check on substrings (&str) and I could just: source.contains(find). I am now using slices of Bytes because I've compressed these strings and this is how they are now represented.

The source.contains(find) way of doing this using &str was way more performant. And this has become a major bottleneck in my program.

Is there any guidance on how I might be able to speed this up ? I was thinking possibly checking from both ends, but I'm not certain how I would do that if it would be near as performant as when I was doing source.contains(find).

Well, a very efficient algorithm for this is the KMP Algorithm.

1 Like

Great,

Thank you very much! I will try implementing this. And see how that goes.

I guess your Bytes are u8? Check out the memchr crate.

The aho_corasick crate also exists. It handles multiple search strings, but the single string case is the same concept as KMP.

2 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.