How to extend std's enum?

I think part of the purpose of using enum is to make it easier to extend. I'm curious about how to extend the standard library's enum type so that I can choose some more efficient algorithm to execute. For example for StrSearcherImpl type in std src code, it only uses Two-Way algorithm. If I want to add an item called KMP so that a call to "abc".find("ab") can directly call KMP instead of Two-Way, how can I do this exactly? I can't figure out a way to extend std like this directly by myself.
This enum definition is as follows:

enum StrSearcherImpl {
    Empty(EmptyNeedle),
    TwoWay(TwoWaySearcher),
}

If it's private or annotated with #[non_exhaustive], it's reasonable to extend an enum. (The same is true of structs, but extensions are more fields instead of more variants.)

You can't modify foreign types. You can't even use private foreign types like StrSearcherImpl. You can't change foreign implementations either.

What you described could maybe be a PR for std, but not something you can do as a dependent crate.

I think this means programmers are not that free using Rust. I previously thought that I could do anything with Rust with its modularity and exstensibility.

Very few production languages allow you to arbitrarily change the code of dependencies, and for good reason. Rust is not unique in this regard.

5 Likes

on the other hand, how often do programmers need to change implementation detail of one of core behaviors of a language?
that's why i love Rust - for stealing some of my freedoms of breaking things i have had in C for 15+ years of fairly frequent usage :slight_smile:

edit: i have misinterpreted your target goal, so wording i used above is misleading, sorry for that, though language -> core lib still imho applies in a way...

1 Like

besides what was already said, there's another technicality here.

the KMP algorithm needs to construct a LPS table. if the implementation choose to allocate memory internally, then it cannot be in core; if the implementation requires the caller to pass a scratch buffer, then is not compatible with the interface of str::find().

anyway, rust's core library is minimal but practically useful, it does NOT aim to be THE "battery included", all-purpose library.

if you have an idea to improve the standard library, there's a procedure to contribute. you are not limited to the the core library if some function (str::find() in this case) does not meet your special requirements, you can always use a crate (e.g. kmpm), or roll your own algorithm.

EDIT: kmpm's implementation is not of high quality. please check other crates.

7 Likes

No, it’s the exact opposite. By design enums are not extensible by users because that’s what allows the very important property of exhaustive matching – being certain that you have covered all cases without needing a fallback case. Generics and runtime polymorphism (dyn types) are meant for extensible choice.

4 Likes

This is very good supplementation here. Previously I could only think of the difficulties of some direct details related to this kind of extension mentioned by my top post.

although kmp is not suitable for str::find() in core, if you want to implement KMP as a method of the str type, you can just use the extension trait pattern, it might look like this:

// this is just an example, I decided not to name the method `find()`,
// to avoid conflicting with the inherent method.
pub trait KMP {
    /// find first occurence of `pattern` in `self` using KMP algorithm,
    /// this shall not do allocation internally, use the provided `lps` array instead
    /// `lps` must be large enough for KMP, otherwise will panic
    fn find_kmp_with_lps(&self, pattern: &str, lps: &mut [usize]) -> Option<usize>;
    /// same as `self.find_kmp_with_lps()`, except heap allocating `lps` using `Vec`
    fn find_kmp(&self, pattern: &str) -> Option<usize> {
        self.find_kmp_with_lps(pattern, &mut vec![0; pattern.len()])
    }
}

impl KMP for str {
    fn find_kmp_with_lps(&self, pattern: &str, lps: &mut [usize]) -> Option<usize> {
        // implementation goes here
        todo!()
    }
}

fn main() {
    // just normal method call syntax on `&str`, similar to `find()`
    let idx = "haystack".find("needle");
    let idx = "haystack".find_kmp("needle");
}
4 Likes

We do have a 3rd impl for contains with small needles which uses simd on x86_64 and doesn't need to allocate. That one could be extended to extract the position and be used for find, at least if the added complexity doesn't make the performance worse than two-way.

3 Likes

I advise against this particular crate. The implementation is rather low quality and does not have the same complexity as the algorithm because it uses chars().nth(...) in the inner loop.

2 Likes