Ergonomic Iteration of a String + Subsections of that String

I'm looking for an ergonomic way to generate an Iterator<Item = &str> which will break early when a certain condition is met, so I can iterate over values without duplicating my logic. Two examples for an actual use-case exist that I can think of:

  1. File-system paths, e.g. /var/lib/something.txt should yield:
    • /var/lib/something.txt
    • /var/lib
    • /var
  2. Rust module paths, e.g. a::b::c should yield:
    • a::b::c
    • a::b
    • a

I essentially have a HashMap<String, Rule> that I need to check in a depth-first fashion for a string key. Here is the code that I have that works, but is not ergonomic and duplicates logic:

pub struct Rule(usize); // for ease

fn find_most_specific_rule(actual: &str, rules: &HashMap<String, Rule>) -> Option<&Rule> {
    // FIXME how can we get rid of this extra block of code ergonomically?
    if let Some(rule) = rules.get(actual) {
        return Some(rule);
    }

    let mut actual = actual;

    while let Some((base, _)) = actual.rsplit_once("::") {
        if let Some(rule) = rules.get(base) {
            return Some(rule);
        }

        // FIXME this is unideal as well
        actual = base;
    }

    None
}

Here are some tests that should hopefully further elucidate what the intended outcome should be:

#[test]
fn test_rule_finding() {
    // build rule set
    let m: HashMap<String, Rule> =  {
        let mut m = HashMap::new();
    
        // affect all children of a::b::c
        m.insert("a::b::c".into(), Rule(3));
        // affect all children of a::b (unless a more specific rule is found)
        m.insert("a::b".into(), Rule(2));
        // affect all children of a (unless a more specific rule is found)
        m.insert("a".into(), Rule(1));

        m
    }; 

    // all children of a::b::c should receive rule 3
    assert_eq!(Some(Rule(3)), find_most_specific_rule("a::b::c::d", &m));
    assert_eq!(Some(Rule(3)), find_most_specific_rule("a::b::c::different", &m));

    // all children of a::b should receive rule 2 unless a more specific rule is matched
    assert_eq!(Some(Rule(2)), find_most_specific_rule("a::b::not_c", &m));
    assert_eq!(Some(Rule(2)), find_most_specific_rule("a::b::another_not_c", &m));

    // all children of a should receive rule 1 unless a more specific rule is matched
    assert_eq!(Some(Rule(1)), find_most_specific_rule("a::not_b", &m));
    assert_eq!(Some(Rule(1)), find_most_specific_rule("a::another_not_b", &m));
}

In my implementation code, there are a few things (as noted) that I'd really like to get rid of:

  1. I have a completely separate if statement at the beginning of the function which largely duplicates the logic of the while loop.
  2. I'm using a mutable string reference that I have to update at the end of each loop iteration, and this also seems error-prone.

What would be most ideal would be to create an iterator which yields the string itself as the first value, and then progressively uses str::rsplit_once to cut the string down and walk up the module path.

What iterator methods am I looking for here? What would be a more ergonomic way of approaching this?

I can imagine something like:

fn reverse_prefixes(str: &str) -> impl Iterator<Item = &str> {
    core::iter::successors(
        Some(str),
        |s| s.rsplit_once("::").map(|(head, _tail)| head)
    )
}

which in turn simplifies your function to

fn find_most_specific_rule<'a>(actual: &str, rules: &'a HashMap<String, Rule>) -> Option<&'a Rule> {
    reverse_prefixes(actual).find_map(|suffix| rules.get(suffix))
}

Playground.

1 Like

Incidentally one of your tests doesn't pass with your OP code.

The bug:

     while let Some((base, _)) = actual.rsplit_once("::") {
-        if let Some(rule) = rules.get(actual) {
+        if let Some(rule) = rules.get(base) {
             return Some(rule);
         }
1 Like

On nightly there is the remainder method RSplit in std::str - Rust. On stable, I would write your code as

loop {
    if let Some(rule) = rules.get(current) {
        return Some(rule);
    }
    (current, _) = current.rsplit_once(delimiter)?;
}

Thanks for pointing that out, I fixed the code above. I just wrote this code into the browser, didn't copy paste it, so didn't have the compiler check it.