Ergonomic Iteration of a String + Subsections of that String

I'm looking for an ergonomic way to generate an Iterator<Item = &str> which will break early when a certain condition is met, so I can iterate over values without duplicating my logic. Two examples for an actual use-case exist that I can think of:

  1. File-system paths, e.g. /var/lib/something.txt should yield:
    • /var/lib/something.txt
    • /var/lib
    • /var
  2. Rust module paths, e.g. a::b::c should yield:
    • a::b::c
    • a::b
    • a

I essentially have a HashMap<String, Rule> that I need to check in a depth-first fashion for a string key. Here is the code that I have that works, but is not ergonomic and duplicates logic:

pub struct Rule(usize); // for ease

fn find_most_specific_rule(actual: &str, rules: &HashMap<String, Rule>) -> Option<&Rule> {
    // FIXME how can we get rid of this extra block of code ergonomically?
    if let Some(rule) = rules.get(actual) {
        return Some(rule);
    }

    let mut actual = actual;

    while let Some((base, _)) = actual.rsplit_once("::") {
        if let Some(rule) = rules.get(base) {
            return Some(rule);
        }

        // FIXME this is unideal as well
        actual = base;
    }

    None
}

Here are some tests that should hopefully further elucidate what the intended outcome should be:

#[test]
fn test_rule_finding() {
    // build rule set
    let m: HashMap<String, Rule> =  {
        let mut m = HashMap::new();
    
        // affect all children of a::b::c
        m.insert("a::b::c".into(), Rule(3));
        // affect all children of a::b (unless a more specific rule is found)
        m.insert("a::b".into(), Rule(2));
        // affect all children of a (unless a more specific rule is found)
        m.insert("a".into(), Rule(1));

        m
    }; 

    // all children of a::b::c should receive rule 3
    assert_eq!(Some(Rule(3)), find_most_specific_rule("a::b::c::d", &m));
    assert_eq!(Some(Rule(3)), find_most_specific_rule("a::b::c::different", &m));

    // all children of a::b should receive rule 2 unless a more specific rule is matched
    assert_eq!(Some(Rule(2)), find_most_specific_rule("a::b::not_c", &m));
    assert_eq!(Some(Rule(2)), find_most_specific_rule("a::b::another_not_c", &m));

    // all children of a should receive rule 1 unless a more specific rule is matched
    assert_eq!(Some(Rule(1)), find_most_specific_rule("a::not_b", &m));
    assert_eq!(Some(Rule(1)), find_most_specific_rule("a::another_not_b", &m));
}

In my implementation code, there are a few things (as noted) that I'd really like to get rid of:

  1. I have a completely separate if statement at the beginning of the function which largely duplicates the logic of the while loop.
  2. I'm using a mutable string reference that I have to update at the end of each loop iteration, and this also seems error-prone.

What would be most ideal would be to create an iterator which yields the string itself as the first value, and then progressively uses str::rsplit_once to cut the string down and walk up the module path.

What iterator methods am I looking for here? What would be a more ergonomic way of approaching this?

I can imagine something like:

fn reverse_prefixes(str: &str) -> impl Iterator<Item = &str> {
    core::iter::successors(
        Some(str),
        |s| s.rsplit_once("::").map(|(head, _tail)| head)
    )
}

which in turn simplifies your function to

fn find_most_specific_rule<'a>(actual: &str, rules: &'a HashMap<String, Rule>) -> Option<&'a Rule> {
    reverse_prefixes(actual).find_map(|suffix| rules.get(suffix))
}

Playground.

1 Like

Incidentally one of your tests doesn't pass with your OP code.

The bug:

     while let Some((base, _)) = actual.rsplit_once("::") {
-        if let Some(rule) = rules.get(actual) {
+        if let Some(rule) = rules.get(base) {
             return Some(rule);
         }
1 Like

On nightly there is the remainder method RSplit in std::str - Rust. On stable, I would write your code as

loop {
    if let Some(rule) = rules.get(current) {
        return Some(rule);
    }
    (current, _) = current.rsplit_once(delimiter)?;
}

Thanks for pointing that out, I fixed the code above. I just wrote this code into the browser, didn't copy paste it, so didn't have the compiler check it.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.