Conditional regex replacement

I would like to replace strings in a template conditionally. However, the following naive approach fails due to lifetime issues. What's the best way to go (without doing an unnecessary extra allocation)?

use regex::{Captures, Regex};

fn main() {
    let s1 = "Hello World!";
    let s2 = Regex::new("World").unwrap().replace_all(s1, "Universe");
    assert_eq!(s2, "Hello Universe!");
    let s3 = Regex::new("World").unwrap().replace_all(s1, |caps: &Captures| {
        let universal: bool = false; // actually some more complex computation
        if universal {
            "Universe"
        } else {
            &caps[0] // don't replace
        }
    });
    assert_eq!(s3, "Hello World!"); // `universal` is false, so nothing is replaced
}

(Playground)

Errors:

   Compiling playground v0.0.1 (/playground)
error: lifetime may not live long enough
  --> src/main.rs:12:13
   |
7  |     let s3 = Regex::new("World").unwrap().replace_all(s1, |caps: &Captures| {
   |                                                                  -        - return type of closure is &'2 str
   |                                                                  |
   |                                                                  let's call the lifetime of this reference `'1`
...
12 |             &caps[0] // don't replace
   |             ^^^^^^^^ returning this value requires that `'1` must outlive `'2`

error: lifetime may not live long enough
  --> src/main.rs:12:13
   |
7  |     let s3 = Regex::new("World").unwrap().replace_all(s1, |caps: &Captures| {
   |                                                            ----           - return type of closure is &'2 str
   |                                                            |
   |                                                            has type `&regex::Captures<'3>`
...
12 |             &caps[0] // don't replace
   |             ^^^^^^^^ returning this value requires that `'3` must outlive `'2`

error: could not compile `playground` (bin "playground") due to 2 previous errors

It seems your closure doesn't match the blanket implementation of Replacer in the regex crate:

// required as defined
impl<F, T> Replacer for F
where
    F: FnMut(&Captures<'_>) -> T, // T has no lifetime bound
    T: AsRef<str>,

// your closure is
FnMut(&'s Captures<'_>) -> &'s str
where the return type is tied to the lifetime 's of the input, so no single T can describe it

A reproduction: Rust Playground

But if T has no lifetime bound, why is it required to live longer than the argument passed to the closure? T doesn't have a 'static bound either, right?

The closure has to work for all input lifetimes, and for all such lifetimes, return the same type T. So it's impossible for T to capture an input lifetime.
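To make that concrete, here is a minimal std-only sketch. apply_to_match is a hypothetical stand-in that only mimics the shape of regex's closure bound (it is not part of regex), and the matched text is hard-coded as an assumption:

```rust
/// Hypothetical stand-in for the shape of the closure bound:
/// one return type `T` for every possible lifetime of the argument.
fn apply_to_match<F, T>(mut f: F) -> String
where
    F: FnMut(&str) -> T,
    T: AsRef<str>,
{
    let matched = String::from("World"); // lives only inside this call
    f(matched.as_str()).as_ref().to_owned()
}

fn demo_static_and_owned() -> (String, String) {
    // `T = &'static str`: does not mention the argument's lifetime.
    let a = apply_to_match(|_m| "Universe");
    // `T = String`: owned, at the cost of an allocation.
    let b = apply_to_match(|m| m.to_string());
    // Rejected by the compiler: `T` would have to be `&'a str` for each
    // argument lifetime `'a`, and a single `T` cannot name all of them.
    // let c = apply_to_match(|m| m);
    (a, b)
}
```

The commented-out line is the same situation as returning &caps[0] from the closure in the original post.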

(Still not sure if there's an alternative...)

The alternative is probably implement Replacer yourself.

I don't know. It's just required. See "Closure outlives error should mention the source of the requirement" (Issue #73144 in rust-lang/rust on GitHub).

In general, returning a reference derived from an argument is perfectly fine. The issue is that the generic function like that requires that the return type outlive the argument - however, this is not mentioned anywhere in the error message.

A simpler repro from the linked issue: Rust Playground

Note: the error differs between closures and fn items, so if you rewrite it as an fn item, you'll see a clearer (:question:) error message saying "the lifetime requirement is introduced here".

Update: the error is the same after fixing the higher-order closure; refer to the great answer provided by @quinedot below.

I tried, and ended up using the nightly closure_lifetime_binder feature:

#![feature(closure_lifetime_binder)]

use regex::{Captures, Regex, Replacer};
use std::borrow::Cow;

struct MyReplacer<F>(F);

impl<F> Replacer for MyReplacer<F>
where
    F: for<'a> FnMut(&'a Captures<'a>) -> Cow<'a, str>,
{
    fn replace_append<'b>(&mut self, caps: &'b Captures<'b>, dst: &mut String) {
        // `Cow<str>` derefs to `str`, so no `Borrow` import is needed.
        dst.push_str(&(self.0)(caps))
    }
}

fn main() {
    let s1 = "Hello World!";
    let s2 = Regex::new("World").unwrap().replace_all(s1, "Universe");
    assert_eq!(s2, "Hello Universe!");
    let s3 = Regex::new("World").unwrap().replace_all(
        s1,
        MyReplacer(
            for<'a> |caps: &'a Captures<'a>| -> Cow<'a, str> {
                let universal: bool = false; // actually some more complex computation
                if universal {
                    Cow::Borrowed("Universe")
                } else {
                    Cow::Borrowed(&caps[0]) // don't replace
                }
            }
        )
    );
    assert_eq!(s3, "Hello World!");
}

(Playground)

Any easier way? :sweat_smile:


Update:

I guess this could work on stable Rust:

use regex::{Captures, Replacer};

struct MaybeReplace<F>(pub F);

impl<F, T> Replacer for MaybeReplace<F>
where
    F: FnMut(&Captures<'_>) -> Option<T>,
    T: AsRef<str>,
{
    fn replace_append(&mut self, caps: &Captures<'_>, dst: &mut String) {
        match (self.0)(caps) {
            None => {
                dst.push_str(&caps[0]);
            }
            Some(replacement) => {
                dst.push_str(replacement.as_ref());
            }
        }
    }
}

(Playground)

But is that idiomatic? Or maybe something like this exists already?

These error messages are actually pretty confusing to me, e.g.:

   = note: expected reference `&Inner`
              found reference `&Inner`

when executing the Playground. :see_no_evil:

Don't look at that. See the following lines instead :laughing:. They point out the root of the problem:

note: the lifetime requirement is introduced here
  --> src/main.rs:6:35
   |
6  | fn use_it<R, F: FnOnce(&Outer) -> R>(_val: F) {}
   |                                   ^

Well, not sure how "the lifetime requirement is introduced here" is more clear, when it refers to a type R that has no lifetime parameter at all.

What I find more clear is @quinedot's explanation:

But I don't see that reflected in the error message(s) (yet).

Update: it's not about strict outliveness.

I was thinking about the strict outliveness again: the pattern in OP is that for FnMut(&T) -> R, R should strictly outlive &T for any lifetime on &T.

But there is a subtlety: how come a generic function like ["1", "2"].iter().map(|x| *x).map(|x: &str| x) works? Note that Iterator::map doesn't require the generic return type of F to strictly outlive the argument. The pattern now becomes FnMut(T) -> R, where strict outliveness doesn't apply. I.e.:

fn use_it<R, F: FnOnce(&Outer) -> R>(_val: F) {}
use_it(|outer| &outer.field); // error: lifetime may not live long enough

fn use_it<T, R, F: FnOnce(T) -> R>(_val: F) {}
use_it(|outer: &Outer| &outer.field); // works

Update: to conclude IMO

  • FnMut(&T) -> R is a giving pattern
  • FnMut(T) -> R can be an either giving or lending pattern
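A runnable sketch of the second pattern, for comparison. Outer, Inner, and call_by_value are made-up names for illustration; the point is only that with FnOnce(T) -> R the caller fixes T to one concrete reference type, so R can share its lifetime:

```rust
struct Inner(u32);

struct Outer {
    field: Inner,
}

/// `T` is a single type chosen by the caller. If the caller picks
/// `T = &'x Outer`, then `R` is free to be `&'x Inner`.
fn call_by_value<T, R, F: FnOnce(T) -> R>(f: F, arg: T) -> R {
    f(arg)
}

fn demo_by_value() -> u32 {
    let outer = Outer { field: Inner(7) };
    // Here `T = &Outer` and `R = &Inner`, both with the same fixed lifetime.
    let inner: &Inner = call_by_value(|o: &Outer| &o.field, &outer);
    inner.0
}
```

With a bound of the form FnOnce(&Outer) -> R instead, the same closure would be rejected, because R would have to be one type for every lifetime of the argument.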

I think you might be over-complicating things here by trying to write generic implementation using closures. Try this instead:

use regex::{Captures, Regex};

struct MyReplace;

impl regex::Replacer for MyReplace {
    fn replace_append(&mut self, caps: &Captures<'_>, dst: &mut String) {
        let universal: bool = true;
        if universal {
            dst.push_str("Universe");
        } else {
            dst.push_str(&caps[0]);
        }
    }
}

fn main() {
    let s1 = "Hello World!";
    let s2 = Regex::new("World").unwrap().replace_all(s1, "Universe");
    assert_eq!(s2, "Hello Universe!");
    let s3 = Regex::new("World").unwrap().replace_all(s1, MyReplace);
    assert_eq!(s3, "Hello Universe!");
}

And if your logic is more complicated and needs more state, you can add that state to MyReplace instead of using a unit struct.

My problem is that a function can't capture the environment. Real-world code:

#[derive(Debug)]
struct PreparedTemplate {
    template: String,
    variables: Vec<(String, SearchRange)>,
}

impl PreparedTemplate {
    fn new(template: &str) -> Self {
        let mut variables = Vec::new();
        let template = RE_RANGE.replace_all(&template, |caps: &Captures| {
            let var = caps["var"].to_string();
            let Ok(low): Result<f64, _> = caps["low"].parse() else {
                return caps[0].to_string(); // unnecessary alloc
            };
            let Ok(high): Result<f64, _> = caps["high"].parse() else {
                return caps[0].to_string(); // unnecessary alloc
            };
            variables.push((var, SearchRange::Finite { low, high }));
            String::new()
        });
        let template = template.to_string();
        Self {
            template,
            variables,
        }
    }
}

Of course, I could define MyReplace in such a way that it contains the variables Vec. But I wonder if this is really worth it. In practice, I might just do .to_string() and not worry about the extra allocation. But I would like to understand how I generally do a "conditional replacement" idiomatically. I guess the answer is: it depends?

That problem is exactly what I meant by "add them to the MyReplace struct." So something like:

struct MyReplace<'a> {
    variables: &'a mut Vec<(String, SearchRange)>,
}

Yes. If that works, then you should do that absolutely. But you established a requirement in your initial post that you not do that. I figured you had done some benchmarking to rule it out as an option.

The key here is that by using a Captures for this, you're already asking the regex engine to do a bunch of extra work---including an allocation for Captures---on your behalf. So cloning the String is probably not going to do much to your runtime.

I would say the idiomatic approach is to just call .to_string() and be done with it.

But if you need something to be as fast as possible and don't need capture groups, then I'd suggest just writing your own replacement routine. It's not that much code, and it's not that complex either given the existence of Regex::find_iter.

The Replacer trait tries to give you a way to write some common cases with a couple tricks for optimization (like Replacer::no_expansion). But it is by design not going to work for every use case, mostly because I don't know how to design a replacement API that works well for all use cases. (This is a lot trickier than it sounds, because even if you think you know how to do it, how much more complicated have you made it for the simple use cases that cover 99% of what people need? Because if you've made that harder, then that's a design I would reject. And at some point, you have to pop up a level and consider the complication of the API versus the code you're actually saving someone from writing. You could probably write a simple version of replace_all that isn't generic in about 5 minutes.)
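As a rough sketch of what such a hand-written routine could look like: the version below assumes plain substring matching via str::match_indices as a std-only stand-in for Regex::find_iter, and replace_all_with is a hypothetical name, not a regex API. The closure returns None to keep a match untouched, so no allocation is needed for that case:

```rust
use std::borrow::Cow;

/// Hypothetical helper: for each occurrence of `needle`, `decide` returns
/// `Some(replacement)` to replace it or `None` to keep the matched text.
fn replace_all_with<'h, F, T>(haystack: &'h str, needle: &str, mut decide: F) -> Cow<'h, str>
where
    F: FnMut(&str) -> Option<T>,
    T: AsRef<str>,
{
    let mut out = String::new();
    let mut last = 0; // end of the last replaced match
    let mut changed = false;
    for (start, matched) in haystack.match_indices(needle) {
        if let Some(replacement) = decide(matched) {
            out.push_str(&haystack[last..start]);
            out.push_str(replacement.as_ref());
            last = start + matched.len();
            changed = true;
        }
        // On `None`, `last` stays put, so the kept match is copied later
        // as part of the next unmodified slice.
    }
    if !changed {
        // Nothing was replaced: hand back the input without allocating.
        return Cow::Borrowed(haystack);
    }
    out.push_str(&haystack[last..]);
    Cow::Owned(out)
}
```

Because decide is an ordinary FnMut closure, it can still capture and mutate its environment (for example, pushing into a Vec) while deciding.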

(This post is about compile errors and the like, and not the practical issue at hand.)

I think there has been some confusion here. The outlives error is because Rust's notorious closure inference was getting in the way. Here we can see it has nothing to do with the trait bounds. Using coerce clues in the compiler that this is a perfectly valid (and embarrassingly simple) closure. But then trying to use it finally gets around to the point that the intended signature isn't compatible with the bound (though the error is still quite bad).

And here's an example that R doesn't have to outlive (any possible) &T to satisfy a FnMut(&T) -> R bound.

It's true that in the latter case, R could be borrowing from T, but I don't know that I'd call it a lending pattern per se, as neither the inputs nor the outputs can differ by lifetime, say. If there are lifetimes involved, they're fixed.
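A std-only sketch of the point that R doesn't have to outlive the argument (call_with_temp is a made-up helper for illustration): R here is a &str borrowing from a local String, so it is not 'static and does not outlive every possible argument lifetime, yet the bound is satisfied because R never mentions the argument's lifetime.

```rust
/// Hypothetical helper with a `FnMut(&str) -> R` bound.
fn call_with_temp<R, F>(mut f: F) -> R
where
    F: FnMut(&str) -> R,
{
    let temp = String::from("short-lived argument");
    f(temp.as_str())
}

fn demo_captured() -> String {
    let captured = String::from("captured text");
    // `R = &str` borrowing from `captured`, which is a local variable.
    // The closure ignores its argument, so `R` is the same type for
    // every argument lifetime.
    let borrowed: &str = call_with_temp(|_arg| captured.as_str());
    borrowed.to_owned()
}
```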

Thanks! That really makes sense to me :heart:

Could you make the same comment in "Closure outlives error should mention the source of the requirement" (Issue #73144 in rust-lang/rust on GitHub), which contains a misleading error and explanation?

The last question becomes what does FnMut(&T) -> R mean on the input and output?
The one thing I can tell now is that FnMut(&T) -> R doesn't allow R to borrow from &T (R can borrow from T, though).
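A small sketch of "R can borrow from T", assuming a made-up helper map_first with that bound shape. T is itself a reference (&String), so R can reuse the lifetime inside T, just not the lifetime of the outer &T:

```rust
/// Hypothetical helper with the `FnMut(&T) -> R` shape.
fn map_first<T, R, F>(items: &[T], mut f: F) -> Option<R>
where
    F: FnMut(&T) -> R,
{
    items.first().map(|item| f(item))
}

fn demo_through_t() -> Option<String> {
    let words = vec![String::from("World"), String::from("Universe")];
    // Here `T = &String`: each item is itself a reference into `words`.
    let refs: Vec<&String> = words.iter().collect();
    // The closure receives `&T = &&String`. Copying the inner `&String`
    // out lets `R = &str` borrow from `words` (through `T`) rather than
    // from the short-lived `&T` borrow.
    let first: Option<&str> = map_first(&refs, |item| {
        let inner: &String = *item;
        inner.as_str()
    });
    first.map(str::to_owned)
}
```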

Update: Oh, the answer was right there

:white_check_mark:

I wonder: Are Rust's closures not powerful enough to allow me to do what I like to do? (Assuming a different interface of regex.)

It seems like with a lot of effort, it's possible to let Rust do what I intended to do. We can demand that a closure returns a type that "depends" on a specific lifetime:

/// Type that "captures" a lifetime
pub trait CaptureLt {
    type WithLt<'a>;
}

fn use_it<R, F>(_val: F)
where
    R: CaptureLt,
    F: for<'a> FnOnce(&'a Outer) -> R::WithLt<'a>,
{}

However, type inference fails, so using this complex interface is like:

fn main() {
    // We need this to aid type inference in regard
    // to the return type of the closure or function:
    struct TyCon;
    impl CaptureLt for TyCon {
        type WithLt<'a> = &'a Inner;
    }
    use_it::<TyCon, _>(|outer: &Outer| &outer.field );
    use_it::<TyCon, _>(f);
}

(Playground)

Note that this works on stable and doesn't require #![feature(closure_lifetime_binder)].

I wonder if Rust's type system could be refined/replaced to avoid this sort of trouble. This also reminds me of the necessity to box certain futures, solely due to issues with lifetimes.

Anyway, these are just hypothetical thoughts. Back to the original problem:

Okay, understood, and that makes sense.

I only skimmed this, but I think you're basically talking about "I want a trait bound for closures that might or might not borrow their inputs". That's the more complicated version, this one would be

for<'any> F: FnOnce<(&'any Outer,)>,
for<'any> <F as FnOnce<(&'any Outer,)>>::Output: AsRef<str>,

But you can't use FnOnce(&Outer) -> R for this because you're forced to name the return type and we don't have generic type construction parameters (or higher-kinded types or whatever). So then on stable you make your own trait with some blanket implementations etc... but probably run into heaps of inference issues and/or normalization issues.

So it's sometimes possible, but not always practical, and almost always at a cost.

:100: [1]


  1. but I can be suckered into writing the complicated version (or at least re-explaining it) when it comes up on this forum more often than not :sweat_smile:
