Conditional regex replacement

Update: it's not about strict outliveness.

I was thinking about the strict outliveness again: the pattern in OP is that for FnMut(&T) -> R, R should strictly outlive &T for any lifetime on &T.

But there is a subtlety: how come generic function like ["1", "2"].iter().map(|x| *x).map(|x: &str| x) works? Note that Iterator::map doesn't require the generic return type on F to strictly outlive the argument. The pattern now becomes for FnMut(T) -> R, strict outliveness doesn't apply. I.e

fn use_it<R, F: FnOnce(&Outer) -> R>(_val: F) {}
use_it(|outer| &outer.field); // error: lifetime may not live long enough

fn use_it<T, R, F: FnOnce(T) -> R>(_val: F) {}
use_it(|outer: &Outer| &outer.field); // works

Update: to conclude IMO

  • FnMut(&T) -> R is a giving pattern
  • FnMut(T) -> R can be an either giving or lending pattern

I think you might be over-complicating things here by trying to write generic implementation using closures. Try this instead:

use regex::{Captures, Regex};

struct MyReplace;

impl regex::Replacer for MyReplace {
    fn replace_append(&mut self, caps: &Captures<'_>, dst: &mut String) {
        let universal: bool = true;
        if universal {
            dst.push_str("Universe");
        } else {
            dst.push_str(&caps[0]);
        }
    }
}

fn main() {
    let s1 = "Hello World!";
    let s2 = Regex::new("World").unwrap().replace_all(s1, "Universe");
    assert_eq!(s2, "Hello Universe!");
    let s3 = Regex::new("World").unwrap().replace_all(s1, MyReplace);
    assert_eq!(s3, "Hello Universe!");
}

And if your logic is more complicated and needs more state, you can add that state to MyReplace instead of using a unit struct.

1 Like

My problem is that a function can't capture the environment. Real-world code:

#[derive(Debug)]
struct PreparedTemplate {
    template: String,
    variables: Vec<(String, SearchRange)>,
}

impl PreparedTemplate {
    fn new(template: &str) -> Self {
        let mut variables = Vec::new();
        let template = RE_RANGE.replace_all(&template, |caps: &Captures| {
            let var = caps["var"].to_string();
            let Ok(low): Result<f64, _> = caps["low"].parse() else {
                return caps[0].to_string(); // unnecessary alloc
            };
            let Ok(high): Result<f64, _> = caps["high"].parse() else {
                return caps[0].to_string(); // unnecessary alloc
            };
            variables.push((var, SearchRange::Finite { low, high }));
            String::new()
        });
        let template = template.to_string();
        Self {
            template,
            variables,
        }
    }
}

Of course, I could define MyReplace in such a way that it contains the variables Vec. But I wonder if this is really worth it. In practice, I might just do .to_string() and not worry about the extra allocation. But I would like to understand how I generally do a "conditional replacement" idiomatically. I guess the answer is: it depends?

That problem is exactly what I meant by "add them to the MyReplace struct." So something like:

struct MyReplace<'a> {
    variables: &'a mut Vec<(String, SearchRange)>,
}

Yes. If that works, then you should do that absolutely. But you established a requirement in your initial post that you not do that. I figured you had done some benchmarking to rule it out as an option.

The key here is that by using a Captures for this, you're already asking the regex engine to do a bunch of extra work---including an allocation for Captures---on your behalf. So cloning the String is probably not going to do much to your runtime.

I would say the idiomatic approach is to just call .to_string() and be done with it.

But if you need something to be as fast as possible and don't need capture groups, then I'd suggest just writing your own replacement routine. It's not that much code, and it's not that complex either given the existence of Regex::find_iter.

The Replacer trait tries to give you a way to write some common cases with a couple tricks for optimization (like Replacer::no_expansion). But it is by design not going to work for every use case mostly because I don't know how to design a replacement API that works well for all use cases. (This is a lot trickier than it sounds, because even if you think you know how to do it, how much more complicated have you made it for the simple use cases that cover 99% of what people need? Because if you've made that harder, then that a design I would reject. And at some point, you have to pop up a level and consider the complication of the API versus the code you're actually saving someone from writing. You could probably write a simple version of replace_all that isn't generic in about 5 minutes.)

4 Likes

(This post is about compile errors and the like, and not the practical issue at hand.)

I think there has been some confusion here. The outlives error is because Rust's notorious closure inference was getting in the way. Here we can see it has nothing to do with the trait bounds. Using coerce clues in the compiler that this is a perfectly valid (and embarrassingly simple) closure. But then trying to use it finally gets around to the point that the intended signature isn't compatible with the bound (though the error is still quite bad).

And here's an example that R doesn't have to outlive (any possible) &T to satisfy a FnMut(&T) -> R bound.

It's true that in the latter case, R could be borrowing from T, but I don't know that I'd call it a lending pattern per se, as neither the inputs nor outputs can't differ by lifetime, say. If there are lifetimes involved, they're fixed.

1 Like

Thanks! That really makes sense to me :heart:

Could you make the same comment in Closure outlives error should mention the source of the requirement · Issue #73144 · rust-lang/rust · GitHub where there are false error and explanation.

The last question becomes what does FnMut(&T) -> R mean on the input and output?
The one thing I can tell now is that FnMut(&T) -> R doesn't allow R to borrow from &T (R can borrow from T though) .

Update: Oh, the answer was right there

:white_check_mark:

1 Like

I wonder: Are Rust's closures not powerful enough to allow me to do what I like to do? (Assuming a different interface of regex.)

It seems like with a lot of effort, it's possible to let Rust do what I intended to do. We can demand that a closure returns a type that "depends" on a specific lifetime:

/// Type that "captures" a lifetime
pub trait CaptureLt {
    type WithLt<'a>;
}

fn use_it<R, F>(_val: F)
where
    R: CaptureLt,
    F: for<'a> FnOnce(&'a Outer) -> R::WithLt<'a>,
{}

However, type inference fails, so using this complex interface is like:

fn main() {
    // We need this to aid type inference in regard
    // to the return type of the closure or function:
    struct TyCon;
    impl CaptureLt for TyCon {
        type WithLt<'a> = &'a Inner;
    }
    use_it::<TyCon, _>(|outer: &Outer| &outer.field );
    use_it::<TyCon, _>(f);
}

(Playground)

Note that this works on stable and doesn't require #![feature(closure_lifetime_binder)].

I wonder if Rust's type system could be refined/replaced to avoid this sort of trouble. This also reminds me of the necessity to box certain futures, solely due to issues with lifetimes.

Anyway, these are just hypothetical thoughts. Back to the original problem:

Okay, understood, and that makes sense.

I only skimmed this, but I think you're basically talking about "I want a trait bound for closures that might or might not borrow their inputs". That's the more complicated version, this one would be

for<'any> F: FnOnce<(&'any Outer,)>,
for<'any> <F as FnOnce<(&'any Outer,)>::Output: AsRef<str>,

But you can't use FnOnce(&Outer) -> R for this because you're forced to name the return type and we don't have generic type construction parameters (or higher-kinded types or whatever). So then on stable you make your own trait with some blanket implementations etc... but probably run into heaps of inference issues and/or normalization issues.

So it's sometimes possible, but not always practical, and almost always at a cost.

:100: [1]


  1. but I can be suckered into writing the complicated version (or at least re-explaining it) when it comes up on this forum more often than not :sweat_smile: ↩︎

2 Likes

Indeed, we could make our own trait with blanket implementation (aka trait alias) to work around this (just hypothetically exploring this idea):

pub trait GenericFnOnce1Arg<Arg>
where
    Self: FnOnce(Arg) -> <Self as GenericFnOnce1Arg<Arg>>::Output
{
    type Output;
}

impl<T: ?Sized, Arg, Ret> GenericFnOnce1Arg<Arg> for T
where T: FnOnce(Arg) -> Ret,
{
    type Output = Ret;
}

Now we can use a bound like that:

fn use_it<F>(_val: F)
where
    F: for<'a> GenericFnOnce1Arg<&'a Outer>,
{}

Here, the return type does not need to be named (but could have additional bounds added, like AsRef<str>, if desired [1]).

Now, using the nightly #![feature(closure_lifetime_binder), we can use this API:

fn f(o: &Outer) -> &Inner {
    &o.field
}

fn main() {
    use_it(for<'a> |outer: &'a Outer| -> &'a Inner { &outer.field });
    use_it(f);
}

(Playground)


Update: Combining all these ideas, and getting back to the original problem, we could do:

#![feature(closure_lifetime_binder)]

struct MyReplacer<F>(F);

impl<F> Replacer for MyReplacer<F>
where
    F: for<'a> GenericFnMut1Arg<&'a Captures<'a>>,
    for<'a> <F as GenericFnMut1Arg<&'a Captures<'a>>>::Output: AsRef<str>,
{
    fn replace_append(&mut self, caps: &Captures<'_>, dst: &mut String) {
        dst.push_str((self.0)(caps).as_ref())
    }
}

fn main() {
    let s1 = "Hello World!";
    let s2 = Regex::new("World").unwrap().replace_all(s1, "Universe");
    assert_eq!(s2, "Hello Universe!");
    let s3 = Regex::new("World").unwrap().replace_all(
        s1,
        MyReplacer(
            for<'a> |caps: &'a Captures<'a>| -> &'a str {
                let universal: bool = false; // actually some more complex computation
                if universal {
                    "Universe"
                } else {
                    &caps[0] // don't replace
                }
            }
        )
    );
    assert_eq!(s3, "Hello World!");
}

(Playground)

I don't want to say it's a good or idiomatic way to go. But it's curious that it's possible to solve this issue in that way (on Nightly, and being willing to use a newtype wrapper to adjust regex's API)!


  1. (Playground) ↩︎

1 Like

I have been experimenting more with this, and tested if the regex API could provide a more generic implementation for regex::Replacer, and it seems it can:

diff --git a/src/regex/string.rs b/src/regex/string.rs
index 65a7674..4aa8a75 100644
--- a/src/regex/string.rs
+++ b/src/regex/string.rs
@@ -2371,6 +2371,22 @@ impl<'c, 'h> ExactSizeIterator for SubCaptureMatches<'c, 'h> {}
 
 impl<'c, 'h> core::iter::FusedIterator for SubCaptureMatches<'c, 'h> {}
 
+/// Trait alias for `FnMut` with one argument, which allows adding a bound
+/// without specifying the closure's return type.
+pub trait GenericFnMut1Arg<Arg>
+where
+    Self: FnMut(Arg) -> <Self as GenericFnMut1Arg<Arg>>::Output
+{
+    /// Return type of the closure.
+    type Output;
+}
+
+impl<T: ?Sized, Arg, Ret> GenericFnMut1Arg<Arg> for T
+where T: FnMut(Arg) -> Ret,
+{
+    type Output = Ret;
+}
+
 /// A trait for types that can be used to replace matches in a haystack.
 ///
 /// In general, users of this crate shouldn't need to implement this trait,
@@ -2501,10 +2517,10 @@ fn no_expansion(&mut self) -> Option<Cow<'_, str>> {
     }
 }
 
-impl<F, T> Replacer for F
+impl<F> Replacer for F
 where
-    F: FnMut(&Captures<'_>) -> T,
-    T: AsRef<str>,
+    F: for<'a> GenericFnMut1Arg<&'a Captures<'a>>,
+    for<'a> <F as GenericFnMut1Arg<&'a Captures<'a>>>::Output: AsRef<str>,
 {
     fn replace_append(&mut self, caps: &Captures<'_>, dst: &mut String) {
         dst.push_str((*self)(caps).as_ref());

(GitHub)

It passes regex's tests, and will allow to run this code on Nightly:

#![feature(closure_lifetime_binder)]

use regex::{Captures, Regex};

fn main() {
    let s1 = "Hello World!";
    let s2 = Regex::new("World").unwrap().replace_all(s1, "Universe");
    assert_eq!(s2, "Hello Universe!");
    let s3 = Regex::new("World").unwrap().replace_all(
        s1,
        for<'a> |caps: &'a Captures<'a>| -> &'a str {
            let universal: bool = false; // actually some more complex computation
            if universal {
                "Universe"
            } else {
                &caps[0] // don't replace
            }
        }
    );
    assert_eq!(s3, "Hello World!");
}

I don't think it would be a breaking change. However, the extra trait alias is somewhat ugly in the generated documentation.

What do you think?

Whether this is interesting for regex, I feel like it's a general pattern that might come in handy in other scenarios.


For future reference, I created a draft PR: More generic impl of Replacer for closures #1048.

In this context, I also found: the Replacer trait should be parameterized over a lifetime #777.


Update:

By implementing this less generically, we can actually get a quite clean API:

/// If a closure implements this for all `'a`, then it also implements
/// [`Replacer`].
pub trait ReplacerClosure<'a>
where
    Self: FnMut(&'a Captures<'a>) -> <Self as ReplacerClosure>::Output,
{
    /// Return type of the closure (may depend on lifetime `'a`).
    type Output: AsRef<str>;
}

impl<'a, F: ?Sized, O> ReplacerClosure<'a> for F
where
    F: FnMut(&'a Captures<'a>) -> O,
    O: AsRef<str>,
{
    type Output = O;
}

impl<F: for<'a> ReplacerClosure<'a>> Replacer for F {
    fn replace_append(&mut self, caps: &Captures<'_>, dst: &mut String) {
        dst.push_str((*self)(caps).as_ref());
    }
}

(GitHub)

The only thing that's a bit ugly is the redundant associated type "Output".

1 Like

Can you explain how it works?

I'll try to respond later in the discussion of the draft PR #1048, where I just added some more notes.

To workaround it, quinedot has provided a way in stable Rust[1], the coerce helper function that describes lifetime connections. Rust Playground


  1. you'll see it if you've clicked some example link ↩︎

1 Like

Thanks! I updated the draft PR and commented.

1 Like

I hear your warning and I'm curious if my above linked PR draft #1048 for regex is also subject to these "heaps of inference issues and/or normalization issues". I tried to hide all implementation details in a private module and believe there should be no issues (but happy to be taught otherwise if there is an issue).

In particular, I would like to know if you see any "cost" that I didn't think of yet.

Update: One issue might be: Why can't I use lifetime bounds in HRTBs?

The good things about the version I looked at is

  • inference related
    • There was only one implementation on the workaround trait
    • And the inputs were generic on lifetimes but not anything else
    • So whatever inference is possible is probably as good as it gets
  • coherence related
    • the Fn traits can't be manually implemented on stable, making it less of a breaking coherence change on stable to widen the generic implementation

And I think most of the bad things I've thought of are already being talked about in that thread (requiring lifetimes to be the same in the version I looked at, inability to ergonomically assert an inferred bound so you can split lifetimes and still be higher ranked (and if there is a way to do that on stable it will involve even more "type shenanigans"); inference breakage is hard to predict and test; necessity of explaining somewhere what your sealed trait actually does since the list of implementations is no longer sufficient to tell what implements Replacer; ...)

1 Like

Thanks a lot for your assessment.

That's interesting. Does that mean if it will be possible to implment Fn traits manually on stable on day, then the PR would become breaking in some cases? Maybe that depends on future coherence rules on that matter?

To understand this, I created one crate with this code:

pub trait Trt {}

impl<T: ?Sized + Clone> Trt for T {}

And another crate implementing this trait as well as ToOwned for a struct S that it also defines:

struct S;

impl ToOwned for S {
    type Owned = S;
    fn to_owned(&self) -> S {
        S
    }
}

impl first_crate::Trt for S {}

Coherence rules allow me to implement ToOwned as well as Trt for S. But if I now widen the implementation in the first crate:

-impl<T: ?Sized + Clone> Trt for T {}
+impl<T: ?Sized + ToOwned> Trt for T {}

Then I get for my second crate:

error[E0119]: conflicting implementations of trait `Trt` for type `S`
  --> src/main.rs:10:1
   |
10 | impl first_crate::Trt for S {}
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: conflicting implementation in crate `b`:
           - impl<T> Trt for T
             where T: ToOwned, T: ?Sized;

For more information about this error, try `rustc --explain E0119`.

I guess this falls under loosening generic bounds being a minor breaking semver change (or rather: becoming a breaking semver change if manual implementation of FnMut would be made possible on stable)?

This would be (at least) a (weak) a pro-argument for the PR, I guess, because it might be non-breaking today, but could become breaking tomorrow.

I'm not sure what you mean. Note that F: for<'a> &'a Captures<'a> is a more loose bound than F: for<'a, 'b> &'a Captures<'b>. The latter bound is stricter.

The "requirement" is that it's necessary to use the less-strict bound if you want the return type of closures to depend on the type parameter 'b of Captures<'b> (problems pointed out there). But it's not a "requirement" for the user of the API. Not sure which of those you meant.

In an ideal world, do you think closures passed to Regex::replace_all should fulfill the stricter bound (F: for<'a, 'b> &'a Captures<'b>) for tidiness, even if it's not necessary (because Captures<'b> is covariant over 'b and the caller of the closure can always coerce Captures<'b> to Captures<'a> if b: 'a)? I feel like this is an aesthetical question without any real-life impact, but I stumble upon this question in other contexts too:

Do I use two lifetimes or one lifetime when (co)variance allows me to combine two lifetimes into one lifetime.

Gotta run, so this will be an ultra abbreviated reply until later, but to give you something to play with real quick...

It's already a (minor) breaking change because I could have something like

impl<T> MyTrait for T: Replacer {}
impl MyTrait for Box<dyn for<'a> FnMut(&'a Context<'a>) -> &'a str> {}

And your loosened bound will cause it to fail. But this is less likely then people implementing traits for their own types probably (and they can't implement FnMut for their own types).

(If you try this you'll get a forward-compatibility warning, but if you read the issue, the current plan is to continue excepting anything that gets the warning.)