Such complex lifetimes, I'm almost crying

Such complex lifetimes, I'm almost crying 😭

2 Likes

Is there a question?

Yes, I don't quite understand 😭

That still isn't a question by itself.

Whenever you feel stuck, try to break the problem into different parts and tackle each part separately. For this example, try to start with narrowing your question to something along the lines of "How to understand the lifetime annotations in this function?"

4 Likes

Basically, it ensures that the string and the pattern both live at least as long as the iterator needs them to.

I usually start understanding a lifetime bound by looking at how it's used in the function parameters. In this case it's on the reference to self, &'a self, so we know the lifetime needs to be valid for the borrow of self that the method is called with.

Then we can look at the return type, which also contains the same lifetime. That means the borrow of self also has to be valid for as long as the return value of this method lives.
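To make that concrete, here is a small sketch with the lifetimes written out instead of elided. The helper names `comma_fields` and `first_field` are made up for illustration; only `str::split` and `std::str::Split` come from std.

```rust
use std::str::Split;

// The returned iterator carries the same 'a as the borrow of `line`,
// so it cannot be used after `line` stops being valid.
fn comma_fields<'a>(line: &'a str) -> Split<'a, char> {
    line.split(',')
}

// The yielded slices borrow from `line` (lifetime 'a), not from the
// iterator, so they may outlive the temporary iterator created here.
fn first_field<'a>(line: &'a str) -> Option<&'a str> {
    comma_fields(line).next()
}
```

Calling `first_field("x,y")` hands back a slice of the original string even though the `Split` value itself is already gone by then.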

Those are really the most important parts of this signature. The Pattern<'a> bound essentially means that P may use the same borrow of self and rely on its validity for the scope of 'a. From the perspective of a user of this particular API that isn't super important though[1]


  1. especially since Pattern isn't stable

3 Likes

I'm not fully understanding this, so please correct me if I'm wrong anywhere.

We're looking at this method:

str::split, where str is the primitive str type, not the module std::str.

Let's look at the signature:

pub fn split<'a, P>(&'a self, pat: P) -> Split<'a, P>
where
    P: Pattern<'a>;

It means the same lifetime is used for the reference to Self, the lifetime argument to the Pattern trait, and the lifetime of the result Split.

Because a reference &'a T is covariant in its lifetime 'a, this means that the str behind &'a self needs to stay valid for at least 'a. The actual borrow could live longer and simply gets shortened to 'a.
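A quick sketch of what covariance buys you here, assuming two made-up helpers (`first_word` and `first_word_or_default`): a `&'static str` is accepted wherever a `&'a str` is expected, because the longer borrow can be shortened.

```rust
// Returns the part before the first space; the result borrows from `text`.
fn first_word<'a>(text: &'a str) -> &'a str {
    text.split(' ').next().unwrap_or("")
}

fn first_word_or_default<'a>(local: &'a str) -> &'a str {
    // `fallback` lives for 'static, which is longer than 'a.
    let fallback: &'static str = "static words";
    if local.is_empty() {
        // Covariance: the &'static str is shortened to &'a str here.
        first_word(fallback)
    } else {
        first_word(local)
    }
}
```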

Let's look at what 'a is used for in the Pattern trait:

A Pattern<'a> expresses that the implementing type can be used as a string pattern for searching in a &'a str.

Correspondingly, it's used here:

pub trait Pattern<'a> {
    type Searcher: Searcher<'a>;

    fn into_searcher(self, haystack: &'a str) -> Self::Searcher;

    fn is_contained_in(self, haystack: &'a str) -> bool { ... }
    fn is_prefix_of(self, haystack: &'a str) -> bool { ... }
    fn is_suffix_of(self, haystack: &'a str) -> bool
    where
        Self::Searcher: ReverseSearcher<'a>,
    { ... }
    fn strip_prefix_of(self, haystack: &'a str) -> Option<&'a str> { ... }
    fn strip_suffix_of(self, haystack: &'a str) -> Option<&'a str>
    where
        Self::Searcher: ReverseSearcher<'a>,
    { ... }
}
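Since Pattern itself can't be used directly on stable, one way to see these methods in action is through the stable str methods that accept a `P: Pattern` argument. As far as I know, `str::starts_with` and `str::strip_prefix` forward to `is_prefix_of` and `strip_prefix_of`, but treat that mapping as my assumption. The helper names below are made up.

```rust
// The result borrows from `haystack` only; `label` has its own,
// unrelated (elided) lifetime and may be dropped right after the call.
fn strip_label<'a>(haystack: &'a str, label: &str) -> Option<&'a str> {
    haystack.strip_prefix(label)
}

// Same pattern argument, but nothing is borrowed in the result.
fn has_label(haystack: &str, label: &str) -> bool {
    haystack.starts_with(label)
}
```

The `Option<&'a str>` in `strip_label` mirrors the `Option<&'a str>` in `strip_prefix_of`: the returned slice points into the haystack, never into the pattern.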

To be honest, I'm a bit confused too. Why should a Pattern need to live longer than any of these function calls? Perhaps because some methods might want to return references to parts of the Pattern instead of the haystack?

In practice, 'a seems to be arbitrarily long because of the implementors in std:

impl<'a> Pattern<'a> for char
impl<'a, 'b> Pattern<'a> for &'b str
impl<'a, 'b> Pattern<'a> for &'b String
impl<'a, 'b> Pattern<'a> for &'b [char]
impl<'a, 'b, 'c> Pattern<'a> for &'c &'b str
impl<'a, 'b, const N: usize> Pattern<'a> for &'b [char; N]
impl<'a, F> Pattern<'a> for F where F: FnMut(char) -> bool
impl<'a, const N: usize> Pattern<'a> for [char; N]

I.e. a &'b str is a Pattern<'a> for any 'a. Thus, in case of using string slices as a pattern, we could re-write the original signature as follows:

pub fn split<'a, 'b>(&'a self, pat: &'b str) -> Split<'a, &'b str>;

Basically it means:

  • pat (if it's a &str) can have any lifetime.
  • The returned Split captures the lifetime of &self (the &str we want to split) and the lifetime 'b of our search pattern &'b str (because we use it as an argument P to Split<'a, P>).

In short:

The iterator returned by the function depends both on

  • the searched string
  • and the search pattern,

and it cannot outlive either of them.
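Here is a small sketch of that, with a made-up function name. The pattern is a short-lived local String, the iterator borrows both the haystack and the pattern, and the yielded slices borrow only the haystack.

```rust
fn fields_split_by_local<'a>(haystack: &'a str) -> Vec<&'a str> {
    // The pattern lives only inside this function ('b is short).
    let separator = String::from("::");

    // The Split iterator borrows `haystack` and `separator`, so it must
    // be finished before `separator` is dropped at the end of the scope.
    let parts: Vec<&'a str> = haystack.split(separator.as_str()).collect();

    // The collected slices point into `haystack` only, so returning
    // them after `separator` is gone is fine.
    parts
}
```

Trying to return the `Split` iterator itself from this function instead of the collected Vec should be rejected by the borrow checker, because the iterator still holds the borrow of `separator`.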

2 Likes

Maybe it's just overzealous (or warranted but extraordinary) caution on the part of the API designers. AFAICT this restriction could be lifted, but not introduced, without breaking semver compatibility. (However, I don't see any good reason for it, either – patterns look to me like entities that operate on a string in the sense of pure computation, so they shouldn't really need to store the string or return references inside the patterns themselves instead of merely reborrowing the string.)

This design allows writing a heap-free Searcher for situations where there’s a sentinel string in the input that needs to be recognized again later, e.g. Bash heredocs or MIME multipart encoding.
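This is not a real Searcher, just a rough sketch of the underlying idea on stable Rust: the sentinel is a slice of the input itself and is reused as the pattern, so nothing is copied to the heap. The toy heredoc syntax (`<<TAG`, newline, body, `TAG`) and the function name are my assumptions, and the match is deliberately naive (it doesn't check that the closing tag sits on its own line).

```rust
fn heredoc_body(input: &str) -> Option<&str> {
    // Assumed toy syntax: "<<TAG\n...body...TAG"
    let rest = input.strip_prefix("<<")?;

    // `tag` is a slice of `input`, not an owned String.
    let (tag, after) = rest.split_once('\n')?;

    // The borrowed sentinel is used directly as the search pattern.
    let end = after.find(tag)?;
    Some(&after[..end])
}
```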

4 Likes

Yeah, this is exactly what I already don't think is necessarily good taste – it means that the pattern is stateful. That sounds more like proper parsing territory than "simple"(-looking) pattern matching, but of course YMMV.

Because Pattern::into_searcher needs to be able to return a Self::Searcher: Searcher<'a>. Why is Searcher<'a> bound by 'a? Because it has the method Searcher::haystack, which returns the original searched &'a str. This allows the Searcher to carry the entire state required to perform the search (the original string, the pattern, the current search position etc).
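A toy version of such a state-carrying searcher might look like the sketch below. `ToyCharSearcher` and `slice_first_match` are hypothetical names and this is not the unstable std trait; it only imitates the `haystack()` getter and a `next_match`-style method.

```rust
// Hypothetical stand-in for a searcher: haystack, pattern and position.
struct ToyCharSearcher<'a> {
    haystack: &'a str,
    needle: char,
    position: usize,
}

impl<'a> ToyCharSearcher<'a> {
    fn new(haystack: &'a str, needle: char) -> Self {
        ToyCharSearcher { haystack, needle, position: 0 }
    }

    // Returns the full &'a str, not a borrow tied to `&self`.
    fn haystack(&self) -> &'a str {
        self.haystack
    }

    // Byte range of the next match, advancing the stored position.
    fn next_match(&mut self) -> Option<(usize, usize)> {
        let offset = self.haystack[self.position..].find(self.needle)?;
        let start = self.position + offset;
        let end = start + self.needle.len_utf8();
        self.position = end;
        Some((start, end))
    }
}

// The result outlives the searcher because it borrows for 'a.
fn slice_first_match<'a>(haystack: &'a str, needle: char) -> Option<&'a str> {
    let mut searcher = ToyCharSearcher::new(haystack, needle);
    let (start, end) = searcher.next_match()?;
    Some(&searcher.haystack()[start..end])
}
```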

9 Likes

WOW, same color theme! I love purple too.

1 Like

I agree this is why.

Searcher<'a> is the basic trait that all searchers need to implement. [...] It also contains a haystack() getter for returning the actual haystack, which is the source of the 'a lifetime on the hierarchy. The reason for this getter being made part of the trait is twofold:

  • Every searcher needs to be able to store some reference to the haystack anyway.
  • Users of this trait will need access to the haystack in order for the individual match results to be useful.

Also,

The lifetime parameter on Pattern exists in order to allow threading the lifetime of the haystack (the string to be searched through) through the API, and is a workaround for not having associated higher kinded types yet.

Which is saying, Pattern could drop the lifetime if it used a GAT:

pub trait Pattern {
    type Searcher<'a>: Searcher<'a>;
    fn into_searcher(self, haystack: &str) -> Self::Searcher<'_>;
    // ...
}

In which case we would instead have:

pub fn split<P: Pattern>(&self, pat: P) -> Split<'_, P>

6 Likes
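For anyone who wants to play with the GAT shape on stable, here is a self-contained sketch. `ToyPattern`, `ToySearcher`, `CharMatches` and `first_match` are made-up names that mirror the thread; they are not the std items, and the only implementor shown is char.

```rust
// Toy traits for illustration only, NOT std::str::pattern.
trait ToySearcher<'a> {
    fn haystack(&self) -> &'a str;
    fn next_match(&mut self) -> Option<(usize, usize)>;
}

trait ToyPattern {
    // The haystack lifetime sits on the associated type, not the trait.
    type Searcher<'a>: ToySearcher<'a>;
    fn into_searcher(self, haystack: &str) -> Self::Searcher<'_>;
}

struct CharMatches<'a> {
    haystack: &'a str,
    needle: char,
    position: usize,
}

impl<'a> ToySearcher<'a> for CharMatches<'a> {
    fn haystack(&self) -> &'a str {
        self.haystack
    }

    fn next_match(&mut self) -> Option<(usize, usize)> {
        let offset = self.haystack[self.position..].find(self.needle)?;
        let start = self.position + offset;
        let end = start + self.needle.len_utf8();
        self.position = end;
        Some((start, end))
    }
}

impl ToyPattern for char {
    type Searcher<'a> = CharMatches<'a>;

    fn into_searcher(self, haystack: &str) -> Self::Searcher<'_> {
        CharMatches { haystack, needle: self, position: 0 }
    }
}

// No lifetime parameter is needed on the pattern bound any more.
fn first_match<P: ToyPattern>(haystack: &str, pat: P) -> Option<&str> {
    let mut searcher = pat.into_searcher(haystack);
    let (start, end) = searcher.next_match()?;
    Some(&searcher.haystack()[start..end])
}
```

Note how `first_match` gets away with plain lifetime elision, the same way the proposed `split` signature does with `Split<'_, P>`.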

And as Pattern is unstable but GATs are stable (soon), this change could actually be made before Pattern is stabilized, I guess?

I wondered the same thing; someone brought it up already but I didn't see a reply.

My main uncertainty is that I haven't experimented with breaking into a sealed trait, so I'm not sure how much of a breaking change it would be. Maybe someone more familiar with the SemVer considerations of sealed traits can chime in.

Even if it is technically breaking [1], I would guess that the sealed nature will make breakage minimal and it wouldn't be considered breaking in spirit. But that's just a guess.


  1. i.e. stable code can somehow rely on the fact that Pattern has a lifetime parameter today

1 Like