Such complex lifetimes, I'm almost crying 😭
Is there a question?
yes i don't quite understand😭
That still isn't a question by itself.
Whenever you feel stuck, try to break the problem into different parts and tackle each part separately. For this example, try to start with narrowing your question to something along the lines of "How to understand the lifetime annotations in this function?"
basically it ensures that the string and the pattern can both live as long as the iterator requires them to.
I usually start trying to understand a lifetime bound by looking at how it's used in the function parameters. In this case it's on the reference to self, &'a self, so we know the lifetime needs to be valid for the borrow of self that the method is called with.
Then we can look at the return type, which also contains the same lifetime. That means the borrow of self also has to be valid for as long as the return value of this method lives.
Those are really the most important parts of this signature. The Pattern<'a> bound essentially means that P may use the same borrow of self and rely on its validity for the scope of 'a. From the perspective of a user of this particular API that isn't super important though. [1]
[1] Especially since Pattern isn't stable.
I'm not fully understanding this, so please correct me if I'm wrong anywhere.
We're looking at this method: str::split, where str is the primitive str type, not the module std::str.
Let's look at the signature:
pub fn split<'a, P>(&'a self, pat: P) -> Split<'a, P>
where
P: Pattern<'a>;
It means the same lifetime is used for the reference to Self, the lifetime argument to the Pattern trait, and the lifetime of the result Split.
Because a reference &'a T is covariant in its lifetime 'a, this means that &'a self needs to live at least as long as 'a (but could live longer).
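As a small illustration of the point above (a sketch only; second_field is a hypothetical helper, not part of std), the slices yielded by split borrow from the original string for 'a, not from the iterator, and a longer-lived borrow such as a 'static literal is accepted wherever a shorter 'a is expected:

```rust
// Hypothetical helper: returns the second comma-separated field.
// The result borrows from `line` itself, so it carries the same lifetime 'a.
fn second_field<'a>(line: &'a str) -> Option<&'a str> {
    let mut parts = line.split(','); // Split<'a, char>: borrows `line` for 'a
    parts.next()?;                   // skip the first field
    parts.next()                     // the iterator is dropped on return; the slice lives on
}

fn main() {
    let text = String::from("a,b,c");
    let second = second_field(&text);
    assert_eq!(second, Some("b"));

    // Covariance: a &'static str can be passed where a shorter &'a str is expected.
    let literal: &'static str = "x,y";
    assert_eq!(second_field(literal), Some("y"));

    // Dropping `text` while `second` is still in use would be rejected by the
    // borrow checker, because `second` borrows from `text`.
}
```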
Let's look at what 'a is used for in the Pattern trait:
A Pattern<'a> expresses that the implementing type can be used as a string pattern for searching in a &'a str.
Correspondingly, it's used here:
pub trait Pattern<'a> {
    type Searcher: Searcher<'a>;
    fn into_searcher(self, haystack: &'a str) -> Self::Searcher;
    fn is_contained_in(self, haystack: &'a str) -> bool { ... }
    fn is_prefix_of(self, haystack: &'a str) -> bool { ... }
    fn is_suffix_of(self, haystack: &'a str) -> bool
    where
        Self::Searcher: ReverseSearcher<'a>,
    { ... }
    fn strip_prefix_of(self, haystack: &'a str) -> Option<&'a str> { ... }
    fn strip_suffix_of(self, haystack: &'a str) -> Option<&'a str>
    where
        Self::Searcher: ReverseSearcher<'a>,
    { ... }
}
To be honest, I'm a bit confused too. Why should a Pattern be longer living than any of the function calls? Perhaps because some methods might want to return references to parts of the Pattern instead of the haystack?
In practice, 'a seems to be arbitrarily long because of the implementors in std:
impl<'a> Pattern<'a> for char
impl<'a, 'b> Pattern<'a> for &'b str
impl<'a, 'b> Pattern<'a> for &'b String
impl<'a, 'b> Pattern<'a> for &'b [char]
impl<'a, 'b, 'c> Pattern<'a> for &'c &'b str
impl<'a, 'b, const N: usize> Pattern<'a> for &'b [char; N]
impl<'a, F> Pattern<'a> for F where F: FnMut(char) -> bool
impl<'a, const N: usize> Pattern<'a> for [char; N]
I.e. a &'b str is a Pattern<'a> for any 'a. Thus, in case of using string slices as a pattern, we could re-write the original signature as follows:
pub fn split<'a, 'b>(&'a self, pat: &'b str) -> Split<'a, &'b str>;
Basically it means:
- pat (if it's a &str) can have any lifetime.
- The returned Split captures the lifetime of &self (the &str we want to split) and the lifetime 'b of our search pattern &'b str (because we use it as an argument P to Split<'a, P>).
In short:
The iterator returned by the function depends both on
- the searched string
- and the search pattern,
and it cannot outlive either of the two.
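A minimal sketch of that summary, assuming string-slice patterns only (split_on and fields are hypothetical helper names): the iterator type mentions both lifetimes, but the pieces it yields borrow only from the searched string, so they can outlive the pattern.

```rust
use std::str::Split;

// Mirrors the rewritten signature above: two independent lifetimes.
// The returned iterator holds both borrows, so it cannot outlive either.
fn split_on<'a, 'b>(haystack: &'a str, pat: &'b str) -> Split<'a, &'b str> {
    haystack.split(pat)
}

// The yielded slices carry only 'a, so they may outlive the pattern.
fn fields<'a>(haystack: &'a str, sep: &str) -> Vec<&'a str> {
    split_on(haystack, sep).collect()
}

fn main() {
    let haystack = String::from("a--b--c");
    let parts = {
        let sep = String::from("--"); // the pattern lives only inside this block
        let collected = fields(&haystack, &sep);
        collected
    }; // `sep` is dropped here; `parts` is still valid
    assert_eq!(parts, ["a", "b", "c"]);
}
```

Returning the Split iterator itself out of the inner block, instead of the collected Vec, should be rejected by the compiler, since the iterator still holds the borrow of sep.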
Maybe it's just overzealous (or warranted but extraordinary) caution on the part of the API designers. AFAICT this restriction could be lifted, but not introduced, without breaking semver compatibility. (However, I don't see any good reason for it, either – patterns look to me like entities that operate on a string in the sense of pure computation, so they shouldn't really need to store the string or return references inside the patterns themselves instead of merely reborrowing the string.)
This design allows writing a heap-free Searcher for situations where there’s a sentinel string in the input that needs to be recognized again later, e.g. Bash heredocs or MIME multipart encoding.
Yeah, this is exactly what I already don't think is necessarily good taste – it means that the pattern is stateful. That sounds more like proper parsing territory than "simple"(-looking) pattern matching, but of course YMMV.
Because Pattern::into_searcher needs to be able to return a Self::Searcher: Searcher<'a>. Why is Searcher<'a> bound by 'a? Because it has the method Searcher::haystack, which returns the original searched &'a str. This allows the Searcher to carry the entire state required to perform the search (the original string, the pattern, the current search position etc).
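To make "carrying the entire state" concrete, here is a simplified sketch. SubstrSearcher and match_ranges are hypothetical names invented for illustration; this is not the real (unstable) std Searcher trait, only a stand-in with the same shape: haystack, pattern, and position stored together, plus a haystack() getter that returns the original &'a str.

```rust
// Hypothetical, simplified stand-in for a std searcher; not the real trait.
struct SubstrSearcher<'a, 'b> {
    haystack: &'a str, // the string being searched
    needle: &'b str,   // the pattern
    pos: usize,        // byte offset where the next search starts
}

impl<'a, 'b> SubstrSearcher<'a, 'b> {
    fn new(haystack: &'a str, needle: &'b str) -> Self {
        Self { haystack, needle, pos: 0 }
    }

    // Plays the role of Searcher::haystack: hands back the original &'a str.
    fn haystack(&self) -> &'a str {
        self.haystack
    }

    // Byte range of the next non-overlapping match, if any.
    fn next_match(&mut self) -> Option<(usize, usize)> {
        if self.needle.is_empty() {
            return None; // empty needles are ignored to keep the sketch simple
        }
        let offset = self.haystack[self.pos..].find(self.needle)?;
        let start = self.pos + offset;
        let end = start + self.needle.len();
        self.pos = end;
        Some((start, end))
    }
}

// Collects all match ranges using only the state stored in the searcher.
fn match_ranges(haystack: &str, needle: &str) -> Vec<(usize, usize)> {
    let mut searcher = SubstrSearcher::new(haystack, needle);
    let mut out = Vec::new();
    while let Some(range) = searcher.next_match() {
        out.push(range);
    }
    out
}

fn main() {
    assert_eq!(match_ranges("a--b--c", "--"), vec![(1, 3), (4, 6)]);

    // The match indices are only useful together with the haystack,
    // which is why the getter exists.
    let mut searcher = SubstrSearcher::new("a--b", "--");
    let (start, end) = searcher.next_match().unwrap();
    assert_eq!(&searcher.haystack()[start..end], "--");
}
```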
WOW, same color theme! I love purple too.
Searcher<'a> is the basic trait that all searchers need to implement. [...] It also contains a haystack() getter for returning the actual haystack, which is the source of the 'a lifetime on the hierarchy. The reason for this getter being made part of the trait is twofold:
- Every searcher needs to be able to store some reference to the haystack anyway.
- Users of this trait will need access to the haystack in order for the individual match results to be useful.
Also,
The lifetime parameter on Pattern exists in order to allow threading the lifetime of the haystack (the string to be searched through) through the API, and is a workaround for not having associated higher kinded types yet.
In other words, Pattern could drop the lifetime if it used a GAT:
pub trait Pattern {
    type Searcher<'a>: Searcher<'a>;
    fn into_searcher(self, haystack: &str) -> Self::Searcher<'_>;
}
In which case we would instead have:
pub fn split<P: Pattern>(&self, pat: P) -> Split<'_, P>
And as Pattern is unstable but GATs are stable (soon), this could also actually be done before stabilizing them, I guess?
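For anyone who wants to try the GAT idea on a compiler where GATs are stable, here is a rough sketch. MiniPattern, MiniSearcher, CharSearcher, and first_match are hypothetical stand-ins invented for this example; they only imitate the shape of the unstable std traits and implement matching for char alone.

```rust
// Hypothetical stand-ins for the unstable std traits; names are illustrative only.
trait MiniSearcher<'a> {
    fn haystack(&self) -> &'a str;
    fn next_match(&mut self) -> Option<(usize, usize)>;
}

// The pattern trait itself has no lifetime parameter: the haystack lifetime
// is introduced per call through the generic associated type.
trait MiniPattern: Sized {
    type Searcher<'a>: MiniSearcher<'a>;
    fn into_searcher(self, haystack: &str) -> Self::Searcher<'_>;
}

struct CharSearcher<'a> {
    haystack: &'a str,
    needle: char,
    pos: usize, // byte offset where the next search starts
}

impl<'a> MiniSearcher<'a> for CharSearcher<'a> {
    fn haystack(&self) -> &'a str {
        self.haystack
    }

    fn next_match(&mut self) -> Option<(usize, usize)> {
        let offset = self.haystack[self.pos..].find(self.needle)?;
        let start = self.pos + offset;
        let end = start + self.needle.len_utf8();
        self.pos = end;
        Some((start, end))
    }
}

impl MiniPattern for char {
    type Searcher<'a> = CharSearcher<'a>;

    fn into_searcher(self, haystack: &str) -> Self::Searcher<'_> {
        CharSearcher { haystack, needle: self, pos: 0 }
    }
}

// Note the bound: plain `P: MiniPattern`, with no lifetime threaded through.
fn first_match<P: MiniPattern>(haystack: &str, pat: P) -> Option<(usize, usize)> {
    pat.into_searcher(haystack).next_match()
}

fn main() {
    assert_eq!(first_match("a,b", ','), Some((1, 2)));
    assert_eq!(first_match("abc", ','), None);
}
```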
I wondered the same thing; someone brought it up already but I didn't see a reply.
My main uncertainty is that I haven't experimented with breaking into a sealed trait, so I'm not sure how much of a breaking change it would be. Maybe someone more familiar with the SemVer considerations of sealed traits can chime in.
Even if it is technically breaking [1], I would guess that the sealed nature will make breakage minimal and it wouldn't be considered breaking in spirit. But that's just a guess.
[1] i.e. stable code can somehow rely on the fact that Pattern has a lifetime parameter today