Hey, I'm writing some code that uses `Path::ends_with`, and it's taking up about 95% of the CPU cycles. Does anyone think there's any chance of speeding it up?
Is this on a release build? A release build will enable optimizations that might speed up `ends_with` considerably (it uses a lot of `Iterator` methods).
My first guess is that `ends_with` has complexity linear in the `Path` length. Under the hood it uses UTF-8 on Linux, so it has to check whether the beginning of the suffix is on a char boundary, hence the linear dependency. You could try converting the `Path` and your suffix to `&[u8]`, performing the check manually, and measuring performance, but note that this can result in subtly incorrect behavior (i.e. some strings will pass the check even though the first char of the suffix is different).
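A minimal sketch of that manual byte check on Unix (the `byte_suffix` name is made up for illustration; note the caveat above, demonstrated in the second assert):

```rust
use std::os::unix::ffi::OsStrExt; // Unix-only: exposes the raw bytes of an OsStr
use std::path::Path;

/// Raw byte-suffix comparison. This ignores component boundaries,
/// so it behaves subtly differently from Path::ends_with.
fn byte_suffix(path: &Path, suffix: &[u8]) -> bool {
    path.as_os_str().as_bytes().ends_with(suffix)
}

fn main() {
    assert!(byte_suffix(Path::new("/home/user/file.txt"), b"file.txt"));
    // A false positive relative to Path::ends_with semantics:
    // the match does not start at a component boundary.
    assert!(byte_suffix(Path::new("/home/user/myfile.txt"), b"file.txt"));
}
```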
@fpgaminer I'm using a release build with debug symbols (so I can get meaningful stack info).
@newpavlov I suspect you're right, sadly. Why does it use UTF-8, since it will accept any byte array? Is it to handle normalization?
It's unlikely this has anything to do with UTF-8. More likely, `ends_with` is "slow" because it's doing more work than you expect: it actually needs to parse the trailing pieces of both paths, since it determines the suffix match based on whole components only. See this demonstration: Rust Playground
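The playground link isn't reproduced here, but the component-wise behavior in question looks like this (paths chosen for illustration):

```rust
use std::path::Path;

fn main() {
    let p = Path::new("/etc/passwd");
    // ends_with matches whole components only:
    assert!(p.ends_with("passwd"));
    assert!(p.ends_with("etc/passwd"));
    // "sswd" is a byte-wise suffix, but not a component-wise one:
    assert!(!p.ends_with("sswd"));
}
```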
In UTF-8, a char boundary check does not require linear time. See the `is_char_boundary` implementation for `str`:
```rust
fn is_char_boundary(&self, index: usize) -> bool {
    // 0 and len are always ok.
    // Test for 0 explicitly so that it can optimize out the check
    // easily and skip reading string data for that case.
    if index == 0 || index == self.len() { return true; }
    match self.as_bytes().get(index) {
        None => false,
        // This is bit magic equivalent to: b < 128 || b >= 192
        Some(&b) => (b as i8) >= -0x40,
    }
}
```
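In other words, a single byte inspection decides the boundary question in O(1). A small demonstration (the string is arbitrary; `'é'` encodes as two bytes):

```rust
fn main() {
    let s = "héllo"; // 'é' occupies bytes 1..3
    assert!(s.is_char_boundary(0)); // start of 'h'
    assert!(s.is_char_boundary(1)); // start of 'é'
    assert!(!s.is_char_boundary(2)); // inside 'é': a continuation byte
    assert!(s.is_char_boundary(3)); // start of 'l'
}
```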
If you don't care about the aforementioned path parsing, then on Unix you could probably implement this much more efficiently, e.g., Rust Playground
On non-Unix platforms (i.e., Windows), you'll unfortunately need to perform an encoding step first, given the current API. IIRC, more convenient string-searching routines are coming to `OsStr`, and once those land it should be easier to implement this efficiently across all platforms.
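One possible Unix-only sketch along those lines (this is not the linked playground code; `fast_ends_with` is an illustrative name, and it assumes neither path contains redundant components like `.` or `//`):

```rust
use std::os::unix::ffi::OsStrExt; // Unix-only: exposes the raw bytes of an OsStr
use std::path::Path;

/// Byte-wise ends_with that still requires the match to begin at a
/// component boundary. Assumes already-normalized inputs, unlike
/// Path::ends_with, which tolerates "." and "//".
fn fast_ends_with(path: &Path, suffix: &Path) -> bool {
    let p = path.as_os_str().as_bytes();
    let s = suffix.as_os_str().as_bytes();
    if !p.ends_with(s) {
        return false;
    }
    // Either the whole path matched, or the byte just before the
    // match is a separator.
    p.len() == s.len() || p[p.len() - s.len() - 1] == b'/'
}

fn main() {
    assert!(fast_ends_with(Path::new("/home/u/file.txt"), Path::new("u/file.txt")));
    // Rejected: the match would start mid-component.
    assert!(!fast_ends_with(Path::new("/home/u/myfile.txt"), Path::new("file.txt")));
}
```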
Yes, you are right, though IIUC under the hood `ends_with` also uses indexing over `str`, which, if I am not mistaken, is linear (a chance for an optimization?):
```rust
self.len() <= haystack.len() &&
    haystack.is_char_boundary(haystack.len() - self.len()) &&
    self == &haystack[haystack.len() - self.len()..]
```
No, `str` indexing works on byte indices, not chars, so indexing is not linear.
For my use case I'm parsing repeatedly. I wonder if I can cache the parsed components.
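One way that caching could look, as a sketch (the `CachedPath` type is hypothetical; it parses each path's components once and compares the reversed lists, mirroring what `ends_with` does per call):

```rust
use std::path::{Component, Path};

/// Hypothetical cache of a path's parsed components, stored reversed
/// so that a suffix check becomes a prefix check on the vectors.
struct CachedPath<'a> {
    rev_components: Vec<Component<'a>>,
}

impl<'a> CachedPath<'a> {
    fn new(path: &'a Path) -> Self {
        Self { rev_components: path.components().rev().collect() }
    }

    /// Component-wise ends_with against another cached path.
    fn ends_with(&self, suffix: &CachedPath) -> bool {
        let n = suffix.rev_components.len();
        self.rev_components.len() >= n
            && self.rev_components[..n] == suffix.rev_components[..]
    }
}

fn main() {
    // components() already normalizes away "." and "//":
    let a = CachedPath::new(Path::new("a/b/./c"));
    let b = CachedPath::new(Path::new("b//c"));
    assert!(a.ends_with(&b));
}
```

The parsing cost is paid once per path instead of once per comparison, which should help when the same paths are checked repeatedly.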
UTF-8 shouldn't have anything to do with it. Substring searches (to which `ends_with` belongs) in UTF-8 don't need to check char boundaries, because the byte sequences of distinct chars never overlap. Paths use WTF-8, which still preserves that property.
The `Path::ends_with()` source is:
```rust
fn _ends_with(&self, child: &Path) -> bool {
    iter_after(self.components().rev(), child.components().rev()).is_some()
}
```
So there is a lot of stuff going on there, and maybe it really is unnecessarily slow?
Unfortunately, it appears that it is doing what it has to do to be correct:
```rust
let a: &Path = "a/b/./c".as_ref(); // redundant '.' component
let b: &Path = "b//c".as_ref();    // extra slash
assert!(a.ends_with(b));           // ...yet this is still true
```
(Actually, I am pleasantly surprised that the `.` component is not considered a violation, as I've probably done path normalization before out of fear that it wouldn't work.)
I love the fact that you can rely on `libstd` to do the right thing rather than some fast hack that fails on edge cases. I think I can be quicker by exploiting some extra information about my problem, namely that my path is always canonical, so I can basically just do a byte comparison. I'll post the results.
Since I'm iterating over a list of paths many times, I'm going to try normalizing the paths and doing a byte comparison. To be honest, the data I'm reading is machine-generated and very unlikely to contain things like "//" or "/./", but it would be nice to cover them anyway.
If this is of any help, you can easily find your bottleneck using `valgrind --tool=callgrind your/binary`. You can then visualize the output file using KCachegrind.
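For a Rust project, the full workflow might look like this (the binary name is a placeholder):

```shell
# Build with optimizations; debug symbols help callgrind map costs
# back to source lines (e.g. keep `debug = true` in [profile.release]).
cargo build --release

# Run under callgrind; this writes a callgrind.out.<pid> file.
valgrind --tool=callgrind ./target/release/mybin

# Browse the call graph and per-function costs interactively.
kcachegrind callgrind.out.*
```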
This is super duper useful!
Yeah, I discovered KCachegrind while looking at this and would wholeheartedly recommend it!