Comparing `&OsStr` for prefixes and suffixes?


#1

Hi all,

I am not seeing any way to slice or dice an OsStr or OsString, which is problematic. Do I really need to write non-portable code in order to perform any path examination beyond entire path elements?

I want to implement a function that checks whether a given path ends with a given suffix (or starts with a given prefix). The suffix could be of any length and need not be an “extension” preceded by a “.”.

Any suggestions? Do I really need to write separate functions for Windows and Unix?


#2

The point of OsStr is to abstract over a platform-specific representation. Internally, on Windows, that representation is WTF-8, while on Unix it’s just a sequence of bytes. Those internal representations aren’t part of the public interface of OsStr. However, if you can make an assumption about your platform, then you can access the raw bytes: on Unix, std::os::unix::ffi::OsStrExt::as_bytes exposes them directly. A similar interface exists on Windows (std::os::windows::ffi::OsStrExt::encode_wide), but unlike Unix, access to it is not free: the internal WTF-8 representation is transcoded to/from potentially ill-formed UTF-16.
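As a rough sketch of what “separate functions” could look like, here is a suffix check written once per platform behind `cfg` attributes. The function name `os_str_ends_with` is made up for illustration; the extension traits are the ones from `std::os::unix::ffi` and `std::os::windows::ffi`.

```rust
use std::ffi::OsStr;

/// Unix: an `OsStr` is arbitrary bytes, and `as_bytes` exposes them at no cost.
/// (`os_str_ends_with` is a hypothetical helper name, not a std API.)
#[cfg(unix)]
pub fn os_str_ends_with(haystack: &OsStr, needle: &OsStr) -> bool {
    use std::os::unix::ffi::OsStrExt;
    haystack.as_bytes().ends_with(needle.as_bytes())
}

/// Windows: `encode_wide` transcodes to potentially ill-formed UTF-16,
/// so this version pays for two temporary buffers.
#[cfg(windows)]
pub fn os_str_ends_with(haystack: &OsStr, needle: &OsStr) -> bool {
    use std::os::windows::ffi::OsStrExt;
    let haystack: Vec<u16> = haystack.encode_wide().collect();
    let needle: Vec<u16> = needle.encode_wide().collect();
    haystack.ends_with(&needle)
}
```

A prefix check would look the same with `starts_with` in place of `ends_with`. The Windows version could avoid the allocations by comparing the two `encode_wide` iterators from the back, at the price of slightly fiddlier code.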

So, I think the answer to your question is, “yes, right now, you really do need to write separate functions, and your Windows function will have an additional cost.”

With that said, one time, I was feeling frisky and plunged my hands into the depths of the internal representation: https://github.com/BurntSushi/ripgrep/blob/d79add341ba4be10bb3459877318b9c5a30f5db3/globset/src/pathutil.rs#L41-L85 — but note, that’s a relatively specialized case that really doesn’t generalize. You could probably adapt it to your needs such that if your needle is an &OsStr and your haystack is an &OsStr, then you could do the transmute on both to &[u8] and then do your search.

I suppose the proper answer here is that the OsStr APIs should be expanded to support typical search/split/whatever operations one might find on &str. In particular, to satisfy the general case, your needle needs to be an OsStr.

If you can make assumptions and can afford the cost (for example, your needle is always a &str), then you could convert your path element to UTF-8 first and then search.
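A minimal sketch of that last option, assuming the needle is always a `&str` and that it is acceptable to treat non-Unicode file names as a separate case. The helper name `file_name_ends_with` is invented for this example.

```rust
use std::path::Path;

/// Checks whether the final component of `path` ends with `suffix`.
/// Returns `None` when there is no final component or when it is not
/// valid Unicode, so the caller decides how to handle those cases.
/// (`file_name_ends_with` is a hypothetical helper, not a std API.)
pub fn file_name_ends_with(path: &Path, suffix: &str) -> Option<bool> {
    // `to_str` validates the contents and yields `None` on failure.
    let name = path.file_name()?.to_str()?;
    Some(name.ends_with(suffix))
}
```

For instance, `file_name_ends_with(Path::new("/tmp/report_final.txt"), "_final.txt")` yields `Some(true)`. If a lossy answer is good enough, `to_string_lossy` could replace `to_str`, though it may allocate and will not match bytes that were replaced with U+FFFD.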


#3

Yes, I’d love for there to be some common functionality in the OsStr world.

I don’t want to assume Unicode, or pay the cost for conversion to/from Unicode (so str is right out), but would like to have some simple operations on OsStr such as comparison of prefix/suffix relationships. I believe it is always either &[u8] or &[u16] so there shouldn’t be any problem making such comparisons. :(


#4

To be clear, an &OsStr is always a &[u8]. The key platform distinction is the encoding of that &[u8]. On Unix, it’s just arbitrary bytes. On Windows, it’s WTF-8, which is basically UTF-8 that also permits unpaired surrogates. You only get &[u16] after an explicit transcoding step.

What that means is that all of the search routines need to be aware of WTF-8, which I think seems OK, although I haven’t given it much thought.
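For what it’s worth, newer Rust releases (1.74 and later) expose that `&[u8]` through `OsStr::as_encoded_bytes`, so a portable byte-wise comparison can be sketched without any transmute. The two function names below are invented for the example, and the Windows caveat in the comments is exactly the WTF-8 awareness issue mentioned above.

```rust
use std::ffi::OsStr;

/// Byte-wise prefix check on the platform's internal encoding.
/// Assumes Rust 1.74+ for `as_encoded_bytes`; the helper names are hypothetical.
pub fn encoded_starts_with(haystack: &OsStr, needle: &OsStr) -> bool {
    haystack
        .as_encoded_bytes()
        .starts_with(needle.as_encoded_bytes())
}

/// Byte-wise suffix check on the platform's internal encoding.
/// On Unix this is exact. On Windows the bytes are WTF-8, where a surrogate
/// pair is stored as a single 4-byte sequence, so a needle that begins or
/// ends with an unpaired surrogate may not match the way a comparison of
/// UTF-16 code units would.
pub fn encoded_ends_with(haystack: &OsStr, needle: &OsStr) -> bool {
    haystack
        .as_encoded_bytes()
        .ends_with(needle.as_encoded_bytes())
}
```

Because both sides are an `&OsStr`, this covers the general case where the needle is not known to be Unicode, and it needs no transcoding on either platform.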