Find and replace in OsString

I have a slice &[String] of strings parsed from a config, and I expect any of them to contain a special token %f which I want to replace with a filename, which is a Path (or OsString). The resulting sequence is going into Command::args() which accepts OsStrings just fine, and my strategy was to work entirely at this lower level since I don't need the more strict guarantees of a utf-8 String. The only problem is that I'm losing the handy String::replace() and I'd rather not write an ad-hoc byte-level replace by hand. Surely there's a better way?

Better yet, maybe there's a crate that already handles this kind of substitution? I'm not tied to "%f" particularly, any syntax would do.

Thank you!

bstr to the rescue.

Ah, bstr! I actually read burntsushi's blog post when it came out, but then completely forgot about it.

So, bstr worked. Now I have this wordy but working piece:

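    use std::{ffi::OsStr, os::unix::ffi::OsStrExt}; // Unix-only: OsStr <-> raw bytes
    use bstr::ByteSlice; // provides replace() on byte slices

    // Substitute every "%f" with the raw bytes of the filename.
    let bfilename = filename.as_os_str().as_bytes();
    let args = args
        .iter()
        .map(|arg| OsStr::from_bytes(&arg.as_bytes().replace("%f", bfilename)).to_owned());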

There is a different problem, however: it's Unix-specific. Turns out, on Windows an OsString represents a sequence of arbitrary 16-bit values…

I happen to be okay with that, as my program is Linux-specific anyway, but I'm still curious how the seemingly common problem of string substitution could be solved in terms of OsString in a platform-agnostic way.

P.S. It's also funny that filename.as_bytes() and arg.as_bytes() come from completely different places: the Unix ffi extension trait and String itself, respectively, with bstr then supplying replace() on the resulting bytes.
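On the platform-agnostic question: since the arguments here start out as UTF-8 Strings and only the filename is an OsStr, one option that needs no byte-level access at all is to split each argument on the token and stitch the pieces back together with OsString::push. A minimal sketch using only std; the function name `substitute` is made up for illustration, and it assumes the token is non-empty:

```rust
use std::ffi::{OsStr, OsString};

/// Replaces every occurrence of `token` in `template` with `value`.
/// Assumes `token` is non-empty (an empty pattern splits between every char).
fn substitute(template: &str, token: &str, value: &OsStr) -> OsString {
    let mut out = OsString::with_capacity(template.len());
    let mut pieces = template.split(token);
    // The first piece is whatever precedes the first token (possibly empty).
    if let Some(first) = pieces.next() {
        out.push(first);
    }
    // Every further piece was preceded by a token, so emit the value first.
    for piece in pieces {
        out.push(value);
        out.push(piece);
    }
    out
}
```

Something like `args.iter().map(|arg| substitute(arg, "%f", filename.as_os_str()))` should then feed straight into Command::args(). This only works because the template side is valid UTF-8; it does not help when the haystack itself is an arbitrary OsString.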

You can use cfg blocks and platform extension traits.
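A rough sketch of what the cfg approach might look like with std alone, for the case where the haystack is an OsStr too. The names `replace_units` and `replace_token` are hypothetical helpers, not part of any library; the Unix branch works on bytes and the Windows branch on 16-bit units. Targets that are neither Unix nor Windows are not covered here.

```rust
use std::ffi::{OsStr, OsString};

/// Naive subslice replacement, generic over the unit type so it can serve
/// both bytes (Unix) and u16 code units (Windows).
fn replace_units<T: PartialEq + Copy>(haystack: &[T], needle: &[T], with: &[T]) -> Vec<T> {
    let mut out = Vec::with_capacity(haystack.len());
    let mut i = 0;
    while i < haystack.len() {
        if !needle.is_empty() && haystack[i..].starts_with(needle) {
            out.extend_from_slice(with);
            i += needle.len();
        } else {
            out.push(haystack[i]);
            i += 1;
        }
    }
    out
}

#[cfg(unix)]
fn replace_token(arg: &OsStr, token: &str, value: &OsStr) -> OsString {
    use std::os::unix::ffi::{OsStrExt, OsStringExt};
    // On Unix an OsStr is just bytes, so no conversion cost is involved.
    OsString::from_vec(replace_units(arg.as_bytes(), token.as_bytes(), value.as_bytes()))
}

#[cfg(windows)]
fn replace_token(arg: &OsStr, token: &str, value: &OsStr) -> OsString {
    use std::os::windows::ffi::{OsStrExt, OsStringExt};
    // On Windows, work on potentially ill-formed UTF-16 code units instead.
    let arg: Vec<u16> = arg.encode_wide().collect();
    let token: Vec<u16> = token.encode_utf16().collect();
    let value: Vec<u16> = value.encode_wide().collect();
    OsString::from_wide(&replace_units(&arg, &token, &value))
}
```

The search is the simple quadratic kind, which is probably fine for command-line arguments; a crate like bstr would do the byte-level part faster on Unix.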

But I think it's also common to just punt on non-unicode on windows.
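Punting could be as simple as refusing filenames that are not valid Unicode and then using the ordinary String::replace. A small sketch, where `substitute_utf8` is a made-up name and returning None stands in for whatever error handling the program prefers:

```rust
use std::path::Path;

/// Returns None when the filename is not valid Unicode, so the caller can
/// report an error instead of silently mangling the path.
fn substitute_utf8(arg: &str, filename: &Path) -> Option<String> {
    let name = filename.to_str()?;
    Some(arg.replace("%f", name))
}
```

Swapping `to_str()` for `to_string_lossy()` would give the lossy flavour instead, at the price of possibly passing a filename that no longer exists on disk.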

Thanks for the pointers! Apparently bstr is even better than I thought and it already handles what I did, giving me the fail/lossy choice on Windows but doing zero-cost conversions on Unix.

Its conversion code basically does the exact same dance I did with ffi.

I'll also mention that the situation will improve if/when we get the Pattern API for OsStr, or anything else that allows splitting an OsStr into a sequence of UTF-8 strs and non-UTF8 blobs. Progress on that front seems a little frozen around uncertainty about how much of wtf8 to expose or change.