How to get &'a str from &'a [u8] without any copying involved?

I am trying to implement a zero-copy parser that parses an underlying &'a [u8] and gives me a struct containing &'a str fields. I tried String::from_utf8_lossy but couldn't figure out the lifetimes.
What is the way to get a &'a str from the underlying bytes?
P.S. The str doesn't have to be valid UTF-8; the parser handles the correctness checks anyway.

You can get a &'a str from a &'a [u8] by using str::from_utf8. This will fail when the input is not valid UTF-8. I don't know exactly what you mean by your remark that the str doesn't have to be valid UTF-8.
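As a minimal sketch of that suggestion: the `Record` struct, the `parse_record` function, and the `key=value` input format below are made up for illustration and are not from the original question. Only `str::from_utf8` is the real API; its elided lifetime ties the returned `&str` to the input slice, so the fields borrow directly from the buffer.

```rust
use std::str::{self, Utf8Error};

/// Hypothetical record type; both fields borrow from the input buffer.
#[derive(Debug, PartialEq)]
struct Record<'a> {
    key: &'a str,
    value: &'a str,
}

/// Hypothetical parser for a `key=value` byte slice (format assumed for the example).
/// No bytes are copied: the returned `&'a str` fields point into `input`.
fn parse_record<'a>(input: &'a [u8]) -> Result<Record<'a>, Utf8Error> {
    // Locate the separator on the raw bytes; a missing '=' yields an empty value.
    let split = input.iter().position(|&b| b == b'=').unwrap_or(input.len());
    let (key_bytes, rest) = input.split_at(split);
    // Skip the '=' itself if it is present.
    let value_bytes = if rest.is_empty() { rest } else { &rest[1..] };

    Ok(Record {
        // `?` propagates the Utf8Error if either part is not valid UTF-8.
        key: str::from_utf8(key_bytes)?,
        value: str::from_utf8(value_bytes)?,
    })
}
```

Calling `parse_record(b"name=ferris")` should give a `Record` whose `key` starts at the same address as the input, while input containing a byte such as `0xFF` returns an `Err`.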

A str in Rust always has to be valid UTF-8. You can skip the check using unsafe code (str::from_utf8_unchecked) if you're certain that it wouldn't fail, but putting invalid data into a str can cause undefined behavior (UB).
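To illustrate what "certain that it wouldn't fail" could look like, here is a sketch where a cheaper check establishes validity before the unsafe call. The helper name `ascii_as_str` is hypothetical, and the assumption that the input is expected to be pure ASCII is mine, not the original poster's.

```rust
use std::str;

/// Hypothetical helper: returns `Some(&str)` only when `bytes` is pure ASCII.
fn ascii_as_str(bytes: &[u8]) -> Option<&str> {
    if bytes.is_ascii() {
        // SAFETY: every ASCII byte sequence is also valid UTF-8, so the
        // invariant required by `from_utf8_unchecked` holds here.
        Some(unsafe { str::from_utf8_unchecked(bytes) })
    } else {
        None
    }
}
```

Without some guarantee like this, the safe `str::from_utf8` is the better default.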

Well, I meant the conversion can be lossy, like String::from_utf8_lossy.

Lossy conversion replaces invalid data with '�', so it needs to be able to modify the string (potentially even grow it). The best thing you can get is a Cow<'a, str>, which is already the return type of String::from_utf8_lossy. This approach avoids copying whenever possible, which is precisely the case when no modification needs to be done, i.e. if (and only if) the &'a [u8] already contained only valid UTF-8.
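A small sketch of the two cases; the wrapper names `lossy` and `is_zero_copy` are invented for the example, and only `String::from_utf8_lossy` and `Cow` come from the standard library.

```rust
use std::borrow::Cow;

/// Thin wrapper that makes the lifetime explicit: the `Cow` may borrow from `bytes`.
fn lossy<'a>(bytes: &'a [u8]) -> Cow<'a, str> {
    String::from_utf8_lossy(bytes)
}

/// True when the conversion borrowed the input instead of allocating a new String.
fn is_zero_copy(text: &Cow<'_, str>) -> bool {
    matches!(text, Cow::Borrowed(_))
}
```

For valid input such as `b"hello"` the result is `Cow::Borrowed`. For `b"he\xffllo"` the invalid byte is replaced by U+FFFD, which takes three bytes in UTF-8, so the result has to be a freshly allocated `Cow::Owned`.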

I tried Cow<'a, str>, but for some reason I could not return a &'a str from it; apparently the Cow itself doesn't live as long as 'a.
It might be due to the fact you mentioned, that the lossy version might be different from the original slice.

If you're saying that you cannot turn a Cow<'a, str> into a &'a str (at least, without leaking memory), that's true [1]. You'd need to keep the Cow<'a, str>, i.e. also use Cow<'a, str> as the type of the field of the struct you're ultimately producing.

[1] You can only go from &'b Cow<'a, str> to &'b str. A Cow<'a, str> contains either a &'a str or a String; the latter cannot be turned into a &'a str (without leaking memory), because someone needs to keep ownership of the string.
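A sketch of keeping the Cow in the struct, assuming a hypothetical `Token` type (not from the original question). The accessor shows the footnote's point: borrowing through `&self` yields a `&str` tied to the borrow of the token, not to 'a.

```rust
use std::borrow::Cow;

/// Hypothetical output type that stores the `Cow` instead of a `&'a str`.
struct Token<'a> {
    text: Cow<'a, str>,
}

impl<'a> Token<'a> {
    /// Borrows from `bytes` when they are valid UTF-8, allocates otherwise.
    fn from_bytes(bytes: &'a [u8]) -> Self {
        Token { text: String::from_utf8_lossy(bytes) }
    }

    /// The returned `&str` lives only as long as the borrow of `self`,
    /// because the data may be owned by the `String` inside the `Cow`.
    fn as_str(&self) -> &str {
        &self.text
    }
}
```

The struct stays zero-copy for valid input and owns the repaired string otherwise, so nothing has to be leaked.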

I just went with str::from_utf8 as you suggested and handled the error properly, which is ultimately better anyway 🙂
