I am trying to implement a zero-copy parser that parses an underlying &'a [u8]
and gives me a struct containing &'a str
fields. I tried String::from_utf8_lossy
but couldn't figure out the lifetimes.
What is the way to get &'a str
from underlying bytes?
P.S. The str doesn't have to be valid UTF-8; the parser handles the correctness checks anyway.
You can get a &'a str
from &'a [u8]
by using str::from_utf8
. This will fail when the bytes are not valid UTF-8. I don't know exactly what you mean by your remark that the str doesn't have to be valid UTF-8.
A str
in Rust always has to be valid UTF-8. You can possibly skip the check using unsafe
code, if you're certain that it wouldn't fail; but putting invalid data into a str
can cause undefined behavior (UB).
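To make the checked route concrete, here is a minimal sketch of a zero-copy parse built on str::from_utf8. The `Record` type, the `parse_record` function, and the `key=value` input format are hypothetical, made up only for illustration; the point is that the returned fields carry the same lifetime 'a as the input bytes.

```rust
use std::str::{self, Utf8Error};

/// Hypothetical record type: both fields borrow from the input buffer.
#[derive(Debug, PartialEq)]
struct Record<'a> {
    key: &'a str,
    value: &'a str,
}

/// Parses a `key=value` pair (assumed format) out of `input` without copying.
/// Returns an error if either part is not valid UTF-8.
fn parse_record<'a>(input: &'a [u8]) -> Result<Record<'a>, Utf8Error> {
    // Split on the first b'='; if there is none, the value is empty.
    let (key, value) = match input.iter().position(|&b| b == b'=') {
        Some(i) => (&input[..i], &input[i + 1..]),
        None => (input, &input[input.len()..]),
    };
    Ok(Record {
        // str::from_utf8 only validates; the &str points into `input`.
        key: str::from_utf8(key)?,
        value: str::from_utf8(value)?,
    })
}
```

Because the validation does not allocate, the resulting struct can be returned freely as long as the input buffer outlives it.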
Well, I meant that it can be lossy, like from_utf8_lossy.
Lossy conversion replaces invalid data with '�'
, so it needs to be able to modify the string (potentially even grow it). The best thing you can get is a Cow<'a, str>
, which already is the return type of String::from_utf8_lossy
; this approach avoids copying whenever possible, which is precisely the case when no modification is needed, i.e. if (and only if) the &'a [u8]
already contained just valid UTF-8.
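A small sketch of that behavior, in case it helps: the helper names `decode_lossy` and `is_zero_copy` are hypothetical, but String::from_utf8_lossy and the Cow variants are standard library items. Valid input comes back as Cow::Borrowed (no copy), while input with invalid bytes comes back as Cow::Owned.

```rust
use std::borrow::Cow;

/// Lossy decoding that keeps the input lifetime 'a in the return type.
fn decode_lossy<'a>(input: &'a [u8]) -> Cow<'a, str> {
    String::from_utf8_lossy(input)
}

/// Returns true when the Cow still borrows, i.e. no allocation was needed.
fn is_zero_copy(text: &Cow<'_, str>) -> bool {
    matches!(text, Cow::Borrowed(_))
}
```

Returning the Cow itself (rather than a reference into it) is what keeps the signature tied to 'a.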
I tried Cow<'a,str>
but for some reason I could not return it; apparently the Cow itself doesn't live as long as 'a.
It might be due to the fact you mentioned, that the lossy
version might be different from the original slice.
If you're saying that you cannot turn Cow<'a, str>
into &'a str
(at least, without leaking memory), that's true¹. You'd need to keep the Cow<'a, str>
, i.e. also use Cow<'a, str>
as the type of the field of the struct you're ultimately producing.
¹ You can only go from &'b Cow<'a, str>
to &'b str
. A Cow<'a, str>
contains either a &'a str
or a String
; the latter cannot be turned into &'a str
(without leaking memory); someone needs to keep ownership of the string.
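As a rough sketch of what keeping the Cow in the struct could look like (the `LossyRecord` type and its methods are hypothetical names, not anything from the thread's actual parser):

```rust
use std::borrow::Cow;

/// Hypothetical parsed struct that stores the Cow instead of a &'a str.
struct LossyRecord<'a> {
    name: Cow<'a, str>,
}

impl<'a> LossyRecord<'a> {
    /// Borrows from `input` when it is valid UTF-8, allocates otherwise.
    fn parse(input: &'a [u8]) -> Self {
        LossyRecord { name: String::from_utf8_lossy(input) }
    }

    /// The returned &str is tied to the borrow of `self`, not to 'a,
    /// because the Cow may own a String that lives inside this struct.
    fn name(&self) -> &str {
        &self.name
    }
}
```

The struct owns the String in the lossy case, so nothing is leaked, and callers still get a plain &str through the accessor.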
I just went with str::from_utf8
as you suggested, handling the error properly, which is ultimately better anyway.