How to get &'a str from &'a [u8] without any copying involved?

NikosEfthias · December 29, 2021, 11:13pm

I am trying to implement a zero-copy parser that parses an underlying &'a [u8] and gives me a struct containing &'a str fields. I tried String::from_utf8_lossy but couldn't figure out the lifetimes.
What is the way to get &'a str from underlying bytes?
P.S. The str doesn't have to be valid UTF-8; the parser handles the correctness checks anyway.

steffahn · December 29, 2021, 11:32pm

You can get a &'a str from &'a [u8] by using str::from_utf8. This fails if the bytes are not valid UTF-8. I don't know exactly what you mean by your remark that the str doesn't have to be valid UTF-8.

A str in Rust always has to be valid UTF-8. You can possibly skip the check using unsafe code, if you're certain that it wouldn't fail; but putting invalid data into a str can cause UB.
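A minimal sketch of the safe route, assuming all you need is a borrowed view of the bytes: `str::from_utf8` only validates, so on success the returned `&'a str` points at the same memory as the input slice. The helper name `as_str` is hypothetical. The unchecked variant, `std::str::from_utf8_unchecked`, is `unsafe` and deliberately not shown here.

```rust
use std::str::{self, Utf8Error};

/// Hypothetical helper: borrow a `&'a str` from a `&'a [u8]` without copying.
/// `str::from_utf8` only checks the bytes; it never allocates.
fn as_str<'a>(bytes: &'a [u8]) -> Result<&'a str, Utf8Error> {
    str::from_utf8(bytes)
}

fn main() {
    let input: &[u8] = b"hello world";
    let s = as_str(input).expect("valid UTF-8");
    // Same address and length as the input: nothing was copied.
    assert_eq!(s.as_ptr(), input.as_ptr());
    assert_eq!(s.len(), input.len());
    println!("{s}");

    // 0xFF can never appear in UTF-8, so this input is rejected.
    let bad: &[u8] = b"abc\xFF";
    let err = as_str(bad).unwrap_err();
    // `valid_up_to` reports the length of the valid prefix.
    assert_eq!(err.valid_up_to(), 3);
}
```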

NikosEfthias · December 29, 2021, 11:35pm

Well, I meant it can be lossy, like String::from_utf8_lossy.

steffahn · December 29, 2021, 11:38pm

Lossy conversion replaces invalid data with '�', so it needs to be able to modify the string (potentially even grow it). The best you can get is a Cow<'a, str>, which is already the return type of String::from_utf8_lossy. This avoids copying whenever possible, which is precisely when no modification is needed, i.e. if (and only if) the &'a [u8] already contains only valid UTF-8.
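A short sketch of the two cases described above: for valid input `String::from_utf8_lossy` returns `Cow::Borrowed` pointing into the original bytes, while a single invalid byte forces an allocation because U+FFFD takes three bytes in UTF-8.

```rust
use std::borrow::Cow;

fn main() {
    // Valid UTF-8: the Cow borrows the input, no allocation.
    let valid: &[u8] = b"hello";
    let cow = String::from_utf8_lossy(valid);
    assert!(matches!(cow, Cow::Borrowed(_)));
    assert_eq!(cow.as_ptr(), valid.as_ptr());

    // Invalid UTF-8: the bad byte becomes U+FFFD (3 bytes),
    // so a new String has to be allocated and the data grows.
    let invalid: &[u8] = b"hi\xFF";
    let cow = String::from_utf8_lossy(invalid);
    assert!(matches!(cow, Cow::Owned(_)));
    assert_eq!(cow, "hi\u{FFFD}");
    assert_eq!(cow.len(), 5); // 2 bytes of "hi" + 3 bytes of U+FFFD
}
```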

NikosEfthias · December 29, 2021, 11:41pm

I tried Cow<'a, str>, but for some reason I could not return it; apparently the Cow itself doesn't live as long as 'a.
It might be due to the fact you mentioned, that the lossy version might differ from the original slice.

steffahn · December 29, 2021, 11:42pm

If you're saying that you cannot turn Cow<'a, str> into &'a str (at least, without leaking memory), that's true¹. You'd need to keep the Cow<'a, str>, i.e. also use Cow<'a, str> as the type of the field of the struct you're ultimately producing.

¹ you can only go from &'b Cow<'a, str> to &'b str. A Cow<'a, str> is either containing a &'a str or a String, the latter cannot be turned into &'a str (without leaking memory); someone needs to keep ownership of the string.
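A minimal sketch of keeping the Cow<'a, str> in the output struct, assuming a toy format where the name is everything up to the first newline. `Record` and `parse_record` are hypothetical names for illustration. Returning the Cow itself is fine; only a `&str` borrowed out of it is tied to the struct rather than to `'a`.

```rust
use std::borrow::Cow;

/// Hypothetical record: borrows from the input when the bytes are
/// valid UTF-8, owns a repaired copy otherwise.
#[derive(Debug)]
struct Record<'a> {
    name: Cow<'a, str>,
}

/// Hypothetical parser: the name is everything up to the first b'\n'
/// (or the whole input if there is none).
fn parse_record<'a>(input: &'a [u8]) -> Record<'a> {
    let end = input.iter().position(|&b| b == b'\n').unwrap_or(input.len());
    Record {
        // The Cow's lifetime is tied to `input`, so it can be returned.
        name: String::from_utf8_lossy(&input[..end]),
    }
}

fn main() {
    let buf: Vec<u8> = b"alice\nrest".to_vec();
    let rec = parse_record(&buf);
    // This &str borrows from `rec`, i.e. it is a &'b str, not a &'a str.
    let name: &str = &rec.name;
    assert_eq!(name, "alice");
    assert!(matches!(rec.name, Cow::Borrowed(_)));
}
```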

NikosEfthias · December 29, 2021, 11:44pm

I just went with str::from_utf8 as you suggested, handling the error properly, which is ultimately better anyway.
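A sketch of that approach in a zero-copy parser, assuming a toy `key:value` format. `Header`, `ParseError` and `parse_header` are hypothetical names; the point is that both fields stay `&'a str` borrowed from the input, and a `From<Utf8Error>` impl lets `?` propagate the UTF-8 error.

```rust
use std::str::{self, Utf8Error};

/// Hypothetical zero-copy record: both fields borrow from the input.
#[derive(Debug, PartialEq)]
struct Header<'a> {
    key: &'a str,
    value: &'a str,
}

/// Hypothetical error type for this sketch.
#[derive(Debug)]
enum ParseError {
    MissingColon,
    InvalidUtf8(Utf8Error),
}

impl From<Utf8Error> for ParseError {
    fn from(e: Utf8Error) -> Self {
        ParseError::InvalidUtf8(e)
    }
}

/// Parses `key:value` from raw bytes without copying.
fn parse_header<'a>(input: &'a [u8]) -> Result<Header<'a>, ParseError> {
    let colon = input
        .iter()
        .position(|&b| b == b':')
        .ok_or(ParseError::MissingColon)?;
    // `?` converts Utf8Error into ParseError via the From impl.
    let key = str::from_utf8(&input[..colon])?;
    let value = str::from_utf8(&input[colon + 1..])?;
    Ok(Header { key, value })
}

fn main() {
    let h = parse_header(b"host:example.com").unwrap();
    assert_eq!(h, Header { key: "host", value: "example.com" });

    match parse_header(b"host:\xFF") {
        Err(ParseError::InvalidUtf8(e)) => {
            println!("bad UTF-8 after {} valid bytes", e.valid_up_to())
        }
        other => panic!("unexpected: {other:?}"),
    }
}
```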

system · March 29, 2022, 11:44pm

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.
