Why doesn't `std::str::from_boxed_utf8` exist?

Greetings!

Again and again I run into boxed slices being treated as second-class citizens.
Why does `std::str::from_boxed_utf8_unchecked` exist, but not a "checked" version? Because this chain:

boxed slice -> vec -> string -> boxed str

is not very convenient and adds a little overhead =/

IMO it might just be an oversight. It would make sense to add this method, or better yet, `impl TryFrom<Box<[u8]>> for Box<str>`.


`std::str::from_boxed_utf8_unchecked` was added in PR #41258, which initially also tried to add `std::str::from_boxed_utf8`, but the checked version was later removed in the same PR due to a review comment. The motivation is strange, though, considering that `String::from_utf8` already existed at the time and simply returned the `Vec` inside its error type.


The important thing is that `String::from_utf8` doesn't take ownership of the slice. It takes a reference, and in case of failure you still have your slice and can do something about it.

On the other hand, it's not clear what `std::str::from_boxed_utf8` should do if you pass it an invalid slice. If it just returned an error, the invalid byte sequence would be lost. Alternatively, it could embed the bad sequence in the `Result`, but that makes for a convoluted API that is hard to deal with.

If you really need it, you can easily implement it in your own code using `std::str::from_utf8` and `std::str::from_boxed_utf8_unchecked`.
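A minimal sketch of that suggestion. The function name `from_boxed_utf8` and the choice to hand the original bytes back alongside the `Utf8Error` on failure are assumptions for illustration, not a std API:

```rust
use std::str::Utf8Error;

/// Hypothetical checked counterpart to `std::str::from_boxed_utf8_unchecked`.
/// On failure it returns the original boxed bytes plus the `Utf8Error`,
/// so the invalid sequence is not lost.
fn from_boxed_utf8(bytes: Box<[u8]>) -> Result<Box<str>, (Box<[u8]>, Utf8Error)> {
    // Validate with the borrowing checker; `.err()` keeps only the error,
    // so the borrow of `bytes` ends here and the box can be moved below.
    if let Some(e) = std::str::from_utf8(&bytes).err() {
        return Err((bytes, e));
    }
    // SAFETY: `from_utf8` just confirmed that `bytes` is valid UTF-8.
    Ok(unsafe { std::str::from_boxed_utf8_unchecked(bytes) })
}

fn main() {
    let good: Box<[u8]> = Box::from(&b"hello"[..]);
    let s = from_boxed_utf8(good).unwrap();
    assert_eq!(&*s, "hello");

    // "fo" followed by 0xff, which is never valid in UTF-8.
    let bad: Box<[u8]> = Box::from(&[0x66u8, 0x6f, 0xff][..]);
    let (bytes, err) = from_boxed_utf8(bad).unwrap_err();
    assert_eq!(err.valid_up_to(), 2);
    assert_eq!(&bytes[..], &[0x66u8, 0x6f, 0xff][..]);
}
```

Returning a tuple keeps the sketch short; a dedicated error type (like `FromUtf8Error`) would be the more polished design.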

No. Maybe you’re looking at the wrong function?

`pub fn from_utf8(vec: Vec<u8, Global>) -> Result<String, FromUtf8Error>`


There should be no overhead.

  • `Box<[u8]>` → `Vec<u8>` is free; it just adds a capacity field matching the length.
  • `Vec<u8>` → `String` is also free, as `String` is just a newtype around `Vec<u8>` (apart from the UTF-8 check).
  • `String` → `Box<str>` is not free in general, but it is here: the first step set len == cap and the second step didn't change it, so no `shrink_to_fit` reallocation is needed.
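The chain from the list above can be wrapped in a single helper. This is a sketch; the name `boxed_bytes_to_boxed_str` is hypothetical. The pointer comparison in `main` illustrates the no-reallocation reasoning above as it applies to the current std implementation; treat it as an illustration rather than a documented guarantee:

```rust
/// Hypothetical helper: Box<[u8]> -> Vec<u8> -> String -> Box<str>.
/// On failure it hands back the original bytes as a Box<[u8]>.
fn boxed_bytes_to_boxed_str(bytes: Box<[u8]>) -> Result<Box<str>, Box<[u8]>> {
    // Box<[u8]> -> Vec<u8>: reuses the allocation; capacity == len.
    let vec: Vec<u8> = bytes.into_vec();
    match String::from_utf8(vec) {
        // String -> Box<str>: capacity already equals length,
        // so no shrinking reallocation is needed.
        Ok(s) => Ok(s.into_boxed_str()),
        // `into_bytes` returns the original Vec<u8>; converting it back
        // to a boxed slice is again free because capacity == len.
        Err(e) => Err(e.into_bytes().into_boxed_slice()),
    }
}

fn main() {
    let bytes: Box<[u8]> = Box::from(&b"boxed"[..]);
    let ptr = bytes.as_ptr();
    let s = boxed_bytes_to_boxed_str(bytes).unwrap();
    assert_eq!(&*s, "boxed");
    // Same heap buffer as before: nothing was copied.
    assert_eq!(s.as_ptr(), ptr);

    let bad: Box<[u8]> = Box::from(&[0xffu8, 0xfe][..]);
    let back = boxed_bytes_to_boxed_str(bad).unwrap_err();
    assert_eq!(&back[..], &[0xffu8, 0xfe][..]);
}
```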

Um, sorry? Returning `Result` is the right way to design a fallible function. `String`'s analogous `FromUtf8Error` does give you back the offending bytes by value (via `into_bytes()`), so there's no question that the identical API would be feasible over boxed slices/strings.
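To illustrate, an error type mirroring `FromUtf8Error` for boxed slices might look like the sketch below. `FromBoxedUtf8Error`, `into_boxed_bytes`, and `from_boxed_utf8` are made-up names for illustration, not std items:

```rust
use std::fmt;
use std::str::Utf8Error;

/// Hypothetical error type mirroring `std::string::FromUtf8Error`,
/// but owning a `Box<[u8]>` instead of a `Vec<u8>`.
#[derive(Debug)]
pub struct FromBoxedUtf8Error {
    bytes: Box<[u8]>,
    error: Utf8Error,
}

impl FromBoxedUtf8Error {
    /// Gives the offending bytes back by value, like `FromUtf8Error::into_bytes`.
    pub fn into_boxed_bytes(self) -> Box<[u8]> {
        self.bytes
    }
    /// Details about the failure, like `FromUtf8Error::utf8_error`.
    pub fn utf8_error(&self) -> Utf8Error {
        self.error
    }
}

impl fmt::Display for FromBoxedUtf8Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Display::fmt(&self.error, f)
    }
}

impl std::error::Error for FromBoxedUtf8Error {}

/// Hypothetical checked conversion returning that error type.
pub fn from_boxed_utf8(bytes: Box<[u8]>) -> Result<Box<str>, FromBoxedUtf8Error> {
    match String::from_utf8(bytes.into_vec()) {
        Ok(s) => Ok(s.into_boxed_str()),
        Err(e) => {
            let error = e.utf8_error(); // `Utf8Error` is `Copy`
            Err(FromBoxedUtf8Error { bytes: e.into_bytes().into_boxed_slice(), error })
        }
    }
}

fn main() {
    // 0xc3 starts a two-byte sequence that is cut off.
    let bad: Box<[u8]> = Box::from(&[b'o', b'k', 0xc3][..]);
    let err = from_boxed_utf8(bad).unwrap_err();
    assert_eq!(err.utf8_error().valid_up_to(), 2);
    assert_eq!(&err.into_boxed_bytes()[..], &[b'o', b'k', 0xc3][..]);
}
```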


I don't think the UTF-8 validation (needed in this case) is free.

But it's no overhead compared to what a "direct" / single-function Box<[u8]> -> Box<str> conversion would cost. That would also need to perform UTF-8 validation, in the exact same way that String::from_utf8() needs to.


In my case it's `Box<str>` → `String`, and that just adds a capacity field.