The best I have found so far is std::str::from_utf8, but on error it only returns the length of the valid prefix, so I need to either:
call from_utf8_unchecked on that prefix, introducing unsafe (and potential UB if I make a mistake there), or
call from_utf8(...).unwrap() on that prefix, introducing a panic (which would not happen unless I make a mistake, but a panicking code path would still be present).
I mean ... it's not a big deal, I will just use one of the above if I don't find anything better, but for some reason I have the impression that APIs like this (i.e. ones that return partial success safely, without the need to re-check) are the preferred way of doing things in Rust, so I suspect I am missing something obvious.
I don't really understand your question. There are only three ways to do this:
Either do it unchecked and risk UB
unwrap and risk a panic
or handle potential errors.
You already listed the first two above, and the third is the same as the second except that you do actual error handling instead of calling unwrap(). How else would you like this to be solved?
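For reference, the third option can be written with neither unsafe nor unwrap in the source. This is only a minimal sketch: valid_prefix is a made-up name, and it pays for safety by validating the prefix a second time and falling back to "" in a branch that should be unreachable.

```rust
use std::str::{from_utf8, Utf8Error};

/// Hypothetical helper: returns the longest valid UTF-8 prefix of `bytes`,
/// together with the error that ended it (if any).
fn valid_prefix(bytes: &[u8]) -> (&str, Option<Utf8Error>) {
    match from_utf8(bytes) {
        Ok(s) => (s, None),
        Err(err) => {
            // Re-validate the prefix instead of trusting `valid_up_to()`.
            // If either step failed (it should not), fall back to "".
            let prefix = bytes
                .get(..err.valid_up_to())
                .and_then(|p| from_utf8(p).ok())
                .unwrap_or("");
            (prefix, Some(err))
        }
    }
}
```

The fallback branch is still dead code that exists only to satisfy the type system, which is exactly the complaint in the original question.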
What I do not like about it is that valid_up_to() is just an integer, so the fact that the prefix has been checked and confirmed valid is not encoded in the type system. While I understand that the Rust type system may not be able to express everything and some unsafe is sometimes necessary, here it seems easy.
Of course I could just do:
use core::str::Utf8Error;
use core::str::from_utf8;
use core::str::from_utf8_unchecked;

// note: here Utf8Error contains some redundant info, but I wanted to keep the example small
fn my_from_utf8(slice: &[u8]) -> (&str, Result<(), Utf8Error>) {
    match from_utf8(slice) {
        Ok(output) => (output, Ok(())),
        Err(err) => {
            // SAFETY: docs say `slice[0..err.valid_up_to()]` was checked and is valid
            let prefix = unsafe { from_utf8_unchecked(&slice[0..err.valid_up_to()]) };
            (prefix, Err(err))
        }
    }
}
but I would expect something like that in the standard library, so I was asking whether I missed it or it is really not there. (I have yet to check [u8]::utf8_chunks suggested by @BurntSushi)
I presume here the major reason is that the error can't contain a &str of the valid prefix without being bound to the lifetime of the argument slice. And errors should not contain borrows because they're supposed to be passable back down the call stack.
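To make the lifetime point concrete, here is a rough sketch of what such an error could look like. PrefixError and from_utf8_with_prefix are hypothetical names, not anything in std, and the prefix is re-validated safely only to keep the sketch free of unsafe.

```rust
use std::str::from_utf8;

/// Hypothetical error type that carries the valid prefix as a `&str`.
/// The `'a` parameter ties every error value to the input buffer.
#[derive(Debug)]
struct PrefixError<'a> {
    /// The part of the input that was confirmed to be valid UTF-8.
    valid: &'a str,
    /// Everything from the first offending byte onwards.
    rest: &'a [u8],
}

fn from_utf8_with_prefix(bytes: &[u8]) -> Result<&str, PrefixError<'_>> {
    match from_utf8(bytes) {
        Ok(s) => Ok(s),
        Err(err) => {
            // `valid_up_to()` never exceeds `bytes.len()`, so this cannot panic.
            let (head, rest) = bytes.split_at(err.valid_up_to());
            // Safe second validation; "" is an unreachable fallback.
            let valid = from_utf8(head).unwrap_or_default();
            Err(PrefixError { valid, rest })
        }
    }
}
```

Because the error borrows from bytes, it cannot outlive the buffer, and it cannot be turned into a Box<dyn Error + 'static>, which is what a plain Box<dyn Error> return type asks for.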
But in general, yeah, Rust's valid-prefix-parsing story isn't super good. Often when writing ad-hoc parsers it would be nice to have something like
Ok, so I have checked [u8]::utf8_chunks and it is also not ideal:
It does not distinguish between an invalid byte and the slice ending in the middle of a character
It has the "annoying behavior" of returning None on empty slices
Of course both of these can be worked around, but unless something else turns up, I am concluding that my "dream function" does not exist in the standard library, so I will just build a solution I like.
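One possible shape for such a workaround, as a sketch only: Tail and split_valid_prefix are made-up names, and it assumes a toolchain where [u8]::utf8_chunks is stable (Rust 1.79 or later, as far as I know). The prefix comes from the first chunk, which is already typed as &str, and Utf8Error::error_len() is used to tell a truncated character from an invalid byte.

```rust
use std::str::from_utf8;

/// Hypothetical classification of whatever follows the valid prefix.
#[derive(Debug, PartialEq)]
enum Tail {
    /// The whole input was valid UTF-8.
    Complete,
    /// The input ended in the middle of a character.
    Incomplete,
    /// An invalid byte sequence was found.
    Invalid,
}

fn split_valid_prefix(bytes: &[u8]) -> (&str, Tail) {
    // The iterator yields nothing for an empty slice, hence the "" fallback.
    let prefix = bytes
        .utf8_chunks()
        .next()
        .map_or("", |chunk| chunk.valid());
    // `error_len()` is `None` when the input ended unexpectedly.
    let tail = match from_utf8(bytes) {
        Ok(_) => Tail::Complete,
        Err(err) if err.error_len().is_none() => Tail::Incomplete,
        Err(_) => Tail::Invalid,
    };
    (prefix, tail)
}
```

This has no unsafe and no unwrap, but it walks the valid prefix twice, so it trades some speed for a safe signature.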