Generate Utf8Error

This is possibly a stupid question, but how can I create a Utf8Error?
I'm looking for something like Error::from(...), but I can't find one.
I can't construct it via Utf8Error { ... }, since its fields are private.
Of course, I could provoke such an error by passing malformed data to an existing function such as from_utf8() and reuse the result, but that seems quite un-Rusty.
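For reference, the workaround described above could look roughly like this. It is a hypothetical helper (fake_utf8_error is not part of std), assuming you only need a Utf8Error whose valid_up_to() and error_len() match given values: it builds a byte buffer that is malformed by construction and lets std::str::from_utf8 report the error. It allocates and runs a full validation, which is exactly the overhead the question wants to avoid.

```rust
use std::str::{self, Utf8Error};

/// Hypothetical helper: produce a std `Utf8Error` with the requested
/// `valid_up_to()` and `error_len()` by validating deliberately broken bytes.
fn fake_utf8_error(valid_up_to: usize, error_len: Option<usize>) -> Utf8Error {
    // `valid_up_to` ASCII bytes are always valid UTF-8.
    let mut buf = vec![b'a'; valid_up_to];
    let tail: &[u8] = match error_len {
        // Lead byte of a 2-byte sequence with nothing after it:
        // the input ends mid-sequence, so error_len() is None.
        None => &[0xC2],
        // 0xFF can never appear in UTF-8.
        Some(1) => &[0xFF],
        // Valid 3-byte prefix (E0 A0) followed by a non-continuation byte.
        Some(2) => &[0xE0, 0xA0, b'A'],
        // Valid 4-byte prefix (F0 9F 98) followed by a non-continuation byte.
        Some(3) => &[0xF0, 0x9F, 0x98, b'A'],
        Some(n) => panic!("error_len must be None or Some(1..=3), got Some({n})"),
    };
    buf.extend_from_slice(tail);
    str::from_utf8(&buf).expect_err("buffer is malformed by construction")
}
```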

The popular bstr crate (maintained by a member of the libs-api team) simply copied the source of str::Utf8Error and exposes it as its own Utf8Error. Maybe defining your own Utf8Error would be a sensible solution in your case, too?
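A minimal sketch of that approach, assuming you only need the same public accessors as std's type (valid_up_to() and error_len()). The fields mirror the std source (error_len stored as Option<u8>), while the public new constructor and the exact Display wording are choices of this sketch; it is not bstr's actual code.

```rust
use std::error::Error;
use std::fmt;

/// A locally defined stand-in for `std::str::Utf8Error`, in the spirit of bstr.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Utf8Error {
    valid_up_to: usize,
    // Stored as Option<u8> like std; only None or Some(1..=3) are meaningful.
    error_len: Option<u8>,
}

impl Utf8Error {
    /// Because the type is ours, we are free to expose a constructor.
    pub fn new(valid_up_to: usize, error_len: Option<u8>) -> Self {
        debug_assert!(
            matches!(error_len, None | Some(1..=3)),
            "error_len must be None or Some(1..=3)"
        );
        Utf8Error { valid_up_to, error_len }
    }

    /// Index up to which the input was verified to be valid UTF-8.
    pub fn valid_up_to(&self) -> usize {
        self.valid_up_to
    }

    /// Length of the invalid byte sequence, or None if the input ended early.
    pub fn error_len(&self) -> Option<usize> {
        self.error_len.map(usize::from)
    }
}

impl fmt::Display for Utf8Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.error_len {
            Some(len) => write!(
                f,
                "invalid utf-8 sequence of {} bytes from index {}",
                len, self.valid_up_to
            ),
            None => write!(f, "incomplete utf-8 byte sequence from index {}", self.valid_up_to),
        }
    }
}

impl Error for Utf8Error {}
```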

Why would you want to create this error? Are you trying to test something?

I'm writing a function that manipulates UTF-8 at the level of a u8 stream.
Of course, I could check at the very end, but I would prefer to return an error at the point where it occurs.

Thank you for the suggestion, but the idea was to handle errors the same way as the standard library's UTF-8 functions (Result<_, Utf8Error>), without much overhead.

If you are parsing chunks of a byte stream as UTF-8 using the standard library, just return the Utf8Error it produces. If you are implementing UTF-8 parsing yourself, you should probably create your own error type (like the bstr crate mentioned above). If you want, you can also provide a conversion from std::str::Utf8Error to your error type.
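A sketch of that combination, with hypothetical names (DecodeError, decode_chunk) and a made-up extra rule (rejecting NUL bytes) standing in for whatever byte-level checks your own code performs. The From impl lets the ? operator convert std's Utf8Error automatically, and Error::source exposes the original std error.

```rust
use std::error::Error;
use std::fmt;
use std::str::{self, Utf8Error};

#[derive(Debug)]
pub enum DecodeError {
    /// An error reported by the standard library's UTF-8 validation.
    Std(Utf8Error),
    /// An error detected by our own byte-level logic (hypothetical rule).
    NulByte { position: usize },
}

impl From<Utf8Error> for DecodeError {
    fn from(e: Utf8Error) -> Self {
        DecodeError::Std(e)
    }
}

impl fmt::Display for DecodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DecodeError::Std(e) => write!(f, "{e}"),
            DecodeError::NulByte { position } => write!(f, "NUL byte at index {position}"),
        }
    }
}

impl Error for DecodeError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            DecodeError::Std(e) => Some(e),
            DecodeError::NulByte { .. } => None,
        }
    }
}

/// Reject NUL bytes where they occur, then defer to std for UTF-8 validation.
pub fn decode_chunk(chunk: &[u8]) -> Result<&str, DecodeError> {
    if let Some(position) = chunk.iter().position(|&b| b == 0) {
        return Err(DecodeError::NulByte { position });
    }
    // `?` converts Utf8Error into DecodeError via the From impl above.
    let s = str::from_utf8(chunk)?;
    Ok(s)
}
```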

I would prefer it the other way around, but that seems to be impossible.
What is the rationale behind this design decision?
io::Error lets you construct instances, so why not Utf8Error?
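For comparison, io::Error can be built from an ErrorKind plus any payload that implements Error + Send + Sync, and Utf8Error qualifies. A sketch with hypothetical helper names, showing an io::Error created from scratch, a std Utf8Error wrapped inside one, and how the wrapped error can be recovered with get_ref and downcast_ref:

```rust
use std::io;
use std::str::{self, Utf8Error};

/// Hypothetical helper: validate bytes, reporting failures as io::Error.
fn to_str_io(bytes: &[u8]) -> io::Result<&str> {
    // Utf8Error implements Error + Send + Sync, so it can be the payload.
    str::from_utf8(bytes).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// io::Error can also be created from nothing but a kind and a message.
fn custom_io_error() -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, "stream contained invalid UTF-8")
}

/// Recover the wrapped Utf8Error, if the io::Error carries one.
fn inner_utf8_error(err: &io::Error) -> Option<&Utf8Error> {
    err.get_ref()?.downcast_ref::<Utf8Error>()
}
```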

I am not sure, but I would guess that io::Error is more of an exception, and most types in std keep their implementation private. std has very strong stability guarantees, so I understand why the libs team is reluctant to make things public.

I think the rationale is, as always, to keep the stable API surface small. io::Error was designed from the beginning as an extensible catch-all error type. I also don't think Option<u8> (the type of the field) or Option<usize> (the return type of error_len()) are good ways to encode error_len, because its value range is limited to Option<1..=3>.
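To make the Option<1..=3> range concrete, here is a sketch of how a caller typically consumes the two accessors, in the style of the lossy-decoding loop in the std::str::Utf8Error documentation: Some(len) means len invalid bytes (1 to 3) can be skipped, and None means the input ended in the middle of a sequence. The name decode_lossy is hypothetical, and each invalid sequence is replaced with U+FFFD.

```rust
use std::str;

/// Hypothetical helper: decode bytes, replacing each invalid sequence with U+FFFD.
fn decode_lossy(mut input: &[u8]) -> String {
    let mut out = String::new();
    loop {
        match str::from_utf8(input) {
            Ok(valid) => {
                out.push_str(valid);
                break;
            }
            Err(e) => {
                let (valid, rest) = input.split_at(e.valid_up_to());
                // Everything before valid_up_to() is guaranteed to be valid UTF-8,
                // so this re-check cannot fail.
                out.push_str(str::from_utf8(valid).expect("prefix is valid UTF-8"));
                out.push('\u{FFFD}');
                match e.error_len() {
                    // Some(1..=3): skip exactly the invalid bytes and continue.
                    Some(len) => input = &rest[len..],
                    // None: the input ended mid-sequence, so stop here.
                    None => break,
                }
            }
        }
    }
    out
}
```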

However, I think an ACP (API Change Proposal) to allow constructing a Utf8Error in user code is likely to succeed. The need apparently exists. The hard part is coming up with a good signature. Some options are:

  1. Utf8Error::new(usize, Option<usize>), panicking for Some(0) or Some(4..).
  2. Wait for ranged integer or refinement types (which may never come).
  3. Allow offsetting a given error: Utf8Error::offset_valid(self, offset: usize) -> Self.
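
Since std's fields are private, options 1 and 3 can only be prototyped on a local type today. The following is a hypothetical sketch: ProposedUtf8Error is not a real std type, and treating offset_valid as "add the offset to valid_up_to" is this sketch's interpretation. It also shows why option 3 could help with chunked input, since an error found in one chunk can be shifted to a position in the whole stream. A real stream decoder would additionally need to carry an incomplete trailing sequence (error_len() == None) over to the next chunk, which is omitted here.

```rust
use std::str;

/// Hypothetical stand-in for std's Utf8Error with the proposed constructors.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ProposedUtf8Error {
    valid_up_to: usize,
    error_len: Option<u8>,
}

impl ProposedUtf8Error {
    /// Option 1: panic on error lengths that cannot occur in UTF-8.
    pub fn new(valid_up_to: usize, error_len: Option<usize>) -> Self {
        let error_len = match error_len {
            None => None,
            Some(len @ 1..=3) => Some(len as u8),
            Some(len) => panic!("invalid error_len Some({len}), expected None or Some(1..=3)"),
        };
        ProposedUtf8Error { valid_up_to, error_len }
    }

    /// Option 3: shift the error by `offset` bytes, e.g. the start of the current chunk.
    pub fn offset_valid(self, offset: usize) -> Self {
        ProposedUtf8Error { valid_up_to: self.valid_up_to + offset, ..self }
    }

    pub fn valid_up_to(&self) -> usize {
        self.valid_up_to
    }

    pub fn error_len(&self) -> Option<usize> {
        self.error_len.map(usize::from)
    }
}

/// Validate `chunk`, which starts at byte `chunk_start` of a larger stream,
/// and report errors relative to the whole stream.
pub fn validate_chunk(chunk: &[u8], chunk_start: usize) -> Result<&str, ProposedUtf8Error> {
    str::from_utf8(chunk).map_err(|e| {
        ProposedUtf8Error::new(e.valid_up_to(), e.error_len()).offset_valid(chunk_start)
    })
}
```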