If an error is returned then it must be guaranteed that no bytes were read.
And the documentation section for Write::write says:
If an error is returned then no bytes in the buffer were written to this writer.
Those sound like pretty strict requirements to me, which make implementations more complex and potentially require some kind of internal buffering just to fulfill them.
Or possibly I am misunderstanding these; could you please explain how you understand these requirements?
Also, what exactly does "no bytes were read" mean for Read? Does it mean that no data was stored in the given buf, or does it also mean that no data has been consumed irrevocably from the underlying data source?
If the error is ErrorKind::Interrupted I can understand those requirements because it should be safe to retry the read or write. But for all other errors I don't understand the rationale for these strict requirements.
The problem is that in both cases the data being read/written is transformed or validated (e.g. " being escaped as \"). So:
- During reading, some data might already have been consumed before an error is detected, for example an invalid escape sequence "test\uINVALID". The way I understand the Read docs, this would mean my custom Read::read implementation is not allowed to store test in the given buf while having already irrevocably consumed "test\u.
- During writing, some data might already have been written before an error is encountered. For example, when writing "test\0" as a JSON string, my implementation would first write "test and then, due to JSON rules, encode \0 as \u0000. But if the underlying Write (e.g. a file handle or network connection) now returns an error, my Write::write implementation has already written "test and cannot 'undo' that.
I hope the explanation of my situation is not too cryptic; otherwise I can try to provide some code examples.
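As a rough sketch, the write case looks something like this (simplified; JsonStringWriter and its byte-by-byte loop are illustrative placeholders, not my actual implementation):

```rust
use std::io::{self, Write};

// Illustrative sketch: a writer that escapes data per JSON string rules
// while forwarding it to an underlying writer.
struct JsonStringWriter<W: Write> {
    inner: W,
}

impl<W: Write> Write for JsonStringWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let mut written = 0;
        for &b in buf {
            match b {
                b'"' => self.inner.write_all(b"\\\"")?, // escape quote as \"
                0 => self.inner.write_all(b"\\u0000")?, // escape NUL per JSON rules
                _ => self.inner.write_all(&[b])?,
            }
            // If a `write_all` call above fails, `?` returns the error even
            // though `written` bytes of `buf` were already irrevocably
            // written, which violates the documented guarantee.
            written += 1;
        }
        Ok(written)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}
```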
What I would do to conform to the trait API spec is:
- In my reader/writer state there would be an Option<MyError> error field that is initially None.
- At the start of a method the error field is checked and the error returned if not None; if let works well for this. (Edit: better to also reset the field to None when returning the error; take works well for this.)
- When some bytes are read or written and then an error is encountered, the error field is set to Some(err) and the bytes or byte count is returned. The error will then be returned on the next call.
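A minimal sketch of that pattern (DecodingReader and decode_into are placeholder names for whatever transformation or validation the real reader performs):

```rust
use std::io::{self, Read};

struct DecodingReader<R> {
    inner: R,
    pending_error: Option<io::Error>,
}

impl<R: Read> DecodingReader<R> {
    // Placeholder: decode from `self.inner` into `buf`, returning how many
    // bytes were stored before an error (if any) was detected.
    fn decode_into(&mut self, _buf: &mut [u8]) -> (usize, Option<io::Error>) {
        unimplemented!()
    }
}

impl<R: Read> Read for DecodingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // First return any error deferred from the previous call; `take`
        // moves it to the caller so it is reported exactly once.
        if let Some(err) = self.pending_error.take() {
            return Err(err);
        }
        match self.decode_into(buf) {
            // No partial progress: report the error immediately.
            (0, Some(err)) => Err(err),
            // Partial progress: report Ok(n) now and defer the error.
            (n, Some(err)) => {
                self.pending_error = Some(err);
                Ok(n)
            }
            (n, None) => Ok(n),
        }
    }
}
```

Since take clears the field, each stored error is returned at most once.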
Consider that callers may continue to attempt to read or write after any error, not just Interrupted. They may also have expectations about, or logical dependencies on, knowing how many bytes were actually read or written.[1] So the requirement amounts to: if you do a partial read/write, always return Ok(partial_length).
Errors after a partial read/write should be returned on the next call, via something like @jumpnbrownweasel's suggestion, or just falling out naturally because the caller's next call passes you a buffer that begins with the data that errored, say.
As for what "no bytes were read" means: you didn't overwrite anything in the buf they passed in, and future successful reads will reflect that (e.g. you won't skip some bytes in the middle). I don't read into it any requirement about not having consumed anything from an underlying data source. I can imagine multiple ways of meeting the requirement depending on the situation (buffering, but also simply not being able to recover from errors, or perhaps requiring Seek on the underlying data source...).
Maybe they depend on you not having clobbered their read buffer, say. ↩︎
Thanks for your suggestions! They sound like good solutions to this. However, there are some issues:
- You have to check whether the error is ErrorKind::Interrupted and in that case not store it in the error field (or clear the field again after having used it once). Otherwise an infinite loop can occur: read or write would keep returning ErrorKind::Interrupted (assuming you normally return the stored error for subsequent attempts) and the caller would keep retrying.
- io::Error does not implement Clone, so you cannot easily return copies of it; instead you have to re-create it if you want to return it on subsequent calls after the first error.
- My Write::write implementation indirectly calls write_all on the underlying Write, so I don't actually know how many bytes have already been written when an error occurs.
So for now I changed my implementation so that it will still fail fast and return the error directly once it occurs, even if some bytes have already been read or written, and then keep returning the same error for every subsequent read or write attempt.
Even that complicated my implementation a bit: it now needs an additional field to store the error, and I extracted the read and write implementations into separate internal methods to be sure I properly handle any error they return. Doing that directly within read or write would have caused code duplication and would be error-prone: an overlooked ? somewhere would return the error directly without storing it in the field. So for example my read method now looks roughly like this:
```rust
// Assumes `use std::io::{self, ErrorKind};` and a field
// `error: Option<(ErrorKind, String)>` on the reader.
fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
    if let Some(e) = &self.error {
        // Recreate the previous error from the stored error kind and display string
        return Err(io::Error::new(e.0, e.1.clone()));
    }
    let result = self.read_impl(buf);
    if let Err(e) = &result {
        // Must not store `Interrupted` as `self.error` and return the error again
        // for subsequent calls because that could lead to infinite loops since
        // `Interrupted` normally allows retrying
        if e.kind() == ErrorKind::Interrupted {
            return result;
        }
        self.error = Some((e.kind(), e.to_string()));
    }
    result
}
```
Coming from Java development: the corresponding classes there (InputStream / OutputStream for bytes, Reader / Writer for chars) don't have such strict (or even any) requirements for exception behavior[1]. Though I guess Rust's Result and Java's exceptions cannot be compared directly.
I am a bit curious how many other Read and Write implementations actually follow these error handling requirements so strictly.
Edit: The above solution of returning the error delayed, on the next call, has another disadvantage for Write though: it might be confusing because the error is unrelated to the data provided in the current call. For example, it might be a UTF-8 error even though the data provided now is valid UTF-8. So the error message has to be adjusted to indicate that.
The closest to Rust's requirements is Java's InputStream.read(byte[], int, int), which discards an exception if some data has already been read. But that seems to apply only to InputStream itself, not to subclasses, and it is (in my opinion) error-prone because it silently discards the exception, which could be quite problematic if the InputStream cannot recover from it and is now in an inconsistent state. ↩︎
I suggested returning an error repeatedly because that's what I sometimes do, not because it is required by the trait API. Sorry about that, as it doesn't really work out well for you here. It seems better to return each error only once, using take so the value is moved out of the error field to the caller.
I understand, that's a pain if write_all is being called from write. The only way to conform to the trait is to call write in a loop and add up the bytes until all bytes are written or an error occurs.
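A sketch of what that loop could look like (write_all_counting is a hypothetical helper name, not a std API): like write_all, but it reports how many bytes were written even when an error occurs, so the outer write implementation can return Ok(partial) and defer the error.

```rust
use std::io::{self, ErrorKind, Write};

fn write_all_counting<W: Write>(w: &mut W, buf: &[u8]) -> (usize, Option<io::Error>) {
    let mut written = 0;
    while written < buf.len() {
        match w.write(&buf[written..]) {
            Ok(0) => {
                let err = io::Error::new(ErrorKind::WriteZero, "failed to write whole buffer");
                return (written, Some(err));
            }
            Ok(n) => written += n,
            // Interrupted writes are transparently retried, as in `write_all`.
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            // Report the partial byte count alongside the error.
            Err(e) => return (written, Some(e)),
        }
    }
    (written, None)
}
```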
Have you thought about not exposing a Read or Write implementation in your API (I assume this is what you're doing), and instead exposing methods that are more appropriate for your library? Or do users of your library have a need for a Read and Write implementation?
Part of the motivation behind those requirements is error handling and recovery.
E.g. if you have a text stream -> GzipEncoder -> BufWriter -> File and the file encounters a StorageFull error, then the program could delete some temporary files and retry the previous write. For this to work it must know exactly how many bytes were written up to that point, and to know that, the amount must be reported, which requires an Ok(bytes), not an error.
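A sketch of such a recovery loop (free_up_disk_space is a hypothetical placeholder; ErrorKind::StorageFull is stable since Rust 1.83):

```rust
use std::io::{self, ErrorKind, Write};

// Hypothetical placeholder for the recovery step, e.g. deleting temp files.
fn free_up_disk_space() -> io::Result<()> {
    Ok(())
}

// The retry only works because every partial write is reported as Ok(n),
// letting the caller resume at exactly the right offset.
fn write_with_recovery<W: Write>(w: &mut W, mut buf: &[u8]) -> io::Result<()> {
    while !buf.is_empty() {
        match w.write(buf) {
            Ok(0) => return Err(io::Error::new(ErrorKind::WriteZero, "no progress")),
            // Skip past the bytes that were successfully written.
            Ok(n) => buf = &buf[n..],
            Err(e) if e.kind() == ErrorKind::StorageFull => {
                // Free some space, then retry the same range: the failed
                // call guaranteed that none of it was written.
                free_up_disk_space()?;
            }
            Err(e) => return Err(e),
        }
    }
    Ok(())
}
```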
A common way to get this wrong is turning one write call into multiple underlying ones where some succeed and a later one doesn't... now what? You can't report the length and the error at the same time.
Which is why the documentation says:
This function will attempt to write the entire contents of buf, but the entire write might not succeed, or the write may also generate an error. Typically, a call to write represents one attempt to write to any wrapped object.
With reads it is similar. Perhaps one has stacked several readers on top of each other, the bottom-most uses a non-blocking socket, and every time it runs out of data it bubbles up a WouldBlock; the caller then polls the socket until more data is available and continues to pull more data through the stack of readers. But as in the write case, this only works if the readers correctly report the number of bytes they put into buffers; otherwise they'd be implicitly dropping bytes on the floor, since the caller never learns about them.
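A sketch of the caller side of that scenario (wait_until_readable is a hypothetical placeholder for polling the socket):

```rust
use std::io::{self, ErrorKind, Read};

// Hypothetical placeholder: block until the underlying socket is readable,
// e.g. by polling it.
fn wait_until_readable() -> io::Result<()> {
    Ok(())
}

// WouldBlock means "poll and retry", which is only sound if every reader in
// the stack reported partial progress as Ok(n) instead of discarding it
// alongside an error.
fn read_retrying<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    loop {
        match reader.read(buf) {
            Err(e) if e.kind() == ErrorKind::WouldBlock => wait_until_readable()?,
            other => return other,
        }
    }
}
```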
Yes, I had actually considered that before, but for a different reason: The Read and Write returned by my methods are designed to work with valid UTF-8 data, so users would not necessarily have to perform UTF-8 validation or conversion themselves.
In general though I want to expose methods which return some trait that allows reading and writing JSON strings in a streaming way.
However, it seems there are no dedicated traits for this[1] in the standard library. And Read and Write are quite commonly used and have multiple convenience methods. That is why I chose them for now, and decided, if necessary, to add custom subtraits with additional convenience methods for working with strings, e.g. StringValueWriter::write_str (sketched below).
Other than possibly using an Iterator<char>, but if users want to actually process the UTF-8 data as bytes, they would have to convert them again. ↩︎
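Such a subtrait could look roughly like this (a sketch; the default implementation shown is just one possibility):

```rust
use std::io::Write;

// Subtrait adding string-oriented convenience methods on top of `Write`.
trait StringValueWriter: Write {
    /// Convenience method: write a string as UTF-8 bytes.
    fn write_str(&mut self, s: &str) -> std::io::Result<()> {
        self.write_all(s.as_bytes())
    }
}
```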
In that case I think conforming to the trait API is necessary since readers or writers can be stacked as mentioned by @the8472.
Edit: However, you may want to think about whether your API users would ever need to wrap the reader/writer returned by your API inside another reader/writer. Is that useful? If not, then using the Read/Write traits may not make sense.
I agree with the others that the most obvious source of woe is depending on write_all in your write implementation. I'd probably try not to do that before trying to work around it.