Have you never written a parser?
What happens when you read a stream of bytes that's actually UTF-16 encoded?
You get a stream of 16-bit code units. Not bytes.
Then if you wish to parse this further with a lexer, you'll get a stream of tokens, typically 32-bit integers. Not bytes.
Not everything is a byte; that's why we have strongly typed languages.
Not everything that streams large chunks of contiguous data around is a POSIX file handle and returns 32-bit integer I/O error codes.
In my mind, the ideal trait inheritance hierarchy ought to look something like the following:
// Sketch only: default argument values and associated type defaults are not in stable Rust today.
// A stream is just a "fat" iterator.
pub trait Read : Iterator {
    type Error=();
    // Shamelessly copying the C# Pipeline concept here
    fn read( &mut self, required_items: usize = 0 ) -> Result<&[Self::Item],Self::Error>;
    // Ditto.
    fn consume( &mut self, items_used: usize );
    // A stream *really is* an Iterator, allowing fn next() to have a default impl in terms of stream functions!
    // Now if "impl trait" was used in Iterator's fns, Read could *specialise* things like fn peekable() and the like
    // with versions optimised for streams...
    fn next(&mut self) -> Option<Self::Item> {
        if let Ok(b) = self.read( 1 ) {
            // Copy the item out first: `b` borrows `self`, so it must not be used after consume().
            let item = b[0];
            self.consume( 1 );
            Some(item)
        } else {
            None
        }
    }
}
pub trait AsyncRead : Read {
    // ... Futures-based async versions of fn read() go here ...
}
// Defaults to bytes, but doesn't force it!
pub trait IORead<Item=u8,Error=i32> : AsyncRead {
    fn close( &mut self );
    fn seek( &mut self, position: u64 );
    // ... other functions that are more specific to file descriptors / handles ...
}
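The hierarchy above leans on syntax Rust does not have, so here is a minimal sketch of the same read/consume idea that should compile on stable Rust. The names `ItemRead`, `next_item` and `SliceStream` are made up for illustration, the default argument is dropped, and it deliberately does not inherit from `Iterator`, since a subtrait cannot supply `Iterator::next` for its implementors.

```rust
/// Hypothetical stable-Rust approximation of the read/consume pair.
pub trait ItemRead {
    type Item: Copy;
    type Error;

    /// Return the buffered items without consuming them, or an error
    /// if fewer than `required_items` are available.
    fn read(&mut self, required_items: usize) -> Result<&[Self::Item], Self::Error>;

    /// Mark `items_used` items as consumed.
    fn consume(&mut self, items_used: usize);

    /// Iterator-style convenience built purely on read() and consume().
    fn next_item(&mut self) -> Option<Self::Item> {
        // Copy the item out so the borrow from read() ends before consume().
        let item = *self.read(1).ok()?.first()?;
        self.consume(1);
        Some(item)
    }
}

/// In-memory stream over any item type, not just bytes.
pub struct SliceStream<'a, T> {
    data: &'a [T],
    pos: usize,
}

impl<'a, T> SliceStream<'a, T> {
    pub fn new(data: &'a [T]) -> Self {
        SliceStream { data, pos: 0 }
    }
}

impl<'a, T: Copy> ItemRead for SliceStream<'a, T> {
    type Item = T;
    type Error = ();

    fn read(&mut self, required_items: usize) -> Result<&[T], ()> {
        let rest = &self.data[self.pos..];
        if rest.len() >= required_items {
            Ok(rest)
        } else {
            Err(())
        }
    }

    fn consume(&mut self, items_used: usize) {
        // Clamp so that over-consuming never moves past the end.
        self.pos = (self.pos + items_used).min(self.data.len());
    }
}
```

Because `read()` hands back a borrowed slice, a caller can look at as many items as it likes and only pay for the ones it later passes to `consume()`. The same code works for `u8`, `u16` code units, `char` or lexer tokens.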
Now imagine that you want to parse an XML file with an unknown encoding. Right now, this is... icky in most languages, because you have to read a chunk of the header, try various encodings to find the bit that says what encoding the file is in, then restart from the beginning using a wrapper that converts from bytes to characters. But you've already read a bunch of bytes, so now what? Not all streams are rewindable!
With something like the new C# Pipeline I/O API, the low-level parser would start off with a Read<Item=u8>, make the encoding decision, and then the high-level XML parser could use Read<Item=char>. The encoding switch at the beginning would be very neat because you just don't call consume() on the bytes you peeked at, so they are still there for the decoder. This would work fine even on forward-only streams such as a socket returning compressed data.
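For bytes specifically, std's `BufRead` already has this peek-then-consume shape in `fill_buf()` and `consume()`, so the encoding switch can be sketched today. This is only an illustration: it looks at byte-order marks rather than parsing a real XML declaration, it assumes UTF-8 when there is no BOM, it assumes the first `fill_buf()` call returns enough bytes to see the BOM, and `Encoding`, `sniff_bom`, `utf16_to_string` and `decode_all` are hypothetical names.

```rust
use std::io::{self, BufRead, Read};

#[derive(Debug, PartialEq)]
enum Encoding {
    Utf8,
    Utf16Le,
    Utf16Be,
}

/// Peek at the buffered bytes and guess the encoding from a byte-order mark.
/// Returns the guess and the BOM length. Nothing is consumed.
fn sniff_bom<R: BufRead>(input: &mut R) -> io::Result<(Encoding, usize)> {
    let head = input.fill_buf()?;
    Ok(if head.starts_with(&[0xEF, 0xBB, 0xBF]) {
        (Encoding::Utf8, 3)
    } else if head.starts_with(&[0xFF, 0xFE]) {
        (Encoding::Utf16Le, 2)
    } else if head.starts_with(&[0xFE, 0xFF]) {
        (Encoding::Utf16Be, 2)
    } else {
        (Encoding::Utf8, 0) // assumption: no BOM means UTF-8
    })
}

/// Turn pairs of bytes into UTF-16 code units, then into chars.
/// Unpaired surrogates become U+FFFD; a trailing odd byte is ignored.
fn utf16_to_string(bytes: &[u8], to_unit: fn([u8; 2]) -> u16) -> String {
    let units = bytes.chunks_exact(2).map(|p| to_unit([p[0], p[1]]));
    char::decode_utf16(units)
        .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
        .collect()
}

/// Sniff first, consume only the BOM, then decode the rest.
/// No rewind is needed, so a forward-only reader is fine.
fn decode_all<R: BufRead>(mut input: R) -> io::Result<String> {
    let (encoding, bom_len) = sniff_bom(&mut input)?;
    input.consume(bom_len);
    let mut bytes = Vec::new();
    input.read_to_end(&mut bytes)?;
    Ok(match encoding {
        Encoding::Utf8 => String::from_utf8_lossy(&bytes).into_owned(),
        Encoding::Utf16Le => utf16_to_string(&bytes, u16::from_le_bytes),
        Encoding::Utf16Be => utf16_to_string(&bytes, u16::from_be_bytes),
    })
}
```

The limitation is visible in the signatures: once the encoding is known, the decoded side has to become a `String` or an ad-hoc iterator, because `BufRead` is fixed to `u8`. A `Read<Item=char>` layer would let the XML parser keep using the same read/consume calls.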
Similarly, if the String type were instead a trait that &[char] mostly implemented, zero-copy parsers would be fairly straightforward with this overall approach...
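As a rough illustration of the zero-copy part, here is a tiny tokenizer over `&[char]` whose tokens are just sub-slices of the input. The function name `words` is made up, and splitting on whitespace stands in for a real lexer.

```rust
/// Split the input on whitespace. Every token borrows from `input`,
/// so no characters are copied.
fn words(input: &[char]) -> Vec<&[char]> {
    input
        .split(|c| c.is_whitespace())
        .filter(|w| !w.is_empty())
        .collect()
}
```

The returned slices point straight into the caller's buffer, which is the property a read()/consume() stream of `char` would preserve as long as the tokens are used before the matching `consume()` call.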
Behind the scenes, advanced implementations could keep pools of buffers and use scatter/gather I/O for crazy performance. The developer wouldn't even have to know...
This is what the new C# I/O API is trying to do, but it's not using the power of generic programming to the same level that Rust could. Compare the C# IEnumerator<T> interface to the Rust Iterator trait. It's night & day!