Read and Write should handle any Copy type


#1

One thing that annoys me about most existing languages is the lack of generics in the standard library, often a symptom of adding generics to the language too late. This inevitably results in duplicated APIs specialised for various types. Thankfully, Rust has (largely) avoided this problem by using traits and generics heavily in the standard library.

However, there are some clear throwbacks to older styles of programming likely inherited from the legacy of other languages.

An example is that the Read and Write traits are for u8 only. This misses, in my opinion, an opportunity to unify or improve a range of APIs.

Fundamentally, a “stream” is like an “efficient Iterator”. Instead of yielding a single item at a time, a stream advances through a data source by filling an efficient packed array of items with each “next()” call. The Iterator traits are fully generic, so why aren’t streams?

There is no compelling reason I can see why Read and Write can’t or shouldn’t be rewritten to accept any `Copy + Sized` type. In this scenario, File::open() would return a type implementing Read<u8>, but other APIs would be free to use other element types.

Examples of where this could make sense include database APIs (Read<Row>) and streaming data processing (Read<Sample>).
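To make the proposal concrete, here is a minimal sketch of what such a generic bulk-read trait could look like. The names (`BulkRead`, `VecSource`) and the `String` error type are purely illustrative assumptions, not the actual std::io::Read API:

```rust
// Hypothetical sketch -- NOT the real std::io::Read trait.
// A generic "bulk read" interface over any Copy element type.
trait BulkRead<T: Copy> {
    /// Fill `buf` with up to `buf.len()` items; return how many were written.
    fn read(&mut self, buf: &mut [T]) -> Result<usize, String>;
}

/// A toy source that yields items from a Vec in chunks.
struct VecSource<T: Copy> {
    data: Vec<T>,
    pos: usize,
}

impl<T: Copy> BulkRead<T> for VecSource<T> {
    fn read(&mut self, buf: &mut [T]) -> Result<usize, String> {
        let n = buf.len().min(self.data.len() - self.pos);
        buf[..n].copy_from_slice(&self.data[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}

fn main() {
    // f32 stands in for a DSP-style "Sample" element type.
    let mut src = VecSource { data: vec![1.0f32, 2.0, 3.0], pos: 0 };
    let mut buf = [0.0f32; 2];
    assert_eq!(src.read(&mut buf).unwrap(), 2);
    assert_eq!(buf, [1.0, 2.0]);
}
```

A `Read<Row>` or `Read<Sample>` implementation would look the same, just with a different element type.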


#2

What is the rationale for the Copy bound? I don’t see how that would work in the Read<Row> example for instance unless your database can only store numbers in fixed width rows or something like that. If the bound is there to allow for the memory layout of Copy types to be directly serialized, that’s a wildly unportable thing to do in general and I’m not sure the standard library should be going out of its way to encourage that. In addition, I believe it exposes safe code to undefined behavior: padding bytes in the layout of a data type contain undefined values, so if you serialize one to e.g. an array and then read the bytes out, you’re in a bad place.

The Read and Write traits are designed to be interfaces for reading and writing streams of bytes, and the API explicitly indicates that. The APIs are basically direct translations of the read and write syscalls. They deal with slices of bytes because that’s what the underlying calls deal with. The error type is built in many ways around the kinds of errors that come out of sockets and files. What is to be gained by overloading these traits to deal with chunked streams of arbitrary (Copy only?) types? It seems like somewhat of a false equivalence.


#3

I come from a background of extremely abstract (yet still efficient) languages such as Haskell, where every function is written against the most abstract interface (type class) that suffices to implement it.

Your comment about the errors returned by Read/Write APIs is a valid point, of course. However, I do feel that in terms of abstract interfaces, even Rust gets stuck in the “pit of overspecialisation”. I’m not the only one who thinks this.

For example, see this post (“C++ has vector(n, value). C has calloc(). Rust has, uh,”) by a user complaining that the “string search” functions work only on string types. Which makes sense. Unless you want to search for a byte sequence in a byte array, which crops up often in real-world code. The substring search code could easily accept any iterable of equatable items, not just “string”, which happens to be one specific case. The element could just as easily be i32 or u8, or even a struct. Why does it have to be just char and nothing else? Because that’s the most common use, or the first problem someone had to solve, so we should stop there and call it a day?
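The generic search being asked for is easy to express. This is a naive O(n·m) sketch (the function name `find_sub` is made up for illustration; std’s actual algorithm is more sophisticated and, as noted later in the thread, requires Ord):

```rust
// Naive subsequence search over any slice of comparable items --
// the generality the post argues for. Illustrative only.
fn find_sub<T: PartialEq>(haystack: &[T], needle: &[T]) -> Option<usize> {
    if needle.is_empty() {
        return Some(0);
    }
    haystack
        .windows(needle.len())
        .position(|w| w == needle)
}

fn main() {
    // Works on bytes...
    assert_eq!(find_sub(b"hello world", b"world"), Some(6));
    // ...and equally on i32 or any other PartialEq element type.
    assert_eq!(find_sub(&[1, 2, 3, 4], &[3, 4]), Some(2));
    assert_eq!(find_sub(&[1, 2], &[5]), None);
}
```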

Just because Read and Write were originally intended for stream I/O, doesn’t mean that the abstract traits should be overspecialised for that purpose alone. Buffered transmission of data is a very common design pattern, which will force developers to come up with their own Read/Write traits. Except they might call them Source/Sink, BufInput/BufOutput, or whatever. The implementation details will look an awful lot like Read/Write, but they won’t be able to re-use the functions in the standard library like “copy”, “take”, “chain”, etc… They’ll have to re-invent those wheels as well. Of course, then the next logical step will be the adaptors: ReadSource/WriteSink, ReadBufInput/WriteBufOutput, and so forth. Blech…

Think about what you just said when you made the comment that the Read and Write APIs are designed to map to the read and write syscalls: an abstract API has just baked the specifics of POSIX into itself forever and ever. Eww! First of all, not all the world is POSIX; Windows still has a pretty big market share, last time I checked. Second, the error codes of read/write don’t even come close to covering a reasonable set of use-cases. Think compression, cryptography, multiplexing, inter-process communication, etc… Wildly different errors. In fact, I have this exact problem right now: I’m trying to implement the Read trait for a decompression library, and the errors it can return are very restrictive.

In my opinion, there should be a clear chain of “more and more specialised traits” with increasing functionality and specialisation along the lines of:

Iterable<T> -> Read<T: Copy + Sized> -> Read<u8> -> StreamRead

Ideally, the “Error” type should be a template parameter as well. In the case of Iterable, it could be a zero-sized type (equivalent of Option::None), which would preserve its current behavior, but the StreamRead could require std::io::Error or whatever. This way, a “CryptoRead” could return errors specific to the Cryptographic algorithm. Decompression could return detailed error types. All of the APIs could then share the various adaptors and utility functions.
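One way this layering could look is a generic read trait with a pluggable error type. The sketch below uses an associated `Error` type and std’s `Infallible` as the zero-sized “cannot fail” error; the names `GenRead` and `SliceReader` are invented for illustration:

```rust
use std::convert::Infallible;

// Hypothetical sketch of the proposed layering: a generic bulk-read
// trait whose error type varies per implementation.
trait GenRead<T: Copy> {
    type Error;
    fn read(&mut self, buf: &mut [T]) -> Result<usize, Self::Error>;
}

struct SliceReader<'a, T: Copy> {
    data: &'a [T],
}

impl<'a, T: Copy> GenRead<T> for SliceReader<'a, T> {
    // An in-memory source can never fail, so its error type is empty
    // (zero-sized), costing nothing at runtime.
    type Error = Infallible;

    fn read(&mut self, buf: &mut [T]) -> Result<usize, Infallible> {
        let n = buf.len().min(self.data.len());
        buf[..n].copy_from_slice(&self.data[..n]);
        self.data = &self.data[n..];
        Ok(n)
    }
}

fn main() {
    let mut r = SliceReader { data: &[1u32, 2, 3] };
    let mut buf = [0u32; 2];
    assert_eq!(r.read(&mut buf).unwrap(), 2);
    assert_eq!(buf, [1, 2]);
    // A "CryptoRead" impl could instead set `type Error = CryptoError`,
    // and a decompressor could use its own detailed error enum.
}
```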

This isn’t unique to Rust, of course, I’m just saying that the “templated from the beginning” API design could allow a much more elegant language than almost any other. Rust is already way ahead of most similar languages in this respect, I just feel like it’s got one foot stuck in the past.


#4

Do they need to be templated from the beginning, though? trait Read could be upgraded to trait Read<T = u8> seamlessly (well, the details may be sadder than that); see this example of default type parameters.
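Default type parameters do make this kind of backwards-compatible upgrade possible. A toy demonstration, using an invented `Source` trait rather than the real Read:

```rust
// Toy demonstration of default type parameters on a trait:
// code written against plain `Source` keeps compiling because the
// parameter defaults to u8.
trait Source<T = u8> {
    fn next_chunk(&mut self) -> Vec<T>;
}

struct Bytes;
// `Source` here means `Source<u8>`, thanks to the default.
impl Source for Bytes {
    fn next_chunk(&mut self) -> Vec<u8> {
        vec![0xde, 0xad]
    }
}

struct Floats;
// Other element types opt in explicitly.
impl Source<f64> for Floats {
    fn next_chunk(&mut self) -> Vec<f64> {
        vec![1.5, 2.5]
    }
}

fn main() {
    assert_eq!(Bytes.next_chunk(), vec![0xde, 0xad]);
    assert_eq!(Floats.next_chunk(), vec![1.5, 2.5]);
}
```

Whether the real std::io traits could take this path without breakage is exactly the sad detail being alluded to.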


#5

The current string search algorithm works on any orderable alphabet, so it would require Ord, not just equality, to work. This particular issue, extending search to elements and subslices, is being worked on, though.


#6

You can wrap Read/Write in your own trait that implements generic data-type reading; in addition, you can simply wrap Read/Write in your own iterator to implement something like Java’s Scanner. Otherwise, why would you have to use the raw Read/Write?
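The wrapping approach might look like this: an iterator over typed values parsed from an underlying `Read`. The `U32s` wrapper and its fixed little-endian framing are illustrative choices, not a std API:

```rust
use std::io::Read;

// Sketch: wrap any `Read` in an iterator that yields u32 values
// parsed from consecutive 4-byte little-endian chunks.
struct U32s<R: Read> {
    inner: R,
}

impl<R: Read> Iterator for U32s<R> {
    type Item = u32;
    fn next(&mut self) -> Option<u32> {
        let mut buf = [0u8; 4];
        match self.inner.read_exact(&mut buf) {
            Ok(()) => Some(u32::from_le_bytes(buf)),
            Err(_) => None, // EOF (or any error) ends the iteration
        }
    }
}

fn main() {
    // A &[u8] implements Read, so it serves as a toy byte source.
    let data: &[u8] = &[1, 0, 0, 0, 2, 0, 0, 0];
    let vals: Vec<u32> = U32s { inner: data }.collect();
    assert_eq!(vals, vec![1, 2]);
}
```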


#7

Please for the love of god no.

Generic APIs are a burden that undermines our ability to understand and optimize. They’re valuable, but we shouldn’t further generalize our APIs just because we can.

When I joined the Rust project, the standard library was glorious:

impl<K: Eq + Hash, V> HashMap<K, V> {
    fn get(&self, k: &K) -> Option<&V>;
}

Now

impl<K: Eq + Hash, V> HashMap<K, V> {
    fn get<Q: ?Sized>(&self, k: &Q) -> Option<&V>
    where
        K: Borrow<Q>,
        Q: Hash + Eq;
}

This was a necessary generalization to solve real problems, but it unquestionably reduced the quality of the interface, which now requires you to understand what the heck a Borrow is.
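(For the unfamiliar, the problem that Borrow bound solves is lookups by a borrowed form of the key; a quick example of what it buys you:)

```rust
use std::collections::HashMap;

fn main() {
    let mut map: HashMap<String, i32> = HashMap::new();
    map.insert("answer".to_string(), 42);

    // Because of the `K: Borrow<Q>` bound, a plain &str can look up a
    // String key -- no temporary String allocation is needed.
    assert_eq!(map.get("answer"), Some(&42));
    assert_eq!(map.get("missing"), None);
}
```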

This proposal solves no problems, makes the API less clear, and introduces tons of problems:

Per sfackler:

  • endianness (in what endianness is T read and written?)
  • padding (padding bytes are uninitialized memory, can and will cause chaos)

But also:

  • forbidden values (&T is Copy + Sized for all T, yet it is nonsensical to (de)serialize a reference!)

Read and Write are designed to interact with external processes and the underlying operating system, and as such have little business being typed.

If you want to serialize and deserialize data in a type-directed way, use Serde or rustc-serialize.


#8

Windows’s API behaves almost identically to POSIX’s - a buffer of bytes is read from/written to: https://msdn.microsoft.com/en-us/library/windows/desktop/aa365467(v=vs.85).aspx, https://msdn.microsoft.com/en-us/library/windows/desktop/aa365747(v=vs.85).aspx. In fact, Windows’s socket library is modeled directly after the BSD socket API that evolved into POSIX: https://msdn.microsoft.com/en-us/library/windows/desktop/ms740121(v=vs.85).aspx, https://msdn.microsoft.com/en-us/library/windows/desktop/ms740149(v=vs.85).aspx.

The last IO reform RFC proposed adding an associated Error type to Read and Write. It turned out that it made those traits wildly unusable in any kind of generic context so we gave up on that idea. There are serious, nontrivial implications of adding genericity to APIs.


#9

I don’t think I explained myself very well, both yourself and @sfackler seem to have misunderstood my original intention.

What I was trying to get at is that anything that implements Read is really a “Buffered Iterator”, with file input/output being only one specific application of the concept. The alternative uses wouldn’t be used to read or write files or sockets, but for processing arbitrary streaming bulk data sources. Think DSP-like code, financial analysis, etc…

The angle that I was approaching this from – but didn’t express well enough in my original post – is that in my past experience with other languages, iterators have woeful performance in a number of cases which could be solved by having an extended “bulk data iterator” available in the language. Essentially, streams, just like the Read/Write traits in Rust.

In principle, the Read trait should be nothing more than that – a bulk iterator, with File or Socket input/output just using the u8-specialization of the trait.

There aren’t a huge number of differences between the ordinary iterator and the bulk iterator. Primarily, it boils down to amortizing expensive operations such as error handling or kernel calls across more than just one data element. This is a very generic concept, and isn’t restricted to byte stream input/output.
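std’s own BufReader is a concrete instance of this amortization: many small reads are served from one larger buffered fill. In the sketch below, `CountingSource` is a toy `Read` that counts invocations of its `read` method, standing in for an expensive kernel call:

```rust
use std::io::{BufReader, Read};

// A toy Read implementation that counts how many times its `read`
// is invoked -- an illustrative stand-in for a syscall.
struct CountingSource {
    data: Vec<u8>,
    pos: usize,
    calls: usize,
}

impl Read for CountingSource {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        self.calls += 1;
        let n = buf.len().min(self.data.len() - self.pos);
        buf[..n].copy_from_slice(&self.data[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}

fn main() {
    let src = CountingSource { data: vec![b'x'; 4096], pos: 0, calls: 0 };
    let mut reader = BufReader::with_capacity(4096, src);
    let mut byte = [0u8; 1];
    // 4096 one-byte reads through the buffer...
    for _ in 0..4096 {
        reader.read_exact(&mut byte).unwrap();
    }
    // ...but the inner source was asked only once: the cost of the
    // "expensive call" is amortized over 4096 elements.
    assert_eq!(reader.get_ref().calls, 1);
}
```

The argument in this post is that this amortization pattern is generic over element types, not something specific to bytes.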

Now, I’ll grant that unlike other languages, Rust’s Iterator trait is notably better. It has far more methods defined, allowing for a wide range of specialisations and optimisations. This is already a damned sight better than almost any other language out there. Many bulk-data operations will optimise away to a memcpy or can be auto-vectorised, but this depends on compiler magic.

The amortization problem still remains, though. Compiler magic can’t make kernel calls go away, and there is no clear way to turn a generic “Iterator” into a true “bulk iterator”. The Read trait as it is in Rust already has hints of this, as if someone made a half-hearted attempt to implement a bulk iterator library and then stopped. It has very similar adaptors to iterators, for example, but they’re a bit… half-arsed. They’re missing key functionality and aren’t generic.

Take a look at: https://doc.rust-lang.org/std/iter/trait.Iterator.html

Easily half of those could be directly translated to a “fatter” version that operates on buffers instead of individual items.

So, just to clarify: I’m not trying to read or write structs from files! This is not about serialisation, and endianness just doesn’t apply. It’s about covering a range of abstractions that isn’t catered for in most languages: generic iteration over bulk data, with expensive calls amortized over many elements. The most common application of which just happens to be file I/O in practice…