Add unbuffered `RawStdin`, `RawStdout`?

#1

A redditor just pointed out to me that Rust’s stdio primitives buffer internally, and there seems to be no way around it.

I have written some tools that pass a lot of data through stdio, and now I see that they are not zero-copy, because I overlooked Stdin. I’d like to correct that, but if I go with from_raw_fd, then I’m going to lose portability.

My questions:

  • Is anyone aware of crates that implement unbuffered I/O for Rust across all supported platforms?
  • Would it make sense to add unbuffered RawStdin, RawStdout, … to stdlib?
1 Like

#2

You could implement your code in terms of BufRead::fill_buf but that might be cumbersome.
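
For illustration, here is a minimal sketch of that approach. The function name `count_newlines` and the newline-counting task are made up for the example; the point is only that the data is inspected in place inside the reader's internal buffer and never copied into a second, caller-owned buffer.

```rust
use std::io::{self, BufRead};

/// Hypothetical example task: count newline bytes in a stream.
/// Each chunk is borrowed straight from the reader's internal buffer.
fn count_newlines<R: BufRead>(mut reader: R) -> io::Result<usize> {
    let mut total = 0;
    loop {
        // Borrow whatever is currently buffered, refilling if it is empty.
        let chunk = reader.fill_buf()?;
        if chunk.is_empty() {
            // An empty slice means end of input.
            return Ok(total);
        }
        total += chunk.iter().filter(|&&b| b == b'\n').count();
        // Tell the reader how many bytes were processed so they are not returned again.
        let len = chunk.len();
        reader.consume(len);
    }
}
```

For stdin this could be called as `count_newlines(std::io::stdin().lock())`, since `StdinLock` implements `BufRead`. The data still passes through std's buffer once, so this avoids the second copy rather than the buffering itself.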

0 Likes

#3

There’s also from_raw_handle for Windows, at least.

In both cases, the new object assumes ownership of the descriptor or handle and will close it when dropped. You can avoid that by calling into_raw_fd/into_raw_handle when you’re done. Maybe it would be ok to just wrap it in ManuallyDrop, but I’m not sure if that will leak anything else.
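
A rough, Unix-only sketch of the ManuallyDrop variant follows. The helper names `raw_stdout` and `write_unbuffered` are hypothetical, and the sketch assumes nothing else in the process closes file descriptor 1 while the handle is in use. On Unix a `File` holds only the descriptor, so skipping its destructor should not leak anything beyond the descriptor itself, which is meant to stay open anyway.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::mem::ManuallyDrop;
use std::os::unix::io::{AsRawFd, FromRawFd};

/// Hypothetical helper: an unbuffered handle to the process's stdout.
/// ManuallyDrop prevents the File from closing the descriptor on drop.
fn raw_stdout() -> ManuallyDrop<File> {
    let fd = io::stdout().as_raw_fd();
    // SAFETY (assumption): stdout is open for the whole life of the process,
    // and the File is never dropped, so the descriptor is never closed here.
    ManuallyDrop::new(unsafe { File::from_raw_fd(fd) })
}

/// Hypothetical helper: write bytes to stdout without going through std's buffer.
fn write_unbuffered(bytes: &[u8]) -> io::Result<()> {
    let mut out = raw_stdout();
    out.write_all(bytes)
}
```

Output written this way can appear out of order relative to anything still sitting in the buffer behind `std::io::stdout()`, so mixing the two in one program needs care.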

0 Likes

#4

winapi-util provides some limited facilities for dealing with this.

1 Like

#5

It seems to me that it would be fairly easy to just do it in stdlib. After all, stdlib has to deal with all that anyway, and then Stdin is just a BufReader wrapped over RawStdin. I really don’t understand the rationale for not doing it like this in the first place. :confused:

0 Likes

#6

Sure, I think it’s reasonable to add to std. I was just presenting more on the current options.

0 Likes

#7

Because it’s less API surface area and there wasn’t anyone championing its addition. Buffered Stdin is the better default.

0 Likes

#8

I agree that it’s a better default, but it seems natural to me that a systems programming language should be able to do unbuffered stdio without jumping through hoops. :slight_smile:

0 Likes

#9

I’d be curious to see examples of real-world programs impacted by this.

If a program does large reads (> 8 KB) then it bypasses the buffer already. If it does many small reads, then it should benefit from the buffering. So for most programs that read a large amount of data, the BufReader should be neutral or beneficial.
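
The bypass can be observed with a small experiment. This is a sketch only: `Recording` and `observed_requests` are hypothetical names, the 8 KiB capacity is set explicitly rather than relying on the default, and the exact request sizes reflect how BufReader currently behaves rather than a documented guarantee.

```rust
use std::io::{self, BufReader, Read};

/// Hypothetical test double: an endless source of zero bytes that
/// records the size of every read request it receives.
struct Recording {
    requests: Vec<usize>,
}

impl Read for Recording {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.requests.push(buf.len());
        buf.fill(0);
        Ok(buf.len())
    }
}

/// Perform one read of `read_size` bytes through an 8 KiB BufReader and
/// return the request sizes that reached the underlying reader.
fn observed_requests(read_size: usize) -> io::Result<Vec<usize>> {
    let inner = Recording { requests: Vec::new() };
    let mut reader = BufReader::with_capacity(8 * 1024, inner);
    let mut buf = vec![0u8; read_size];
    let _ = reader.read(&mut buf)?;
    Ok(reader.into_inner().requests)
}
```

With an empty buffer, a 16 KiB read is forwarded to the underlying reader as a single 16 KiB request, while a 100-byte read turns into one 8 KiB request that fills the internal buffer.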

The worst case is a program that does a small read (< 8 KB) and then doesn’t read any more. But the total amount of data in this case is small, so the overhead of copying it one additional time is limited.

0 Likes

#10

I could also see a program wanting raw access for small reads+writes in an interactive program.

1 Like

#11

The way I think about it: it’s more a matter of principle. A systems programming language should have access to raw stdio. No need to debate why exactly: maybe someone is writing a plugin, where the plugin is a process with its stdio file descriptors substituted with something else, and the individual reads and writes actually matter. Maybe someone needs to actually slow down their app by doing a lot of small reads on stdio, or something weird like this, who knows. If it is possible in C/C++, it should be possible in Rust.

0 Likes

#12

Sure, I won’t argue against that. But if it’s only needed in unusual cases, then it seems like more of a candidate for a third-party library. If raw access were necessary for optimal performance in common use cases, that would be a stronger argument for inclusion in std.

Exposing StdinRaw in std may not be as simple as it seems. For example, what if some functions called by my program use std::io::stdin(), while others use raw access? Consider the following sequence of calls:

  1. Call function a() which reads from std::io::stdin().
  2. Call function b() which reads from a StdinRaw.
  3. Call function a() again.

This could cause input to be read out-of-order, since the second call to a() could read data that was buffered during the first call, before the data read by b().
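
The sequence can be simulated without touching the real stdin. In this sketch `SharedInput` is a hypothetical stand-in for the process's input stream, the 8-byte buffer capacity is chosen only to keep the example small, and `read4` plays the role of both a() and b().

```rust
use std::cell::RefCell;
use std::io::{self, BufReader, Cursor, Read};
use std::rc::Rc;

/// Hypothetical stand-in for stdin: one byte stream shared by several handles.
#[derive(Clone)]
struct SharedInput(Rc<RefCell<Cursor<Vec<u8>>>>);

impl Read for SharedInput {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.0.borrow_mut().read(buf)
    }
}

/// Read exactly four bytes and return them as text.
fn read4<R: Read>(reader: &mut R) -> io::Result<String> {
    let mut buf = [0u8; 4];
    reader.read_exact(&mut buf)?;
    Ok(String::from_utf8_lossy(&buf).into_owned())
}

/// Interleave buffered and raw reads on the same underlying stream.
fn interleaved_reads() -> io::Result<Vec<String>> {
    let data = b"0123456789abcdef".to_vec();
    let input = SharedInput(Rc::new(RefCell::new(Cursor::new(data))));
    // The buffered handle pulls 8 bytes from the stream on its first read.
    let mut buffered = BufReader::with_capacity(8, input.clone());
    let mut raw = input;
    Ok(vec![
        read4(&mut buffered)?, // a(): returns "0123", but "4567" is now buffered
        read4(&mut raw)?,      // b(): skips ahead and returns "89ab"
        read4(&mut buffered)?, // a(): returns the stale "4567"
    ])
}
```

The three reads come back as "0123", "89ab", "4567", so the bytes consumed by b() sit later in the stream than the bytes the second a() call receives.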

Currently, libstd’s safe APIs for stdin and stdout all use global locks so users don’t need to worry about such things. Yes, unsafe or platform-specific code may have problems. But I’d worry about providing a safe API in libstd that makes it much easier to cause this type of bug.

2 Likes

#13

On a similar note, I was surprised when I found out that BufReader.lines() allocates a new String for each iteration of the loop (I think that the String.split_*() methods have the same problem, but their docs are more opaque). This is the kind of thing that I would expect to be a zero-cost abstraction in Rust.
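
For reference, the usual way to sidestep the per-line allocation today is to call `read_line` with one reused String. A minimal sketch, where `longest_line_len` is a made-up example task:

```rust
use std::io::{self, BufRead};

/// Hypothetical example task: length in bytes of the longest line,
/// not counting the line terminator. One String is reused for every line.
fn longest_line_len<R: BufRead>(mut reader: R) -> io::Result<usize> {
    let mut line = String::new();
    let mut longest = 0;
    loop {
        // read_line appends, so the buffer must be cleared each time.
        // clear() keeps the allocation, so capacity is reused across lines.
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            // Zero bytes read means end of input.
            return Ok(longest);
        }
        let trimmed = line.trim_end_matches(|c| c == '\n' || c == '\r');
        longest = longest.max(trimmed.len());
    }
}
```

The String only reallocates when a line is longer than any seen before, at the cost of a more manual loop than `lines()`.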

There are a bunch of crates which define Iterator-like traits that avoid this problem, but they’re all PoC experiments with no intention of stabilization (search for push iterators or streaming iterators). The general consensus seems to be that we should wait for GATs to land, and then make an Iterator trait that is capable of optionally accepting a lifetime. I’m not completely convinced that this will be possible without breaking changes to Iterator, but I’m happy to wait and see.

Since there are a lot of performance-interested people in this thread: do we think that there is appetite for a faster alternative to BufRead.lines() in the standard library, and what’s the best way to get it there?

(I can create a new thread for this if people prefer)

0 Likes