Rust beginner notes & questions

If you’re referring to the Midori project, no, they haven’t. The project fell apart, and well before that happened, the language used evolved quite a ways away from standard C#; so much so, in fact, that it was christened M#. (Joe Duffy’s blog has many interesting details.)

6 Likes

I’d like to hear about other papercuts. I think we’ve pretty much exhausted the Read topic and are going in circles now.

There’s a lot more to Rust than just C++ 2.0, unless 2.0 means an entirely different language. There’s naturally a reason to compare it with C++, and occasionally to borrow ideas from it. The two languages are intended for the same domain with some similar goals.

As has been said, please provide some semi fleshed out code to demonstrate this. As you say, the language is powerful so you ought to be able to show some code using the features. I’m not asking for a full blown impl but something more than pseudocode.

std allocates buffers in some circumstances - that’s not the issue. I’d like to see C# outperforming Rust out of the box though - do you have examples?

The shims, at least the ones @BurntSushi wrote, are pretty small - as he said himself, it’s less than a day’s work. I don’t understand the complaint in that light. If I didn’t have any context, I’d think that Rust doesn’t have any stdlib based on your comments.

Again as he said, those things can be extracted out into separate crates and people can reuse them. Some people can be, for example, equally “appalled” that std has zero support for http out of the box. Rust std isn’t meant to be all encompassing.

Also, C# has just recently gotten the Pipeline API, and its std has been around much longer and was always richer than Rust’s. So prior to this, did you think C# was worthless?

5 Likes

OK, this is getting ridiculous. Please stop misrepresenting my code. I didn’t reinvent decoding or decompression. I wrote small shims that farm the primary task out to something else. Those shims can be put into crates if it’s worth doing and can be reused.

16 Likes

I’d like to point out that Microsoft has been investing heavily over the last few years in extracting things out of the stdlib and into nuget packages, largely because they were not happy being tied to monolithic .net framework releases just to iterate on an API. So I’m not sure they would even agree that having a fat stdlib is good.

Afaik pipelines isn’t planned to be placed in stdlib, but will instead be kept in a nuget package that can be included if the application/library wants to utilize it.

6 Likes

Interestingly, things like “EntityFramework” are Nuget packages, not in the standard lib.

.Net core, the latest incarnation of .Net, is made up entirely of small, modular Nuget packages, not dissimilar from Rust’s philosophy with the stdlib. See the section “NuGet as a first class delivery vehicle” in this blog.

5 Likes

I can’t. That’s not the point.

Obviously, both Read and Read2 would use exactly 1 copy in this scenario, because this scenario is a user-to-kernel call. The difference is that the Read2 API is free not to make the copy in other scenarios, such as memory-mapped files or user-mode networking. This allows the same abstraction – the exact same trait – to be the core of a much richer set of “streaming” code, not just traditional POSIX/Win32 file I/O.

Believe it or not, traditional file I/O is not the common case, and will be less and less common over time. Right now, you can buy non-volatile storage that plugs into the DIMM sockets of a server and is mapped into memory. This is the future of storage. Nobody in their right mind would tie their fate to the 1990s POSIX way of things when memory-mapped I/O is pretty much going to be 100% of all local storage I/O, and user-mode RDMA will be pretty much 100% of all server networking real soon now. There just isn’t any reasonable way to process 100Gbps Ethernet via traditional sockets, and this is also something you can buy right now.

Anyway, as you said, it’s better to just “show the code” instead of waffling on about a bunch of vaguely related topics. I’m just going to park my general philosophical issue with Rust’s API design and try and demonstrate why Read2 really is lower level than Read, yet is more flexible for higher-level code too.

First: A problem statement
Let’s get back to just memory mapped I/O, because that’s a very real scenario that’s going to be increasingly common (or ought to be), and if you don’t quite see my point, some others (@newpavlov) do.

Memory mapped I/O is not just an &[u8] as @vitalyd suggested. Last time I checked, files are allowed to be many TB in size, yet 32-bit processes can only memory map at most 2GB at a time. Even that is a stretch, though, because the heap can fragment the entire address space even when not fully “using” it per se. This can be troublesome with 64-bit code as well due to issues like 2 MB “large pages” interfering with the kernel’s ability to map large contiguous segments. The page tables of most CPUs have much lower practical limits than the theoretical 2^64 address space. Most have an addressable range between only 2^36 and 2^40 bytes.

So the solution is to manage a “sliding window” of some reasonable size, such as 128 MB on 32-bit or say 1 GB on 64-bit. This beautifully maps to the Read2 trait, which can smoothly handle sliding “views” like this without ever having to copy anything from anywhere. The user (Rust developer) of this type of API would never have to deal with the complexities of page table entry exhaustion or heap fragmentation, and their client code will smoothly work without issues, because someone more experienced than them has made the std::io code robust on their behalf.
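To make the window arithmetic concrete, here is a minimal sketch using only the standard library. The names `Window`, `WINDOW_SIZE` and `window_at` are made up for illustration and do not come from any crate; a real implementation would also have to align offsets to the platform's mapping granularity, which is ignored here.

```rust
/// One view into a large file. The file offset is always 64-bit,
/// while the length of a single view has to fit in `usize`.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct Window {
    pub offset: u64,
    pub len: usize,
}

/// Illustrative window size for a 32-bit process (128 MB, as suggested above).
pub const WINDOW_SIZE: usize = 128 * 1024 * 1024;

/// Returns the window containing byte `pos` of a file that is `file_len`
/// bytes long, or `None` if `pos` is at or past the end of the file.
pub fn window_at(file_len: u64, pos: u64, window_size: usize) -> Option<Window> {
    if pos >= file_len || window_size == 0 {
        return None;
    }
    let size = window_size as u64;
    // Round the position down to a multiple of the window size.
    let offset = (pos / size) * size;
    // The last window of the file may be shorter than the others.
    let len = (file_len - offset).min(size) as usize;
    Some(Window { offset, len })
}
```

With this, a 5 TB file is simply a long sequence of 128 MB views, and only the `offset` field ever needs more than 32 bits.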

I cannot stress this enough. This is a real problem, and no matter how much @BurntSushi insists he thinks he’s solved it, the harsh reality is that he hasn’t. For example, here’s RipGrep’s 32-bit build simply barfing on a file it tried to memory-map, whereas the 64-bit build doesn’t:

PS C:\Projects\VM\Virtual Hard Disks> C:\Tools\RipGrep32\rg.exe test .\fs1.vhdx
file length overflows usize
PS C:\Projects\VM\Virtual Hard Disks> C:\Tools\RipGrep\rg.exe test .\fs1.vhdx

Oops.

This is not his fault. It’s a fault of an attitude of “isn’t this just a &[u8]” when it is well established, for decades, that it isn’t.

PS: The error comes from mmap, which I’m sure contains a bunch of other bugs considering that it blithely assumes that file lengths are at most 4GB on 32-bit platforms:

Sure, this is “patchable”, but… not really. On 32-bit, ripgrep as written (in the master branch at any rate) – and all similar Rust code written by developers less experienced than @BurntSushi – will always fall back to copying Read APIs after attempting to memory map files > 2GB and failing.

If Read2 was used, everyone could have their cake and eat it too. The API by default would support zero copy, support memory mapping terabyte-sized files on 32-bit, use the same code path for both memory-mapped and streamed files, and only a couple of Rust std lib developers would have to worry about corner-cases like page table fragmentation on Xeons with large-pages enabled.

Back to the code

Lots of people are having trouble seeing how Read2 is simpler than Read. This is likely due to assuming that it must heap-allocate a Vec<T> or something.

This just isn’t the case. The trait can flexibly return any buffer to the caller, even a stack-allocated fixed-size buffer or a buffer handed to it when the source is opened. This limits the maximum “required items” that can be requested, but this is the exact same limitation that Read always has anyway, so nothing is lost:

E.g.:

// Fits on the stack of an Arduino. You
// really won't get lighter-weight than this...
struct TinyBufferRead2<T> {
    buf: [T;32],
    start: usize,
    end: usize
}

impl<T: Clone> Read2 for TinyBufferRead2<T> {
    type Data = T;
    // ...

    fn read( &mut self, required_items: usize ) -> Result<&[Self::Data],Self::Error> {
        if required_items > 32 { return Err(()) } // you asked for too much!
        // ... straightforward implementation...

        // note that Clone is required to shuffle the data around inside the buffer.
        // the most general case is for Read2 to wrap an existing Vec<T> that never 
        // grows, which doesn't require even this constraint. This is the power of Rust's 
        // type system that I would love to see utilised more!
    }
}
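For anyone who wants to compile something, here is one way the elided parts could be filled in. This is only a sketch under assumptions: the `Read2` trait below is a guess at the shape being discussed (a `read` that lends out a slice plus a `consume`), the source is an arbitrary iterator, and the bound is `Copy + Default` rather than `Clone` so that the fixed array can be initialised and shuffled with plain copies.

```rust
/// Hypothetical shape of the `Read2` trait; not an existing std or crate API.
pub trait Read2 {
    type Data;
    type Error;
    /// Lends out at least `required_items` buffered items without copying them out.
    fn read(&mut self, required_items: usize) -> Result<&[Self::Data], Self::Error>;
    /// Marks `items` items as processed so later reads skip them.
    fn consume(&mut self, items: usize);
}

/// Fixed 32-item buffer in front of an iterator source. No heap allocation.
pub struct TinyBufferRead2<T, I> {
    source: I,
    buf: [T; 32],
    start: usize,
    end: usize,
}

impl<T: Copy + Default, I: Iterator<Item = T>> TinyBufferRead2<T, I> {
    pub fn new(source: I) -> Self {
        TinyBufferRead2 { source, buf: [T::default(); 32], start: 0, end: 0 }
    }
}

impl<T: Copy + Default, I: Iterator<Item = T>> Read2 for TinyBufferRead2<T, I> {
    type Data = T;
    type Error = ();

    fn read(&mut self, required_items: usize) -> Result<&[Self::Data], Self::Error> {
        if required_items > self.buf.len() {
            return Err(()); // asked for more than the buffer can ever hold
        }
        // Shuffle unread items to the front if the tail is too short.
        if self.start + required_items > self.buf.len() {
            self.buf.copy_within(self.start..self.end, 0);
            self.end -= self.start;
            self.start = 0;
        }
        // Top up from the source until enough items are buffered.
        while self.end - self.start < required_items {
            match self.source.next() {
                Some(item) => {
                    self.buf[self.end] = item;
                    self.end += 1;
                }
                None => return Err(()), // source ran dry before the request was met
            }
        }
        Ok(&self.buf[self.start..self.end])
    }

    fn consume(&mut self, items: usize) {
        self.start = (self.start + items).min(self.end);
    }
}
```

A caller would loop on `read(n)`, process the borrowed slice, then call `consume` with however many items it actually used. Treating a dry source as an error is a simplification; a fuller design would distinguish end-of-stream from failure.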

Meanwhile, a sliding-window mmap implementation with Read2 is trivial, and allows all existing “streaming API code” to be layered on top without having to either:

  • Assume, incorrectly, that mmap-ing a file will fit into a &[u8].
  • Write two versions, one for memory buffers and one for streams.
  • Write one version that deals with both… somehow.
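As a rough illustration of the layering, the sketch below borrows the `fill_buf`/`consume` shape of `std::io::BufRead` as a stand-in for Read2. Everything else is an assumption: `map_view` is a mock that slices an in-memory "file" instead of calling mmap or MapViewOfFile, and `WindowedReader` and `remaps` are invented names.

```rust
use std::io::{self, BufRead, Read};

/// Mock of a real mapping call. The "file" is a byte slice, so mapping a
/// view is just slicing. A real version would pass the u64 offset to the OS.
fn map_view(file: &[u8], offset: u64, len: usize) -> &[u8] {
    let start = offset as usize;
    &file[start..start + len]
}

pub struct WindowedReader<'a> {
    file: &'a [u8],
    window_size: usize,
    view: &'a [u8],   // currently mapped window
    view_offset: u64, // file offset of view[0]
    pos: usize,       // read position inside the view
    remaps: usize,    // how many windows have been mapped so far
}

impl<'a> WindowedReader<'a> {
    pub fn new(file: &'a [u8], window_size: usize) -> Self {
        assert!(window_size > 0);
        WindowedReader { file, window_size, view: &[], view_offset: 0, pos: 0, remaps: 0 }
    }

    pub fn remaps(&self) -> usize {
        self.remaps
    }
}

impl<'a> BufRead for WindowedReader<'a> {
    fn fill_buf(&mut self) -> io::Result<&[u8]> {
        if self.pos == self.view.len() {
            // Current window is used up: slide to the next one.
            let next = self.view_offset + self.view.len() as u64;
            let remaining = self.file.len() as u64 - next;
            let len = remaining.min(self.window_size as u64) as usize;
            if len > 0 {
                self.view = map_view(self.file, next, len);
                self.remaps += 1;
            } else {
                self.view = &[]; // end of file
            }
            self.view_offset = next;
            self.pos = 0;
        }
        // Hand out the mapped bytes directly; nothing is copied here.
        Ok(&self.view[self.pos..])
    }

    fn consume(&mut self, amt: usize) {
        self.pos = (self.pos + amt).min(self.view.len());
    }
}

// `BufRead` requires `Read`; this fallback copies, as any `Read` must.
impl<'a> Read for WindowedReader<'a> {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        let buf = self.fill_buf()?;
        let n = buf.len().min(out.len());
        out[..n].copy_from_slice(&buf[..n]);
        self.consume(n);
        Ok(n)
    }
}
```

Existing streaming code such as `reader.lines()` works on top of this unchanged, and the caller never sees where one window ends and the next begins.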

What I would like to have seen in Rust
Basically, my disappointment is that what I was hoping for was something like C#'s variety of stream-based libraries, but more efficient and compiled. This is what I saw in Iterator: it’s basically C#'s IEnumerable<T> but better.

I would love to see other parts of the standard library get the same treatment. I would love it if I could “compose” streaming APIs such that decoding something like a memory-mapped OOXML file would be short, elegant, fast, and correct. E.g.:

  1. FileReader | MemoryMappedReader <- the OOXML file coming from disk (potentially > 2 GB).
  2. Chain<ZipFile> <- OOXML files are split into “parts” that have to be assembled.
  3. XmlReader <- internally switching between UTF-8 or UTF-16 encodings, so there’s a TextDecoderReader layer behind the scenes here.
  4. Base64Reader <- decoding a huge chunk of embedded stuff.
  5. JpegStream <- or whatever embedded content we’re trying to stream out efficiently.

Now, imagine the scenario where the Zip “compression” is “store” (pass-through uncompressed) and the XML encoding is UTF-8. In this case, using Read2, the first 3 wrappers up to XmlReader could just pass through the data with zero copy. The developer would have to do nothing special to enable this scenario.
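A small sketch of what "pass through with zero copy" can mean in practice, again using `std::io::BufRead` as a stand-in for Read2. `PassThrough` is a hypothetical wrapper standing in for layers like a "stored" zip entry; it is not a type from any real zip or XML crate.

```rust
use std::io::{self, BufRead, Read};

/// Hypothetical layer that applies no transformation, so it can lend out the
/// inner reader's buffer as-is instead of copying into a buffer of its own.
pub struct PassThrough<R> {
    inner: R,
}

impl<R> PassThrough<R> {
    pub fn new(inner: R) -> Self {
        PassThrough { inner }
    }
}

impl<R: BufRead> BufRead for PassThrough<R> {
    fn fill_buf(&mut self) -> io::Result<&[u8]> {
        // The returned slice is the inner layer's slice, untouched.
        self.inner.fill_buf()
    }

    fn consume(&mut self, amt: usize) {
        self.inner.consume(amt)
    }
}

impl<R: BufRead> Read for PassThrough<R> {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        self.inner.read(out)
    }
}
```

Stacking two of these over a `&[u8]` (which already implements `BufRead`) and calling `fill_buf` on the outermost layer yields a slice whose pointer is the original data's pointer, so no layer in the chain made a copy.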

4 Likes

I think the case you lay out for read2 over read is compelling in the sense that an API like read2 needs to be created for Rust. It may even need to be pushed into the standard library at some point, with read reimplemented in terms of read2. That being said, I don’t see why it isn’t entirely appropriate to create and iterate on such an API on crates.io and only once the kinks are worked out worry about whether or not it belongs in the standard library.

As has been pointed out, even C# with .Net Core is moving away from a monolithic standard library and towards a model similar to cargo (Nuget).

You seem to have a lot of low-level yet enterprise-class experience that could be good for the Rust community. I don’t think anyone here takes your insights lightly; there is just disagreement about the balance between what needs to be in the standard library and what needs to be developed and nurtured on crates.io.

8 Likes

I don’t think this blindly assumes a file can’t be greater than u32 (equiv to usize on 32-bit); rather, I think it says, “If I’m on a platform with a maximum (directly) addressable range of 32-bits, I can’t mmap a file bigger than that.” Granted, there are ways to do paging/windowing and such (as you’ve described), but, this is simply a wrapper around the platform’s mmap implementation, and that would not support anything to be mmappable greater than the usize for that platform.

Having a flip through the memmap-rs crate shows that in this case it’s technically correct to return Result<usize> based on the description of get_len(), but this is confusing for the user.
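The check being described can be sketched roughly as follows. This is not the memmap-rs source; `mappable_len` is an invented helper, and the type parameter stands in for the target's `usize` so that the 32-bit case can be tried on a 64-bit machine by passing `u32`.

```rust
use std::convert::TryFrom;

/// Returns the file length as the pointer-sized integer `P`, or `None` if the
/// whole file cannot be described by a single mapping of that width.
/// `P = u32` models a 32-bit target, `P = u64` a 64-bit one.
pub fn mappable_len<P: TryFrom<u64>>(file_len: u64) -> Option<P> {
    P::try_from(file_len).ok()
}
```

A 5 GB file gives `None` for `u32` and `Some` for `u64`, which matches the two ripgrep runs shown earlier. Note that this only answers "can the whole file be one mapping"; it says nothing about mapping a smaller window of it.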

Files can be bigger than u32::MAX, and it’s always been possible to memory-map files bigger than 4 GB on 32-bit platforms, exactly the same way it’s been possible to read files of arbitrary size using streaming APIs since forever.

The “mapped window size” and the “file size” are distinct concepts; only the former is limited to isize (not usize!). The memmap-rs crate conflates the two in several places. Similarly, it uses the wrong integer type for the “offset into the file”:

The offset should always be u64, even on 32-bit platforms. For example MapViewOfFileEx takes dwFileOffsetHigh and dwFileOffsetLow parameters so that 64-bit offsets can be passed in using two 32-bit DWORD parameters. I think this API has been there since… NT 3.11 or NT 4. I dunno. A long time, certainly.
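For illustration, splitting a 64-bit offset into the two halves such an API expects is a couple of shifts. The helper names `split_offset` and `join_offset` are made up; only the high/low parameter convention comes from the Win32 call mentioned above.

```rust
/// Splits a 64-bit file offset into (high, low) 32-bit halves, in the order
/// used by the dwFileOffsetHigh and dwFileOffsetLow parameters.
pub fn split_offset(offset: u64) -> (u32, u32) {
    ((offset >> 32) as u32, offset as u32)
}

/// Reassembles the 64-bit offset from its two halves.
pub fn join_offset(high: u32, low: u32) -> u64 {
    ((high as u64) << 32) | (low as u64)
}
```

An offset of 6 GB (0x1_8000_0000) becomes high = 1 and low = 0x8000_0000, and both halves fit comfortably in 32-bit registers.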

Submitting an issue for memmap-rs now…

5 Likes

Wouldn’t it be more correct to say that you can mmap a 4GB window within a file larger than 4GB on the 32-bit platform? You can’t actually map the whole file contiguously, right?

2 Likes

Yes.

You can map up to a 2 GB view (actually several!) into a file of any size on Win32. The upper half of the address space is reserved for the kernel and cannot be used for mapping, even with the /3GB flag (apparently).

I suspect Linux 32-bit has similar behaviour, but don’t quote me on that.

There’s a diagram on this page: https://docs.microsoft.com/en-us/dotnet/standard/io/memory-mapped-files

To quote the KB article:

“Multiple views may also be necessary if the file is greater than the size of the application’s logical memory space available for memory mapping (2 GB on a 32-bit computer).”

2 Likes

I feel compelled to add that this was a thing in the 90’s for Sun boxes, and was used to hide latency and coalesce IOPS to disk.

1 Like

I suspect that was a battery-backed RAID controller cache, which is pretty common in all modern servers or SAN disk arrays.

Compared to that, the future looks amazing: https://www.anandtech.com/show/12828/intel-launches-optane-dimms-up-to-512gb-apache-pass-is-here

This is just the beginning! Pretty soon you’ll be seeing practically everyone running databases with 100% of the I/O coming directly from these. Compared to that, NVMe flash storage at a “mere” 3500 MB/s looks glacially slow!!

Somewhat ironically, the great work that has been done by the various C#, Java, JavaScript, and Rust core developers bringing asynchronous I/O programming to the masses is now hampering performance in this shiny new world of non-volatile memory. The overhead of even the most efficient futures-based async I/O is simply massive in comparison to the latency of NVDIMMs, which are byte-addressable and are only 2-3x slower than normal DIMMs. When your storage latency is measured in nanoseconds, individual instructions matter and user-to-kernel transitions are murder…

Also see: https://www.snia.org/sites/default/orig/SDC2013/presentations/GeneralSession/AndyRudoff_Impact_NVM.pdf

2 Likes

Intel have made great claims about Optane before, and delivered in the end something which barely matches a modern SSD (and therefore does not justify the vendor lock-in or the software performance pitfall of adding yet another slower storage device to the virtual address space). I will wait for a more impressive product before calling this the future of storage.

Moreover, asynchronous I/O is mostly used for network I/O, not disk I/O. Because for disk, we pay the price of yet another stupid legacy decision, namely the Linux kernel devs declaring that you do not need asynchronous storage I/O as the disk cache will take care of everything for you. Tell that to performance engineers who have to debug latency spikes caused by software randomly blocking, now that every pointer dereference can turn into a disk access…

4 Likes

A slice is a window of usize indexable range. The actual range may be less, i.e. isize on 32-bit Windows, but that’s beside the point.

3 Likes

I have never, not once, claimed to solve all problems related to memory maps.

4 Likes

Now that I’ve toyed around with my pseudo-code of what it might look like (Read2), found a real issue (the avoidable mmap-related error in ripgrep), and realised that zero-copy is the future (NVDIMM), I feel like something akin to System.IO.Pipelines is necessary.

I’m not at all saying that it’ll look exactly like my Read2 sample, that’s just me doodling. A realistic example would probably have 3 to 4 low-level traits in some sort of hierarchy. It would obviously have to handle both read and write. It would probably need some restrictions on the types it can handle, likely Copy or Clone. I quickly discovered that the borrow checker makes this kind of API design… hard. Really hard. Someone with more Rust skill than me would have to put serious effort in.

The best chance this has is if the current futures and tokio refactoring effort embraces this I/O model and it eventually replaces legacy Rust I/O. Paging @carllerche and @anon15139276 – you guys might be interested in this thread…

Both. 8(

Missing trivial things like hex-encoding in a standard library? In 2018? Too thin. A standard library that leans heavily on macros? Will never be properly IDE-friendly. The current direction is “Macros 2.0”. People want more of this. Some of the same people that complain about compile times, I bet. Sigh…

I’m saying Rust has most of the language features needed to skip over the meandering path taken by other languages, a running start if you will. But huge chunks of it seem to be starting at step #1 and repeating the same mistakes, accumulating the same baggage along the way.

At some point someone will say: “This Rust language is too complicated and messy, I’ve got this great idea for a far simpler language!” and the wheel will turn one more time. 8)

My $0.02 is that the pace of language development is accelerating. I think Rust tried to “stabilize” too early, and instead should have embraced true “versioning”. I would love to see a language that has a file extension such as .rs1 and then .rs2 the year after, dropping source-level compatibility on the floor. The PowerShell guys nearly did this, and now they’re pretending that PS6 Core on Linux is 100% compatible with .ps1 files written for Windows 2008 Server! 8)

I’m not sure about the latest C++ stuff, I’ve avoided C++ for a decade because of productivity-killing issues like the linker exploding in my face with gibberish errors because of external libraries being incompatible with each other.

For another real world example of zero-copy in the wild, refer to Java’s NIO library. Note that just like how I mentioned that my sample Read2 trait would likely need a variety of implementations for different styles of buffering, the Java guys have HeapByteBuffer, DirectByteBuffer, and MappedByteBuffer. A lot of people assumed that this style of I/O implies dynamically growing heap allocation only, but that’s not a constraint at all.

No, it was host cpu memory, though it was mapped in the kernel with a device driver layer that used it exactly like RAID controllers would later.

My apologies, I misunderstood. You did claim though that BOM-detection was something you’ve solved without much difficulty. That may be true, but it’s an especially simple case in a larger class of problems, including things like smoothly handling memory-mapped I/O and then layering things like BOM-detection on top and also doing so safely on 32-bit platforms.

Or even 64-bit platforms, actually, as I was pointing out above: https://github.com/BurntSushi/ripgrep/issues/922

I’d also like to apologise for picking on ripgrep, in no way is this reflective of the quality of your coding. I’d rather that the situation be that people with less skill than you be able to write Rust code that is fast and correct. This takes special care and attention in the std library to “steer people into the pit of success” instead of the “pit of failure”. Right now I suspect a lot of people are falling into pits full of stabby, pointy things…

You did, and you’re far better at Rust than I am. What chance do I have?