Rust beginner notes & questions

I'm intrigued by what you have to say regarding this topic. I've been looking for something interesting in Rust to sink my teeth into. I'd like to spin up a Git repo to begin working on this idea. I'd like your participation, even if only in an advisory role, but any collaboration would be appreciated.

Honestly, I see that as the only useful way to move forward on something like this. Continued discussion about it is probably not useful without starting to actually implement something.

Some things to note:

  • It would be good to consider how this API could fit comfortably into the Redox (and perhaps TockOS) world
  • Linux/Unix/Windows/Redox (all similar performance with a user-level API that abstracts away any and all OS-specific issues [to the degree possible])
  • Use Cases:
    • Network IO
    • Database IO
    • File IO
    • Modern Memory/Storage Architectures
  • Cache friendliness at all levels
  • Ergonomic
  • Opinionated (make the "right" thing easy and the wrong thing impossible or difficult)
  • Async/Future friendly
  • Generator friendly
  • Reactive (pull vs push) / Back-Pressure friendly
  • EDIT: Expose a "Safe" C-API (interop)
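
As a rough illustration of the back-pressure item in the list above, here is a minimal sketch that uses only the standard library's bounded channel (`std::sync::mpsc::sync_channel`). The capacity of 2 and the helper name `offer` are assumptions made for this example and are not part of any proposed API.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};

/// Hypothetical helper: try to push one item without blocking.
/// Returns `false` when the item could not be queued.
fn offer(tx: &SyncSender<u32>, item: u32) -> bool {
    match tx.try_send(item) {
        Ok(()) => true,
        // Back-pressure: the consumer has fallen behind and the buffer is full.
        Err(TrySendError::Full(_)) => false,
        // The consumer has been dropped, so nothing can be delivered.
        Err(TrySendError::Disconnected(_)) => false,
    }
}

fn main() {
    // A bounded queue with room for two items (capacity chosen arbitrarily).
    let (tx, rx) = sync_channel::<u32>(2);

    assert!(offer(&tx, 1));
    assert!(offer(&tx, 2));
    // The queue is full, so the producer is told to slow down.
    assert!(!offer(&tx, 3));

    // Once the consumer pulls an item, the producer may push again.
    assert_eq!(rx.recv().unwrap(), 1);
    assert!(offer(&tx, 3));
}
```

A blocking `send` on the same channel would make the producer wait instead of returning `false`; which behavior suits an IO API better is exactly the kind of design decision the list is asking about.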

Anything else?

I'd like to start by creating a repo, spinning up a README.md in it, laying out requirements, and designing Traits/Interfaces/Abstractions first (ideally with your seemingly knowledgeable input).

Let me know if you are interested in any sort of collaboration on this. If not, I'll probably pursue anyway because you've piqued my interest, but, I feel you (and others) could add a lot of missing experience and knowledge to the endeavor.

7 Likes

To be honest, I feel like I'm learning alongside everyone else. 8)

I didn't start out with the zero-copy thing as a goal; I only noticed it as a possibility after reading about the System.IO.Pipelines design.

I think a lot more research coupled with some experimental API tire-kicking is the best bet, and I would definitely seek the involvement of the tokio guys. Asynchronous I/O is usually used when performance matters the most, and zero copy = more performance!

I haven't personally read through the full System.IO.Pipelines source yet, but I might over the weekend, time permitting. Similarly, it would be worth seeing what the Java guys did in the NIO library.

I also discovered that the borrow-checker doesn't like my naive design, making it a little painful to use. That might just be the way it is, but I'm not sure. A more experienced Rust API designer could probably provide hints...

2 Likes

Have you looked at mio? Also, see the book: GitBook - Where software teams break knowledge silos.

EDIT: I knew about mio (in a trivial sense) before, but I hadn't yet spent much time digging into it. As I begin to, I think that starting a separate project (as opposed to building on mio) is probably not useful, at least until I understand mio fully and am sure that it neither meets, nor intends to meet, the needs you have described.

EDIT: Does anyone else know of other crates that should be examined in depth and understood before deciding whether something like what is proposed in this thread is needed or is already in the works?

1 Like

I did look at that, it's "just" doing things like wrapping the system-provided APIs for notification. It's useful in scenarios where you might be reading from 1000 sockets at once, all of which are trickling data into a dozen server CPU cores doing the processing.

The Read2 API is much simpler than this conceptually, and is only tangentially related. E.g.: it might be feasible to extend it with a fn peek_any(...) API somehow via a related trait or something.

PS: The Performance of Open Source Software | Parsing XML at the Speed of Light was one of the reasons I went down this rabbit hole. I've recently been thinking about how to parse XML as fast as possible for use cases similar to ripgrep. The common case is almost, but not quite, pass-through for a lot of the bulk data processing. If the encoding is UTF-8, which it usually is, then you can often pass the text on to the reader as-is. Except you can't, because of &nbsp; and the like. You can also often pass through the text as &str, but just like with the mmap situation, that can get a bit iffy in memory-constrained environments.
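
To make the "almost pass-through" point concrete, here is a minimal sketch of one way to borrow the input when no entity is present and allocate only when decoding is needed, using `std::borrow::Cow`. The function name `decode_entities` is hypothetical, and the sketch only handles the five predefined XML entities; DTD-defined entities such as &nbsp; would need an extra lookup table.

```rust
use std::borrow::Cow;

/// Hypothetical helper: decode the five predefined XML entities in `text`.
/// Not taken from any real XML crate.
fn decode_entities(text: &str) -> Cow<'_, str> {
    // Fast path: no '&' means nothing to decode, so borrow the input as-is.
    if !text.contains('&') {
        return Cow::Borrowed(text);
    }

    // Slow path: copy into a new String, replacing entities along the way.
    let known = [
        ("&amp;", '&'),
        ("&lt;", '<'),
        ("&gt;", '>'),
        ("&quot;", '"'),
        ("&apos;", '\''),
    ];
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(pos) = rest.find('&') {
        out.push_str(&rest[..pos]);
        rest = &rest[pos..];
        match known.iter().find(|(name, _)| rest.starts_with(*name)) {
            Some((name, ch)) => {
                out.push(*ch);
                rest = &rest[name.len()..];
            }
            None => {
                // Unknown or malformed entity: keep the '&' literally.
                out.push('&');
                rest = &rest[1..];
            }
        }
    }
    out.push_str(rest);
    Cow::Owned(out)
}

fn main() {
    // Borrowed: the common case costs no allocation.
    let plain = decode_entities("plain text");
    println!("{} (borrowed: {})", plain, matches!(plain, Cow::Borrowed(_)));

    // Owned: only text containing '&' pays for a copy.
    println!("{}", decode_entities("a &lt; b &amp;&amp; c"));
}
```

The same borrowed-or-owned split would apply to byte slices; `&str` is used here only to keep the example short.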

It's an interesting design problem, that's for sure! 8)

1 Like

Yes, I agree, as I look more into mio.

Yes, it is.

I think it might be advantageous to end/close this thread and spawn off a new thread to do the following:

  • solidify the requirements
  • survey the existing crates.io efforts and how they do/do not meet the requirements
  • analyze the C# Pipelines API and Java NIO and perhaps others for ideas/inspiration and/or problems to avoid
  • spin up a repo to begin the Trait/Interface designs and make decisions about how this should all interoperate etc. (as I mentioned above).

EDIT: Unfortunately, today, I need to keep focused on other work, so, I'll try to get back to this when I can.

EDIT: I've started a new thread here (Towards a more perfect RustIO) to provide a place to discuss the "Rust IO" issue. Further discussion on this thread should be limited to other concerns (if any).

4 Likes

Idea: a Read-like trait that returns an owned buffer (like a buffer from the bytes crate). This sounds even more generic than BufRead / Read2 to me, because it also handles use cases where the buffer must be owned by the caller for later use. The trade-off is that it feels less zero-cost to the user, because there is an additional layer of refcounting, but in exchange true zero-cost becomes available for additional use cases like the kernel ring buffer.

FWIW, this is the approach taken for tokio's Streams: passing around owned buffers of the data.
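
The owned-buffer idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the trait name `ReadOwned`, its method `read_owned`, and the toy `ChunkSource` type are all hypothetical, and `Arc<[u8]>` from the standard library stands in for a refcounted buffer such as the ones the bytes crate provides.

```rust
use std::collections::VecDeque;
use std::io;
use std::sync::Arc;

/// Hypothetical trait: like `Read`, but the source hands out an owned,
/// reference-counted chunk instead of filling a caller-supplied slice.
trait ReadOwned {
    /// Returns the next chunk, or `None` at end of stream.
    fn read_owned(&mut self) -> io::Result<Option<Arc<[u8]>>>;
}

/// Toy in-memory source that yields pre-made chunks.
struct ChunkSource {
    chunks: VecDeque<Arc<[u8]>>,
}

impl ChunkSource {
    fn new(parts: &[&str]) -> Self {
        ChunkSource {
            chunks: parts.iter().map(|p| Arc::from(p.as_bytes())).collect(),
        }
    }
}

impl ReadOwned for ChunkSource {
    fn read_owned(&mut self) -> io::Result<Option<Arc<[u8]>>> {
        Ok(self.chunks.pop_front())
    }
}

fn main() -> io::Result<()> {
    let mut src = ChunkSource::new(&["hello ", "world"]);
    let mut kept: Vec<Arc<[u8]>> = Vec::new();
    let mut total = 0;
    while let Some(chunk) = src.read_owned()? {
        total += chunk.len();
        // The caller keeps the buffer for later use; the bytes are not copied.
        kept.push(chunk);
    }
    println!("read {} bytes in {} chunks", total, kept.len());
    Ok(())
}
```

Cloning one of these chunks only bumps a reference count, which is the extra layer of refcounting mentioned above.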

1 Like

(Also, I'm told that if this can't make it into libstd, which is likely until there is at least one widely used implementation, then it could maybe make it into mio.)

As promised, I've spun up a new thread for further discussion on the IO issues here (Towards a more perfect RustIO) to provide a place to discuss the "Rust IO" issue. Further discussion on this thread should be limited to other concerns (if any).

2 Likes

I request that you be a bit more careful in your words.
I now understand that you mean "sympathy horrified": that he had to implement so many "wheels" himself, but my first three readings had me sputtering that you dare insult one of the most advanced, stable and high-quality Rust codebases out there.

@burntsushi has put a tremendous amount of thinking into his work, and has gotten literally world-record speeds out of it. There is a reason that Visual Studio Code adopted it to power their search.

All your criticisms seem to boil down to "why is this not in std?!?", and Rust's answer is: "so that brilliant people like @burntsushi can experiment inside crates, find the perfect API, and then we'll take it into std" (caveat: better yet, keep it as an easily included crate. After all, "std is where libraries go to die" in other languages).

The reason these crates are not there yet is that we're still building them. Rust's advancement stage might be at "level 4", but age-wise it is still around Java 1.1 levels. There's only so much foundation you can build in three years with a mostly volunteer community.

I'm now taking a breather for a while; I find that I am physically angry from reading this.


Updated to add: I see that in the past 24 hours (posts I hadn't gotten to yet when writing the above), this thread has taken a tremendously productive turn. I'm happily surprised! Thank you @peter_bertok and @gbutler for turning this heated discussion in a productive direction! :heart:

14 Likes

Thanks for your flattering words! I know you didn't mean this, but a small note of clarity: no code should be held sacred or unassailable. Especially code that others gravitate towards learning from. There are parts of ripgrep I'm proud of and parts that I'm not so proud of. Small iterative improvement is the name of the game. :slight_smile:

And yes, I am glad this conversation took a productive turn. It was frustrating at times!

13 Likes

Ok, I’ll bite - which parts are you not proud of? :slight_smile:

Humor aside, I do think it’s valuable to have Rust codebases that one can point people at when they ask “can you show me an example of good/canonical/modern/well-structured/etc Rust code”. Ripgrep would be one candidate that certainly crosses my mind.

3 Likes

Quite a bit!

  • It is hard to reason about the interaction between flags. There are a lot of flags, and every time a new one is added, it's almost guaranteed that there is some interaction that wasn't anticipated.
  • This one came up in the conversation above: there are two different implementations of search in ripgrep, where one operates over buffers and one operates over a single big slice. The latter doesn't have some features implemented, such as context handling. So one consequence of this is that any time you request contexts, ripgrep will never be able to use memory maps. This is a failing on my part to write a more generic search routine and this is fixed in my in-progress work on libripgrep.
  • While the UTF-16 transcoder wasn't a lot of work on my part, I really think it should live in a separate crate since it is generically useful.
  • The parallel directory iterator API is baroque.
  • The printer has become very complex. This is partially due to the fact that this is primarily where all the interactions between flags are resolved. I'm working on a rewrite of the printer in libripgrep, and there are some marginal improvements, but there is still case analysis strewn about everywhere that I know is going to make the code hard to read. I don't have any good ideas here unfortunately.
  • ripgrep's src/main.rs is a bit too busy for my taste. There's a lot going on and very little meaningful abstraction happening.
  • The ignore crate's directory traversal code is quite complex and also very busy.
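
To illustrate the second item in the list above (two search implementations, one over buffers and one over a single big slice), here is a minimal sketch of how both front ends might share one core routine so that a feature added to the core is available to both. It is not ripgrep's or libripgrep's actual design; the function names are hypothetical, and plain substring matching stands in for a real regex search.

```rust
use std::io::{self, BufRead};

/// Shared core (hypothetical): decide whether a single line matches.
fn line_matches(line: &str, needle: &str) -> bool {
    line.contains(needle)
}

/// Front end 1: search one big slice, such as the contents of a memory map.
/// Returns 1-based line numbers of matching lines.
fn search_slice(haystack: &str, needle: &str) -> Vec<usize> {
    haystack
        .lines()
        .enumerate()
        .filter(|(_, line)| line_matches(line, needle))
        .map(|(i, _)| i + 1)
        .collect()
}

/// Front end 2: search incrementally through any buffered reader.
/// Returns 1-based line numbers of matching lines.
fn search_reader<R: BufRead>(reader: R, needle: &str) -> io::Result<Vec<usize>> {
    let mut hits = Vec::new();
    for (i, line) in reader.lines().enumerate() {
        if line_matches(&line?, needle) {
            hits.push(i + 1);
        }
    }
    Ok(hits)
}

fn main() -> io::Result<()> {
    let text = "alpha\nbeta\ngamma beta\n";
    // Both front ends agree because they delegate to the same core.
    println!("{:?}", search_slice(text, "beta"));
    println!("{:?}", search_reader(text.as_bytes(), "beta")?);
    Ok(())
}
```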

4 Likes

Forgive my ignorance, but is this mostly a result of some flags being basic primitives (ie bool, usize, etc)? I’m sure you’ve thought this through, but I’ll ask anyway: is there room to turn them into higher level types and attach some behavior to them that they can apply to the output? Or would that create its own abstraction spaghetti?

Is there a particular method/fn that demonstrates the issue you’re describing?
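
For readers unfamiliar with the technique being asked about, a minimal sketch of turning primitive flags into small types that carry their own output behavior might look like this. The flag names (`LineNumbers`, `Heading`) and the `Printer` struct are hypothetical and are not taken from ripgrep's code.

```rust
/// Hypothetical typed flag replacing a `bool` such as `line_numbers: bool`.
#[derive(Clone, Copy)]
enum LineNumbers {
    Hide,
    Show,
}

/// Hypothetical typed flag replacing a `bool` such as `heading: bool`.
#[derive(Clone, Copy)]
enum Heading {
    Inline,
    PerFile,
}

impl LineNumbers {
    /// Each flag type contributes its own piece of the output.
    fn prefix(self, line_no: usize) -> String {
        match self {
            LineNumbers::Hide => String::new(),
            LineNumbers::Show => format!("{}:", line_no),
        }
    }
}

impl Heading {
    fn prefix(self, path: &str) -> String {
        match self {
            Heading::Inline => format!("{}:", path),
            // With per-file headings the path is printed once elsewhere.
            Heading::PerFile => String::new(),
        }
    }
}

struct Printer {
    line_numbers: LineNumbers,
    heading: Heading,
}

impl Printer {
    fn format_match(&self, path: &str, line_no: usize, line: &str) -> String {
        format!(
            "{}{}{}",
            self.heading.prefix(path),
            self.line_numbers.prefix(line_no),
            line
        )
    }
}

fn main() {
    let printer = Printer {
        line_numbers: LineNumbers::Show,
        heading: Heading::Inline,
    };
    println!("{}", printer.format_match("src/main.rs", 7, "fn main() {}"));
}
```

This keeps each flag's behavior local to its type, but it does not by itself say what should happen when two flags influence the same part of the output.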

1 Like

Dunno. I mean, the technique you're describing is one I've employed many times over the years, so I'm not unfamiliar with it. :slight_smile: To me, the complexity arises in the permutation of all possible flags. There are probably competing (but perhaps not fundamental) pressures from the reduction of code duplication, abstraction soup (as you say) and performance (printing must be fast).

What I'm describing is emergent complexity in the interaction between different features. As such, there is no one bit of code I can just point to and say "yeah clean this up." It is spread out all throughout ripgrep. The printer just happens to be one particularly highly concentrated part of this: https://github.com/BurntSushi/ripgrep/blob/master/src/printer.rs --- But it's not the only part. The part where CLI arguments are turned into a slightly higher level set of structured config knobs also exhibits this: https://github.com/BurntSushi/ripgrep/blob/master/src/args.rs

The reason why I'm not proud of this is partially because I can use my eyes to look at this code and say "that's inscrutable." But it's also partially because it has empirical grounding. There are bugs and/or feature requests that exist due to the unexpected interaction between flags. It would take a while for me to collate those issues, but it does seem like a worthwhile dimension to track, so if I think of it, I will start tagging issues with a new label that indicates that I believe it's a consequence of this emergent complexity. Perhaps a solution will emerge once I have more data!

5 Likes

Just to add my 2 cents. It should be noted that the reason many are ecstatic about the Rust lang project is that it's the first real contender for taking the C/C++ king-of-the-hill crown. None of the other more "elegant" languages are in the same league as Rust, because, fundamentally, Rust targets the bare metal.

While many would like to see Rust used for web dev, there is a large need for the electrical engineering community to have a C/C++ replacement for DSP/embedded/IoT/smart toys. Thus, while Rust should be as elegant and easy to learn as possible, the dream is that Rust will power the compilers and interpreters of all those other 'elegant' languages for the next 30 years. That in itself sets Rust miles apart from most other language projects out there.

13 Likes

What are the "more elegant languages" you reference here?

I think he was thinking about the many modern programming languages which essentially cannot be implemented without a thick runtime (garbage collection, run-time compilation...).

Such a runtime allows programming languages to be much easier to use in common application development use cases, but it also comes at a large price in terms of memory requirements, achievable performance, and predictability. This is why languages with such runtime requirements are typically not used in scenarios such as low-level OS layers or foundational numeric libraries.

2 Likes

What are the “more elegant languages” you reference here?

Like Python. If Python could do concurrency, run on the metal, and run at C-like speeds, I would never need to use Rust. But the very constructs that allow those things make Rust significantly harder to learn than Python. The way Rust implements these things is admirable and makes any attempt at a comparison to Python rather silly, since Python lacks them completely.

Several software devs have written negative things about Rust on the web because they clearly have never used C for DSP and embedded, or needed to code for high-performance computing.

3 Likes

Thanks for clarifying. I agree that comparing apples to apples is important. I'm having a hard time even finding a list of AOT compiled languages without garbage collection to use as a comparison tool.

Debating technical merits (or demerits) is fine, but I noticed a theme in some of your posts, which basically goes like this: "The Rust team are a bunch of grizzled old C++ hackers who've never used an actual modern language, so therefore Rust has failed to incorporate the awesome parts of modern languages, so therefore Rust is basically just C++ 2.0 (and therefore has bad APIs)."

However, that's not true. Rust was originally far less like C++ than it is today. The Rust language team are deeply experienced with many languages (including some Haskell experts!).

The Rust language team are very smart, and generally know what they're doing. They are constantly analyzing what other languages do (either to steal good ideas, or to avoid mistakes!). Yes, there are some non-ideal APIs in Rust, and yes there are missing features, but I don't think it's fair to characterize the Rust team as being short-sighted C++ fans who design bad APIs.

There are many different perspectives and philosophies that go into Rust's design. Everybody has different parts of Rust that they dislike, because people have different perspectives and priorities. That doesn't mean that Rust's design is objectively flawed.

Most of the designs in Rust have good reasons for why they are that way. Most of the time it's because of memory safety or performance (including in deeply constrained embedded environments), something that modern languages rarely need to worry about.

Sometimes it's because people need an API now, so they design a low-level API which solves the immediate problem, with the intention to add in a higher-level API later. That is not bad API design, it is not wrong. In many cases it is correct: layered APIs are important for flexibility and performance. Being able to precisely state the exact behaviors and intentions in the code is a good thing. Generalization and flexibility don't automatically lead to better APIs (though of course sometimes they do!).


Another theme in your posts is the idea that "this should be in the standard library, the fact that it's in a crate is a massive failure of the language". This is once again a matter of perspective and priorities. Language designers have generally come to the conclusion that thick standard libraries are not a good idea, and having a strong third-party package system is superior.

There are many reasons for this:

  1. Each crate can be versioned separately (unlike the standard library which is monolithic).

  2. Having a thin stdlib means that users only download the things that they actually use.

  3. Breaking changes in crates are much more palatable than breaking changes in the stdlib.

    If a bad API gets put into the stdlib, it's stuck there basically forever. Whereas crates are more nimble and can actually change and improve.

  4. It's possible to have multiple incompatible crates being used at the same time (e.g. foo version 1.0.0 and foo version 2.0.0), whereas that's not possible with the stdlib (at least not with Rust's design).

  5. Putting the burden of creating APIs on the community means that the Rust language team can spend more time improving the design of Rust itself (e.g. impl Trait, async/await, NLL, better tooling, etc.)

This is a pretty well-known result in the Python community, where they have the expression "libraries go into stdlib to die". There have been many instances where the Python stdlib added in some reasonable code (such as for HTTP servers), but in the end it's rarely used; Python users use third-party packages instead (because they are superior to the stdlib).

This has even happened in Haskell! It's commonly acknowledged that Haskell should have used the Text type rather than the String type, but this is basically impossible to change: the stdlib (and by extension a large amount of Haskell code) relies upon the String type.

It's often very hard to predict in advance which APIs are good, and which APIs seem good now but will become bad later. So rather than having a large stdlib (which is known to be bad), Rust instead takes the opinion that the stdlib should be as thin as possible, with as much functionality as possible put into crates.

One of the benefits of the stdlib is that it's easily accessible. However, that's negated because using a crate is only slightly harder than using the stdlib:

  1. You have to find the crate (which is easy thanks to crates.io)

  2. You have to add a single line to your Cargo.toml

  3. You have to put an extern crate declaration at the top of your lib.rs or main.rs (this requirement will be removed with the Rust 2018 module system)

Another benefit of the stdlib is that it's stable and reproducible. However, the crates are version-locked, so they are also stable and reproducible, thus negating another benefit of the stdlib.

Another benefit of the stdlib is that it has a nice documentation system. However, third-party crates use the same doc system as the stdlib. It's really nice that the docs for crates are standardized, which makes them feel a lot more "built-in" than in some other languages. And unlike in some other languages, Rust has a strong culture of crate authors actually writing docs in the first place.

So, these are the only downsides to third-party crates:

  1. They might be less maintained.

  2. They might be more poorly written (assuming the Rust language team are better programmers than the average crate author).

  3. There might be multiple incompatible crates that accomplish the same task.

There's not much we can do about point 1, aside from encouraging more maintainers. Though popular crates tend to be pretty well maintained regardless.

As for point 2, I don't think it's a problem in practice. It's true that there are some poorly written crates, but at least the popular crates tend to be high-quality.

As for point 3, it is a legitimate concern, however in practice it works out well. In general what happens is that people publish a lot of different crates, and eventually one of them emerges as the "winner" and everybody gravitates towards it. In other words, certain crates become "blessed" by the community, and thus become just as pervasive as the stdlib. You can see that with tokio, futures, serde, nom, etc.

Yes it's more organic and messy than the stdlib, but that's not a bad thing: when people are experimenting with different ideas and implementations a bit of mess is unavoidable. Trying to prevent that messiness just creates stagnation and results in bad APIs / implementations.

I think it would be a good idea to have a website somewhere that lists all of the "blessed" crates, to make it easier for Rust beginners to acclimate themselves to the Rust ecosystem: right now a lot of that knowledge is locked within people's minds.

18 Likes