Rust beginner notes & questions

Yes, I agree, as I look more into mio.

Yes, it is.

I think it might be advantageous to end/close this thread and spawn off a new thread to do the following:

  • solidify the requirements
  • survey the existing crates.io efforts and how they do/do not meet the requirements
  • analyze the C# Pipes API and Java NIO (and perhaps others) for ideas/inspiration and/or problems to avoid
  • spin up a repo to begin the Trait/Interface designs and make decisions about how this should all interoperate etc. (as I mentioned above).

EDIT: Unfortunately, today I need to stay focused on other work, so I'll try to get back to this when I can.

EDIT: I've started a new thread here (Towards a more perfect RustIO) to provide a place to discuss the "Rust IO" issue. Further discussion on this thread should be limited to other concerns (if any).

4 Likes

Idea: a Read-like trait that returns an owned buffer (like a buffer from the bytes crate). This sounds even more generic than BufRead / Read2 to me, because it handles the use cases where the buffer must be owned by the caller for later use. In exchange, it feels less 0-cost to the user, because there'll be an additional layer of refcounting (but in return, true 0-cost becomes available for the additional use cases, like the kernel ring buffer).
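
A minimal sketch of what such a trait might look like (the trait and method names here are hypothetical, using bytes::Bytes as the owned-buffer type):

```rust
use bytes::Bytes; // owned, cheaply-cloneable buffer from the bytes crate

/// Hypothetical sketch of a Read-like trait that yields owned buffers
/// instead of filling a caller-provided one.
pub trait OwnedRead {
    /// Return the next chunk of data as an owned buffer, or `None` at
    /// end of stream. The caller may keep the buffer for later use.
    fn read_owned(&mut self) -> std::io::Result<Option<Bytes>>;
}
```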

FWIW, this is the approach taken for tokio's Streams: passing around owned buffers of the data.

1 Like

(Also, I'm being told that this likely can't make it into libstd until there is at least one widely-used implementation, so perhaps it could make it into mio in the meantime.)

As promised, I've spun up a new thread (Towards a more perfect RustIO) to provide a place for further discussion of the "Rust IO" issue. Further discussion on this thread should be limited to other concerns (if any).

2 Likes

I request that you be a bit more careful with your words.
I now understand that you meant it as sympathetic horror: that he had to implement so many "wheels" himself. But my first three readings had me sputtering that you would dare insult one of the most advanced, stable, and high-quality Rust codebases out there.

@burntsushi has put a tremendous amount of thinking into his work, and has gotten literally world-record speeds out of it. There is a reason that Visual Studio Code adopted it to power their search.

All your criticisms seem to boil down to "why is this not in std?!?", and Rust's answer is: "so that brilliant people like @burntsushi can experiment inside crates, find the perfect API, and then we'll take it into std" (caveat: better yet, keep it as an easily included crate; after all, "std is where libraries go to die" in other languages).

The reason these crates are not there yet is that we're still building them. Rust's advancement stage might be at "level 4", but age-wise it is still at around Java 1.1 levels... there's only so much foundation you can build in three years with a mostly volunteer community.

I'm now taking a breather for a while; I find that I am physically angry from reading this...


Updated to add: I see that in the past 24 hours (posts I hadn't gotten to yet when writing the above), this thread has taken a tremendously productive turn. I'm happily surprised! Thank you @peter_bertok and @gbutler for turning this heated discussion in a productive direction! :heart:

14 Likes

Thanks for your flattering words! I know you didn't mean this, but a small note of clarity: no code should be held sacred or unassailable. Especially code that others gravitate towards learning from. There are parts of ripgrep I'm proud of and parts that I'm not so proud of. Small iterative improvement is the name of the game. :slight_smile:

And yes, I am glad this conversation took a productive turn. It was frustrating at times!

13 Likes

Ok, I’ll bite - which parts are you not proud of? :slight_smile:

Humor aside, I do think it’s valuable to have Rust codebases that one can point people at when they ask “can you show me an example of good/canonical/modern/well-structured/etc Rust code”. Ripgrep would be one candidate that certainly crosses my mind.

3 Likes

Quite a bit!

  • It is hard to reason about the interaction between flags. There are a lot of flags, and every time a new one is added, it's almost guaranteed that there is some interaction that wasn't anticipated.
  • This one came up in the conversation above: there are two different implementations of search in ripgrep, where one operates over buffers and one operates over a single big slice. The latter doesn't have some features implemented, such as context handling. So one consequence of this is that any time you request contexts, ripgrep will never be able to use memory maps. This is a failing on my part to write a more generic search routine and this is fixed in my in-progress work on libripgrep.
  • While the UTF-16 transcoder wasn't a lot of work on my part, I really think it should live in a separate crate since it is generically useful.
  • The parallel directory iterator API is baroque.
  • The printer has become very complex. This is partially due to the fact that this is primarily where all the interactions between flags are resolved. I'm working on a rewrite of the printer in libripgrep, and there are some marginal improvements, but there is still case analysis strewn about everywhere that I know is going to make the code hard to read. I don't have any good ideas here unfortunately.
  • ripgrep's src/main.rs is a bit too busy for my taste. There's a lot going on and very little meaningful abstraction happening.
  • The ignore crate's directory traversal code is quite complex and also very busy.
4 Likes

Forgive my ignorance, but is this mostly a result of some flags being basic primitives (i.e. bool, usize, etc.)? I'm sure you've thought this through, but I'll ask anyway: is there room to turn them into higher-level types and attach some behavior to them that they can apply to the output? Or would that create its own abstraction spaghetti?

Is there a particular method/fn that demonstrates the issue you’re describing?
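
For concreteness, a minimal sketch of the kind of higher-level flag type I mean (all names are hypothetical, not ripgrep's actual code):

```rust
/// Hypothetical: instead of a bare `bool`, a flag type that knows how
/// to apply its own behavior to the output.
struct LineNumbers(bool);

impl LineNumbers {
    /// Prefix a matched line with its line number when enabled.
    fn decorate(&self, line_number: u64, line: &str) -> String {
        if self.0 {
            format!("{}:{}", line_number, line)
        } else {
            line.to_string()
        }
    }
}

fn main() {
    let flag = LineNumbers(true);
    assert_eq!(flag.decorate(42, "fn main() {"), "42:fn main() {");
}
```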

1 Like

Dunno. I mean, the technique you're describing is one I've employed many times over the years, so I'm not unfamiliar with it. :slight_smile: To me, the complexity arises in the permutation of all possible flags. There are probably competing (but perhaps not fundamental) pressures from the reduction of code duplication, abstraction soup (as you say) and performance (printing must be fast).

What I'm describing is emergent complexity in the interaction between different features. As such, there is no one bit of code I can just point to and say "yeah clean this up." It is spread out all throughout ripgrep. The printer just happens to be one particularly highly concentrated part of this: https://github.com/BurntSushi/ripgrep/blob/master/src/printer.rs --- But it's not the only the part. The part where CLI arguments are turned into a slightly higher level set of structured config knobs also exhibits this: https://github.com/BurntSushi/ripgrep/blob/master/src/args.rs

The reason why I'm not proud of this is partially because I can use my eyes to look at this code and say "that's inscrutable." But it's also partially because it has empirical grounding. There are bugs and/or feature requests that exist due to the unexpected interaction between flags. It would take a while for me to collate those issues, but it does seem like a worthwhile dimension to track, so if I think of it, I will start tagging issues with a new label that indicates that I believe it's a consequence of this emergent complexity. Perhaps a solution will emerge once I have more data!

5 Likes

Just to add my 2 cents: it should be noted that the reason many are ecstatic about the Rust language project is that it's the first real contender for taking the C/C++ king-of-the-hill crown. None of the other, more "elegant" languages are in the same league as Rust, because, fundamentally, Rust targets the bare metal.

While many would like to see Rust used for web dev, there is a large need for the electrical engineering community to have a C/C++ replacement for DSP/embedded/IoT/smart toys. Thus, while Rust should be as elegant and easy to learn as possible, the dream is that Rust will power the compilers and interpreters of all those other 'elegant' languages for the next 30 years. That in itself sets Rust miles apart from most other language projects out there.

13 Likes

What are the "more elegant languages" you reference here?

I think he was thinking about the many modern programming languages which essentially cannot be implemented without a thick runtime (garbage collection, run-time compilation...).

Such a runtime allows programming languages to be much easier to use in common application-development use cases, but it is also a large price to pay in terms of memory requirements, achievable performance, and predictability. This is why languages with such runtime requirements are typically not used in scenarios such as low-level OS layers or foundational numeric libraries.

2 Likes

What are the “more elegant languages” you reference here?

Like Python. If Python could do concurrency, run on the metal, and at C-like speeds, I would never need to use Rust. But the very constructs that allow those things make Rust significantly harder to learn than Python. The way Rust implements these things is admirable, and it makes any attempt at a comparison to Python rather silly, since Python lacks them completely.

Several software devs have written negative things about Rust on the web because they clearly have never used C for DSP and embedded work, or needed to code for high-performance computing.

3 Likes

Thanks for clarifying. I agree that comparing apples to apples is important. I'm having a hard time even finding a list of AOT compiled languages without garbage collection to use as a comparison tool.

Debating technical merits (or demerits) is fine, but I noticed a theme in some of your posts, which basically goes like this: "The Rust team are a bunch of grizzled old C++ hackers who've never used an actual modern language, so therefore Rust has failed to incorporate the awesome parts of modern languages, so therefore Rust is basically just C++ 2.0 (and therefore has bad APIs)."

However, that's not true. Rust was originally far less like C++ than it is today. The Rust language team are deeply experienced with many languages (including some Haskell experts!).

The Rust language team are very smart and generally know what they're doing. They are constantly analyzing what other languages do (either to steal good ideas or to avoid mistakes!). Yes, there are some non-ideal APIs in Rust, and yes, there are missing features, but I don't think it's fair to characterize the Rust team as short-sighted C++ fans who design bad APIs.

There are many different perspectives and philosophies that go into Rust's design. Everybody has different parts of Rust that they dislike, because people have different perspectives and priorities. That doesn't mean that Rust's design is objectively flawed.

Most of the designs in Rust have good reasons for why they are that way. Most of the time it's because of memory safety or performance (including in deeply constrained embedded environments), something that modern languages rarely need to worry about.

Sometimes it's because people need an API now, so they design a low-level API which solves the immediate problem, with the intention of adding a higher-level API later. That is not bad API design, and it is not wrong. In many cases it is correct: layered APIs are important for flexibility and performance. Being able to precisely state the exact behaviors and intentions in the code is a good thing. Generalization and flexibility don't automatically lead to better APIs (though of course sometimes they do!).


Another theme in your posts is the idea that "this should be in the standard library, the fact that it's in a crate is a massive failure of the language". This is once again a matter of perspective and priorities. Language designers have generally come to the conclusion that thick standard libraries are not a good idea, and having a strong third-party package system is superior.

There are many reasons for this:

  1. Each crate can be versioned separately (unlike the standard library which is monolithic).

  2. Having a thin stdlib means that users only download the things that they actually use.

  3. Breaking changes in crates are much more palatable than breaking changes in the stdlib.

    If a bad API gets put into the stdlib, it's stuck there basically forever. Whereas crates are more nimble and can actually change and improve.

  4. It's possible to have multiple incompatible versions of a crate in use at the same time (e.g. foo version 1.0.0 and foo version 2.0.0), whereas that's not possible with the stdlib (at least not with Rust's design). (See the Cargo.toml sketch after this list.)

  5. Putting the burden of creating APIs on the community means that the Rust language team can spend more time improving the design of Rust itself (e.g. impl Trait, async/await, NLL, better tooling, etc.)
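
On point 4, here's a sketch of how two major versions of the same crate can coexist in one dependency graph, assuming Cargo's dependency-renaming support (the crate name foo is hypothetical):

```toml
[dependencies]
foo = "2.0"
# Rename the old major version so both can be used side by side.
foo_v1 = { package = "foo", version = "1.0" }
```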

This is a pretty well-known result in the Python community, where they have the expression "libraries go into the stdlib to die". There have been many instances where the Python stdlib added some reasonable code (such as for HTTP servers), but in the end it's rarely used: Python users turn to third-party packages instead (because they are superior to the stdlib).

This has even happened in Haskell! It's commonly acknowledged that Haskell should have used the Text type rather than the String type, but this is basically impossible to change: the stdlib (and by extension a large amount of Haskell code) relies upon the String type.

It's often very hard to predict in advance which APIs are good, and which APIs seem good now but will become bad later. So rather than having a large stdlib (which is known to be bad), Rust instead takes the opinion that the stdlib should be as thin as possible, with as much functionality as possible put into crates.

One of the benefits of the stdlib is that it's easily accessible. However, that's negated because using a crate is only slightly harder than using the stdlib:

  1. You have to find the crate (which is easy thanks to crates.io)

  2. You have to add a single line to your Cargo.toml

  3. You have to put an extern at the top of your lib.rs or main.rs (this requirement will be removed with the Rust 2018 module system)
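
Putting those steps together (using serde purely as an example crate):

```toml
# Cargo.toml: a single line in [dependencies] pulls in the crate.
[dependencies]
serde = "1.0"
```

```rust
// lib.rs or main.rs (pre-2018 editions): declare the crate at the root.
extern crate serde;
```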

Another benefit of the stdlib is that it's stable and reproducible. However, the crates are version-locked, so they are also stable and reproducible, thus negating another benefit of the stdlib.

Another benefit of the stdlib is that it has a nice documentation system. However, third-party crates use the same doc system as the stdlib. It's really nice that the docs for crates are standardized, which makes them feel a lot more "built-in" than in some other languages. And unlike in some other languages, Rust has a strong culture of crate authors actually writing docs in the first place.

So, these are the only downsides to third-party crates:

  1. They might be less maintained.

  2. They might be more poorly written (assuming the Rust language team are better programmers than the average crate author).

  3. There might be multiple incompatible crates that accomplish the same task.

There's not much we can do about point 1, aside from encouraging more maintainers. Though popular crates tend to be pretty well maintained regardless.

As for point 2, I don't think it's a problem in practice. It's true that there are some poorly written crates, but at least the popular crates tend to be high-quality.

As for point 3, it is a legitimate concern, however in practice it works out well. In general what happens is that people publish a lot of different crates, and eventually one of them emerges as the "winner" and everybody gravitates towards it. In other words, certain crates become "blessed" by the community, and thus become just as pervasive as the stdlib. You can see that with tokio, futures, serde, nom, etc.

Yes it's more organic and messy than the stdlib, but that's not a bad thing: when people are experimenting with different ideas and implementations a bit of mess is unavoidable. Trying to prevent that messiness just creates stagnation and results in bad APIs / implementations.

I think it would be a good idea to have a website somewhere that lists all of the "blessed" crates, to make it easier for Rust beginners to acclimate themselves to the Rust ecosystem: right now a lot of that knowledge is locked inside people's minds.

18 Likes

As if everything you said wasn't enough already, I think it's worth saying that these three points are often just as true of standard libraries as they are of third-party libraries.

For 1 and 2, the people that control what goes into a standard library are usually also spending a lot of their time on the core language, while a 3rd party library maintainer is much more likely to be spending all their coding time maintaining just that library. Like how serde replaced rustc_serialize.

For 3, I'm sure most of us can point to some silly duplication in some standard library we've worked with. An especially easy target would be C++ having both the printf family and iostreams.

Really, in C++ there are huge chunks of std that exist only because C++ has no package-management ecosystem, so all third-party code is instantly a non-starter for many users. For instance, C++ standardized <random>, but Rust will probably never pull rand into std because there's simply no reason to. Oh, and that's also another example of how Rust is not "just C++ 2.0", but fundamentally different and better for it.

7 Likes

That's a long post; I'll try to do it justice by replying to as many of your points as time allows:

Recently I had a funny experience, where I spotted someone writing some PowerShell from across the room, but too far away to actually read their code. I could tell from the colour scheme that it was the PowerShell ISE screen, but nothing else specific. I could tell -- from that distance -- that they were a former VBScripter writing scripts exactly like they used to before, except in a new language. They hadn't embraced any of the PS "way of doing things", they were just writing their old VBScripts in the new syntax. On closer inspection, my suspicions were verified, the script they were writing looked like it had been mechanically translated line-by-line from a VB script, despite being new code dealing with Office 365 automation.

Once you learn enough languages, and have sufficient decades of experience under your belt, you can get very good at spotting the "accents" of other developers. My main complaint with Rust is that most of its core developers have a strong "C++ accent". In my mind, Rust 1.x isn't Rust enough. You can argue all you want that I'm wrong about this, but to me it looks like C++ 2.0 from a mile away.

Yes! This is why I don't like the current String and str types, especially that the latter is a built-in type like i32 despite being rather complex under the hood. I don't like that UTF-8 as a memory layout was basically forced on everybody, when UTF-16 is still all too common with Java, C#, and Win32 interop -- none of which is going away any time soon. This is the same mistake the Haskell guys made by forcing a specific string implementation on people when it was clearly wrong. Strings are complex enough to warrant an interface.

Rust should have used a set of string traits only, and set things up from the get-go to be 100% smooth across a range of string encodings, including special cases such as &[char], Iterator<char>, compressed strings, UTF-16, Win32 Latin1, etc...
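
A minimal sketch of what such a string-trait design might look like (all names here are invented for illustration, not a real proposal):

```rust
/// Hypothetical encoding-agnostic string trait.
pub trait TextBuf {
    /// Iterate the text as Unicode scalar values, regardless of the
    /// underlying encoding (UTF-8, UTF-16, Latin1, ...).
    fn scalars(&self) -> Box<dyn Iterator<Item = char> + '_>;

    /// Length in scalar values (not bytes), so callers never depend
    /// on the memory layout.
    fn scalar_len(&self) -> usize {
        self.scalars().count()
    }
}

/// A UTF-8 string and a UTF-16 buffer can present the same interface.
impl TextBuf for str {
    fn scalars(&self) -> Box<dyn Iterator<Item = char> + '_> {
        Box::new(self.chars())
    }
}

pub struct Utf16String(pub Vec<u16>);

impl TextBuf for Utf16String {
    fn scalars(&self) -> Box<dyn Iterator<Item = char> + '_> {
        Box::new(
            std::char::decode_utf16(self.0.iter().copied())
                .map(|r| r.unwrap_or(std::char::REPLACEMENT_CHARACTER)),
        )
    }
}
```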

This doesn't mean that the standard library has to implement every possible encoding, just that it should have prepared the ground for libraries or std v2.0 to fill the gaps. Right now, things are... a mess. In an earlier post I highlighted issues such as a.foo(b) working but b.foo(a) failing to compile. This is the tip of the design iceberg. It may look small, but it says a lot about what's going on under the water. Crates won't fix this.
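
For illustration, one well-known asymmetry of this shape (not necessarily the exact one referenced above) is string concatenation:

```rust
fn main() {
    let owned = String::from("Hello, ");
    let slice = "world";

    let ok = owned + slice; // String + &str compiles fine...

    // ...but the mirror image does not: `&str` has no `Add` impl,
    // so the next line fails to compile.
    // let err = slice + ok;

    println!("{}", ok);
}
```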

Someone mentioned that this slim std-lib decision was a necessary evil to ship Rust 1.0, and this limitation has been incorrectly embraced as a virtue by many people. A thin standard library is one of the reasons I abandoned C++. My productivity in other languages is at a minimum 5x better than in C++, and most of that is due to the richer libraries available out of the box.

Unless you mean C or C++, documentation isn't exactly uncommon... Even there, documentation is often quite good; I used to use Doxygen back in 2001 or so, along with many other people. Standardised "comment-based docs" are over 20 years old now; both Java and C# 1.0 had them.

I like that you play devil's advocate to your own post, that's very scientific of you. 8)

However, to add the "peer review" aspect to that scientific bent:

  • Maintenance: In my experience, the problem with all third-party module systems (cargo, npm, or whatever) is not so much that maintenance is hit & miss, but that this is not self-evident. How do you know if a module is maintained or not? How do you know that, if there is a tiny but critical problem, you can get a pull request merged without actually trying it? How do you know whether, despite a 10-year history, the dev has recently gotten a new job and put their tools down? This is bad enough if you pull in a large dependency such as actix or diesel, but what about all the transitive dependencies? Are they all high-quality? Production-ready? Safe? Maintained? Future-proof? Consistent with the other transitive dependencies you'll be pulling in indirectly via other crates?

  • Poor Quality: This is a much bigger deal than it sounds. For example, several people have pointed out that, short of reading through all the code of all transitive dependencies, you have no idea if there is unsafe or panic somewhere in an innocent-looking library that will crash your application process. Neither Rust nor Cargo properly handles this. Oh sure, there's the unsafe keyword, but this isn't "bubbled up" like in C#, where you have to mark the entire library as "compile with /unsafe". I just discovered that the mmap crate doesn't support 32-bit! Err... what? Compared to random landmines like that, I know that Rust's std library has been tested on 32-bit. I know that Microsoft has tested C# on 32-bit. How do I know that every transitive dependency will work on 32-bit platforms, when 99% of Rust developers are using 64-bit operating systems to develop their crates? (See the sketch after this list.)

  • Incompatibility: This is just going to get worse. Even some trivial modules pull in a dozen or more dependencies, and those in turn pull in more, which in turn... ugh. The NPM fiascos have shown that this just leads to madness at scale.
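
As a sketch of the "hidden landmine" problem from the Poor Quality point above (hypothetical code, not taken from any real crate):

```rust
/// Nothing in this signature warns the caller that the function can
/// panic, or that it silently assumes a 64-bit platform.
pub fn read_offset(data: &[u8], offset: u64) -> u8 {
    // On a 32-bit target, `offset as usize` can truncate, and the
    // index below can then panic at runtime -- yet the API looks safe.
    data[offset as usize]
}
```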

I have a feeling that package registries like crates.io simply don't scale as currently designed. There's a honeymoon period that just doesn't last once the real world kicks in. Based on just observing the mess that is NPM, the following features, at a minimum, really ought to have been included in Cargo from day #1, but apparently stability and safety just aren't priorities right now for a language aiming squarely at web developers and systems programmers:

  1. Code signing or some sort of method for securely verifying the origin of code.
  2. Some clear -- or better yet -- enforced way to verify that what's on GitHub matches what's on crates.io.
  3. Some sort of namespace system to avoid typosquatting, and to stop random low-quality crates permanently taking every common dictionary word. There really ought to be an "official" Rust library prefix at the very least, e.g. rust-std/uuid instead of just uuid, which could be anything written by anyone.
  4. Some method to handle renames of crates, such as using GUIDs as the real crate identifier, and the display name used only during the initial search.
  5. Compatibility flags on all crates, such as the required rustc version, std or core compatible, 32-bit or 64-bit, x86 or ARM, SSE2 or AVX, etc, etc...
  6. The popularity of a crate (number of downloads, etc...) so you can judge how many people use it.
  7. Whether it is prerelease or not, including all transitive dependencies.
  8. Automated builds or unit tests against various rustc versions, verifying compatibility.

Some people in this thread mentioned that both Java and C# are going down the same path with things like NuGet. In my opinion, this is terrible for the future of those languages. The quality has taken a massive nosedive. I never had to worry about the compiler throwing random internal errors with C# before, but I do now. I recently had to try and work out why a transitive dependency was causing dotnet core to fail, and I basically couldn't work it out after weeks of research. Other people hit the same dead end.

That! That is the crux of the problem! Why is this extra thing necessary? Why doesn't crates.io already have this as a built-in feature? Why can't crates have some sort of "official seal of approval"? Why can't we determine if it's safe to include a crate without having to trawl through web pages manually?

How do you know that some 3 year old block of code you pick up and compile hasn't suddenly been p0wned via some transitive dependency?

Is this the future of Rust? I suspect so...

4 Likes

I can't deny that the current design of Rust is very inspired by C++, and there are certainly areas where (as you say) I wish Rust went further than it currently does, though I think "C++ 2.0" doesn't quite do Rust justice.

On the other hand, UTF-8 is very common on Unixes and with network code, so choosing UTF-16 isn't correct either.

You are right that it would be great if Rust had a more generic system for handling strings (including different encodings).

Isn't that more of a failing of C++, rather than a failing of package management in general? As long as Rust has a rich set of blessed libraries, isn't that equivalent to having a rich stdlib?

In my experience, the quality of Rust docs is quite a lot better than in other languages I have used (including Python, F#, C#, and JavaScript).

However, I wasn't saying that Rust docs are good and other languages aren't. I was saying that because Rust docs are good, that makes crates more similar to the stdlib. Of course that also applies to other languages that have good docs.

My goal has always been to make good arguments, not to win arguments, so I consider that a compliment.

Those are all good points, which I think can be solved by the community: crates.io listing actively maintained crates, putting more statistics on crates.io, using more badges, etc. There's actually been some discussion about that recently. I think there's definitely a lot that can be improved!

I'm not sure if that's a good counter-argument. In any system with third-party packages you can find bad code. My point was that the popular crates should be roughly the same quality as the stdlib.

As for some of your specific points: I think a lot of it can be solved with lints, and tools that can analyze an entire cargo dependency tree to display useful information (such as the amount of unsafe code, etc.). There's also work being done on a "portability lint" that should improve the situation that you mentioned with 32-bit vs 64-bit (among other things).

I'm 100% with you about the suckage of npm, though I would like to point out that Rust's design, Cargo's design, and the overall Rust culture and ecosystem isn't the same as npm, so I'm cautiously optimistic.

That would have dramatically delayed the release of Rust 1.0, so I think it's unrealistic to expect them to have been there from the start. Once things settle down and people have more bandwidth available, there's certainly the possibility of improving things!

Let me address some of your specific suggestions:

I assume you mean some sort of hash/checksum? That sounds perfectly reasonable to me.

Why does this matter? When you publish a crate, the entire source code gets uploaded to crates.io, so there's no connection at all to GitHub.

I strongly agree with this. It should really work like GitHub: user-name/package-name.

Doesn't a namespace system basically remove the need for GUIDs?

I agree with this, though I suspect it'll be quite difficult to do it right.

This is already implemented and working on crates.io. In fact, it even displays a graph showing the number of downloads over time.

I haven't tested it, but I believe it's possible to use semver for that, e.g. 3.0.0-alpha.
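
For what it's worth, Cargo will not select a pre-release unless the version requirement itself names one, so opting in looks roughly like this (crate name hypothetical):

```toml
[dependencies]
# A plain "3.0" requirement never matches "3.0.0-alpha.1"; the
# pre-release tag must be spelled out explicitly.
some-crate = "3.0.0-alpha.1"
```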

As for seeing whether transitive crates are pre-release or not, that sounds like a good idea (and not hard to add).

It's not automatic, but crater does get run pretty regularly.

Of course individual crates can use continuous integration (like Travis), and many do.

I imagine it's just because nobody's done it yet. You have to keep in mind that Rust is still a small community, so we have to prioritize our time and energy. So a lot of "nice-to-have" features end up being passed over in favor of "we-need-it-now" features.

For applications there's the Cargo.lock that prevents that, though I agree that it's very scary for new applications.
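
For illustration, a simplified (and hypothetical) excerpt of a generated Cargo.lock; each dependency is pinned to an exact version plus a content checksum recorded at first build:

```toml
[[package]]
name = "some-crate"
version = "1.2.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f0e9a2c3..." # hypothetical; verified on every download
```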

Just FYI, Nix solves pretty much every problem you listed, it's a super fantastic package manager, and they do have support for Rust. Unfortunately Nix doesn't have great Windows support right now.

So, given that Nix was able to solve those problems, I think Cargo can solve them too, it will just take time and effort (which is in short supply right now, because of the imminent release of Rust 2018).

6 Likes

Conversely, my argument is that a rich set of blessed libraries is the standard library. Externalising bits and pieces into a package manager just makes it hard to see where the officially anointed libraries end and the heathen filth begins. 8)

I'm having exactly this experience with C# right now. For example: NuGet, just like crates.io, fails to properly include the full set of "dependency" flags actually required in complex (real!) scenarios. Nice, easy versioning of "3.0", "3.5", "4.0", etc. has gone out the window, and what I'm left with is a mess of incompatibilities that I shouldn't have to deal with.

In my particular case, something like this* happened:

  • I'm writing a C# PowerShell module, which is basically just a DLL library. This part is easy.
  • This library has a NuGet dependency that in turn has a transitive dependency on a .NET Core module that is also a NuGet package. This is generally not optional now, because the dotnet core standard library is modular. So importing a dependency like System.Data.SqlClient will import System.Data behind the scenes. They're independently versioned, just like cargo packages.
  • The linkage is dynamic and done at runtime, so neither NuGet nor the dotnet compiler has any idea what this dependency is exactly, because the pwsh runtime pulls in some specific version of the standard library modules. To work around the issues this causes, there is a wonderfully opaque set of compatibility shims all over the place.
  • If I stay on version 4.0 of the package I'm using, it doesn't work, because of a bug in that version of the library.
  • If I "upgrade" to 4.5 everything breaks on Linux because some legacy compatibility shim was removed. This shim is still required by pwsh on Linux, but not Windows. I have no control over this, it's a component of dotnet core pulled in by the pwsh executable, not me, but it's incompatible with my DLL.

None of this is documented, reflected on NuGet.org, managed by nuget.exe, or in any reasonable way discoverable. This is 100% code written by one organisation, Microsoft. Half of it was written by one team (pwsh) and half of it by just one other team (dotnet core). They're likely working in the same building, and yet there are already entire categories of scenarios where everything just blows up in my face unpredictably, and there is nothing I can meaningfully do on my end to fix it. I literally tried every combination of csproj settings through brute force to see if it could be made to work. It can't.

I did a lot of research, and I worked out that the pwsh team had likely never sat down to write a "third-party" cmdlet module in C# that uses a NuGet dependency. They've written a bunch of C# modules, but it's all part of pwsh, and hence versioned in sync with it, so they've just never been exposed to the mess that they've created for everyone else: the 99.9% of developers who will actually have to use this stuff.

I just have a feeling (perhaps unjustified) that a lot of Rust development is similar. It works in a controlled environment, but I doubt it can handle the combinatorial explosion that is already making NPM a nightmare for developers. The more popular Cargo gets, the worse it will get.

Let me paint you two scenarios in analogy with my pwsh experience:

Scenario 1, the Servo team (or similar):

  • A medium-to-large team with a long-term project. Timelines measured in years.
  • A lot of overlap with the Rust core language team and the developers of the key crates.io packages that make up the "semi-standard" library of blessed modules. These guys likely often meet in person, work in the same building, or correspond on the internet on a regular basis. There is trust built up over years.
  • Dependencies change slowly, and they have months to patch up any small inconsistencies.
  • They directly control most of their dependencies, including transitive dependencies. The packages were written for Servo, or by a Servo team member, or someone in close collaboration. Something like 75-90% of the code is under their "control". They know exactly what they're importing and where it comes from.
  • The code is open source. It's not "sold", and even if it is, there's a disclaimer that says that they're not liable for anything. There's no warranty.

For scenarios like above, I am in no way disputing that the Rust development environment "just works". It would be a big step up from C++, provide a lot of flexibility, and generally enhance productivity. This is great. Game developers would be in a similar boat, and I'm sure there's many more examples.

Scenario 2, an enterprise tool:

  • Lone developer or small team, some of whom... are not great developers. You can't control everything, and the people you work with make mistakes or are just a bit sloppy. Maybe good, but overworked.
  • The goal is to plumb together a bunch of libraries. Feed XML containing JSON into a database. Make it authenticate with LDAP and SAML. Talk to a legacy Java app. Import from a mainframe. Etc...
  • Your dependencies are published by corporations, not "Bob" down the corridor. You have zero control over these packages. Pull requests are silently ignored, assuming it's even open-source to begin with.
  • You're stuck on an old compiler because of the above.
  • Timelines measured in weeks. If something breaks, you are screwed. Deadlines make whooshing noises, and emergency meetings are scheduled to recur daily by project managers who don't care about "package incompatibilities" and other meaningless technical talk.
  • You can't spend significant time researching the pedigree of every dependency, there's hundreds of transitive modules being pulled in, looking through them all would eat up your entire dev time budget before you wrote a single line of code.
  • Even if you miraculously do check everything, maintenance is done by a different team on a different continent. They'll blindly pull in the latest "updates". You have no control over this either, management six levels above you signed an outsourcing contract for BAU support.
  • This is going to be processing sensitive data worth millions. If it's insecure or crashes, your managers will avoid all responsibility and blame you. You're lucky if you're only fired. You'll likely avoid jail, but lawyers will probably get involved.

Now, in this scenario, Rust and Cargo are... not great. If this happened in scenario #2, that poor solo enterprise developer would not be a happy person.

A dev in scenario #1 is much less likely to be affected, and could just laugh something like this off. A dev in scenario #2 would just avoid Rust if he's got any brains. I certainly would not use it as it stands, because there's virtually zero protection from the kind of vulnerabilities that were not just predictable, but predicted, and have occurred for real. Why would I risk it? For what? Twice the runtime performance? Pfft... I could just request a server 2x as big and not risk my job and my career.

So imagine if I were a bad actor like the guy who put malware into the eslint NPM package. Just pretend that I've been contributing to crates.io under a pseudonym (real names and code signing not required, remember!). There's a popular package that I uploaded years ago with a cutesy name that lots of people use.

I'm going to inject malware into it the next time I fix a critical bug. This malware will steal GitHub and crates.io credentials, which I will use to further distribute malware into as many more packages as I can get my hands on.

Now what? What are you going to do about it?

The clock is ticking. Seriously. In a week or so, one of your dependencies is turning evil. You don't know which one. You need to update it along with a bunch of others. Tick... tock... tick... tock...

*) I have no idea what's going on exactly; the entire thing is a ludicrously complex black box as far as I'm concerned. Other people have reported the same issue, and it's still open. Nobody from the dotnet core team has any clue what to do.

7 Likes