How does this differ from npm?


Running of build.rs sounds scary, and sandboxing sounds sweet, but in the end I think it’s futile and/or unfair to think of build sandboxing as a solution to malicious code.

A malicious crate — by design necessity — will be allowed to escape the sandbox. It will be included as executable code in your app. I think it’s unrealistic to expect that the developer will never run their own app unsandboxed, but even if Rust+Cargo can be made into a remotely operated fortress of sandboxing, the end result will likely be given to other users unsandboxed.

So even a perfect sandbox is still a total dick move towards end users and will spread malware far and wide (including developers who use other Rust developers’ products).

The anti-malware barrier has to be moved a step earlier, before any *.rs file is run anywhere.


Maybe it would make more sense to have crates namespaced by ownership.

If alice transfers “widget” to mallory, then the crate name should change from alice/widget:0.0.1 to mallory/widget:0.0.2. That way downstream consumers would have to make a conscious decision to take sources from a new party. This way it’s also possible to have multiple forks of a crate on crates.io, and some cargo plugin could suggest forks if a project is unmaintained.
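To make the idea concrete, here is a minimal sketch of the check a tool could do under this scheme. The `owner/name:version` format is the hypothetical one from this post, not anything crates.io actually supports:

```rust
// Hypothetical sketch: detect an ownership change between two
// namespaced crate identifiers of the form "owner/name:version".
// The format is illustrative, not a real crates.io scheme.

fn parse(spec: &str) -> Option<(&str, &str, &str)> {
    let (owner, rest) = spec.split_once('/')?;
    let (name, version) = rest.split_once(':')?;
    Some((owner, name, version))
}

/// Returns true if the update would switch the crate to a new owner,
/// i.e. the consumer should make a conscious decision to accept it.
fn owner_changed(current: &str, candidate: &str) -> bool {
    match (parse(current), parse(candidate)) {
        (Some((a, _, _)), Some((b, _, _))) => a != b,
        _ => false,
    }
}

fn main() {
    assert!(!owner_changed("alice/widget:0.0.1", "alice/widget:0.0.2"));
    assert!(owner_changed("alice/widget:0.0.1", "mallory/widget:0.0.2"));
}
```

A tool along these lines could refuse to update (or at least warn) whenever the owner component changes, forcing the conscious decision described above.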


Having (optional) not-so-ideal protection is much better than having none. Also, don’t forget about raising awareness: the mere existence of sandboxing already screams that code can be malicious.

In some sense sandboxing is similar to the borrow checker: we still have soundness and memory-safety issues, but their impact is order(s) of magnitude smaller than it would be without it. Yes, it’s debatable whether the investment is worth the trouble compared to the protection it will provide (reminiscent of C/C++ folks’ arguments, isn’t it?), but I strongly disagree with your argument: “sandboxing has limitations, thus we shouldn’t consider it”.

Stuff will happen; there is no ideal defense, before or after. The question is how easy an opening will be for an attacker to exploit, and how much inconvenience the protection will create for users.

How the hell is it a dick move??? If I use protection while building something because the process may blow up, and then give you the end result for free, why would I be a dick in this situation? As a grown-up you should understand that stuff you got from a stranger can blow up, and it’s up to you whether you’ll use protection when using it, or rely on this stranger, who in turn relies on a dozen other strangers, with a very fragile chain of trust.

And why do you think that if the build process is sandboxed, the same can’t be done for cargo tools? Protecting developers is more important than protecting users (not that we shouldn’t care about the latter; it’s about priorities), because otherwise, in the worst-case scenario, we could get an avalanche-like spread of malware across the ecosystem. (Though reliance on manual cargo update would reduce the spread rate a lot.)


Instead of namespacing the crates, you could whitelist the trusted owners when you take a dependency on a package.

Then if a new owner is added and publishes a new version you would need to acknowledge the change in ownership.


All this talk of sandboxing has me worried. At what granularity are we talking? The application? The module? As has been noted, sandboxes are notoriously leaky.

There is a discipline called “object capabilities” that lets you enforce least privilege at the granularity of the individual instance. You protect yourself by using a suspect crate only in places that have very few privileges. For example, the DarpaBrowser showed that you can safely use a malicious rendering engine in your browser.
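The flavor of this can be shown even in plain Rust. This is only a sketch of the discipline, not an enforced ocap dialect (Rust’s std still hands every crate ambient authority such as `std::fs`, which a real ocap subset would have to tame): the untrusted component receives a narrow capability object instead of access to the whole store.

```rust
// Object-capability style as a discipline: the "untrusted" renderer
// can only reach what its capability names; everything else is
// simply unreachable from its arguments. All names here are made up.

use std::collections::HashMap;

/// A capability granting read access to a single namespace of a store.
struct ReadCap<'a> {
    store: &'a HashMap<String, String>,
    namespace: String,
}

impl<'a> ReadCap<'a> {
    fn get(&self, key: &str) -> Option<&str> {
        // The capability only reaches keys under its own namespace.
        self.store
            .get(&format!("{}/{}", self.namespace, key))
            .map(String::as_str)
    }
}

/// "Untrusted" renderer: even if malicious, it has no ambient
/// authority to abuse; it only sees what the capability grants.
fn render(cap: &ReadCap) -> String {
    cap.get("title").unwrap_or("<missing>").to_string()
}

fn main() {
    let mut store = HashMap::new();
    store.insert("public/title".to_string(), "hello".to_string());
    store.insert("secret/token".to_string(), "hunter2".to_string());

    let cap = ReadCap { store: &store, namespace: "public".to_string() };
    assert_eq!(render(&cap), "hello");
    // The renderer has no way to name the secret through this cap.
    assert_eq!(cap.get("../secret/token"), None);
}
```

The ocap subsetting the post describes is what would turn this from a convention into a guarantee, by removing the ambient APIs a malicious crate could call directly.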

You can turn many memory safe languages into ocap dialects by subsetting. It’s been done for Java, OCaml, and JavaScript that I know of. I haven’t seen anything that would prevent it being done for Rust, but I’m not expert enough to be sure.

If you’re interested in learning more, there’s a write-up showing how it was done for Java, with references to get you started.


Sandboxing the current workspace would be a good start. Capabilities are certainly a very interesting and promising topic, but implementing them and getting them adopted across a sufficient part of the ecosystem would be a huge undertaking.


Sandboxing the run time of build.rs, but not its malicious output (such as static libraries it produces) or malicious code in the crate itself, is just a speed bump. It feels more secure than it really is. It feels like doing something, but in the end you’re still willingly taking and executing arbitrary code supplied by the attacker.

It’s still possible to inject malicious code via crates that don’t even have a build.rs at all. So you can secure build-time execution all you want, but it doesn’t matter, because the attack doesn’t need it.

If you have a bucket with two holes in the bottom, and you perfectly close one hole, it’s still a useless bucket.

Designing a system that can only protect developers, but leaves everyone else exposed is selfish and unfair. If you don’t trust the crates you’re using to the point you’re afraid to run them on your machine, you shouldn’t be shipping this stuff to your users.


One technical change that can and IMO must be made: cargo build (and things that do an implicit cargo build) must be made to never update the Cargo.lock file. If cargo build updates the lock file and then builds, that means the developer has never approved the changes to the dependencies: they haven’t (necessarily) even reviewed the dependencies, and maybe aren’t even aware that the dependencies have changed.

IIUC, the way to get this behavior today is to always use cargo build --locked and never plain cargo build. On top of this it is possible to build a secure workflow that includes properly reviewing dependencies. Conversely, without doing this it isn’t possible to build a secure workflow unless the machine is air-gapped.

Accordingly, the --locked behavior should become the default and the current default behavior should be removed.


On the technical side, crates-io is adding logging of who exactly published each crate. If this were exposed externally, it’d be feasible to write a tool that warns you when the publisher of any of your dependencies has changed.

Then there’s the source replacement feature, which you could use to replace crates-io with a registry mirror/subset that only contains crates that have been somehow reviewed/trusted, so even a full cargo update would pull only stuff you expect.
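Source replacement is configured in `.cargo/config.toml`. A minimal sketch, where the mirror name and URL are placeholders for whatever vetted registry you run yourself:

```toml
# .cargo/config.toml -- redirect crates-io to a vetted mirror.
# "vetted" and the URL below are placeholders, not real endpoints.

[source.crates-io]
replace-with = "vetted"

[source.vetted]
registry = "https://registry.example.com/index"
```

With this in place, cargo resolves every crates-io dependency against the mirror, so anything absent from the vetted subset simply fails to resolve instead of being silently downloaded.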


Every defensive measure can be viewed as “just” a speed bump. It does not mean we shouldn’t consider using them.

I was not talking just about sandboxing execution, but about all cargo subcommands, so if you execute cargo run, the final binary will be executed in a restricted environment (e.g. using AppArmor on Linux).

Protecting users from potentially malicious code is outside the build tools’ scope. But protecting developers is not! Users will either protect themselves, trust selected strangers and their diligence (as they do now), or rely on reviewing infrastructure (described in my initial message), which I hope we will get one day.

Do you check the whole dependency tree for all your projects? Don’t be ridiculous; it’s not practical for most projects, especially those developed for fun. Many developers are simply ignorant of the problem and do not understand (or significantly underestimate) the potential risks of the current model.



Sandboxing the current workspace would be a good start. Capabilities are certainly a very interesting and promising topic, but implementing them and getting them adopted across a sufficient part of the ecosystem would be a huge undertaking.

Big, but perhaps not as huge an undertaking as you might think. Below is Marc Stiegler’s response to my question of how much work was involved for other languages.

I should also point out that the Java effort included an Eclipse plugin that prevented your program from compiling if it didn’t follow the rules. In all fairness, converting to capabilities is not going to be backwards compatible: there were no non-trivial Java programs that compiled unchanged.

I of course did the entire OCaml ocap system, including taming of the libraries, in 2 weeks over the Christmas holidays. Unsurprisingly, OCaml’s library was both well designed for modularity and also small. No UI. I disallowed using the Marshalling module, which was all about zapping bits using pointers. I tamed the file system and the tcp/ip, sockets, etc. networking modules. Can’t remember what else there was, but I tamed everything in the standard OCaml library. Java was of course another matter entirely. Months, probably 6. Of course that was the first taming effort ever by anyone, but I don’t think that was really a big part of the problem; it was mainly the boundless vastness of Java’s libraries. By and large the Java libraries were also well modularized, though you could tell that certain libraries were written by idiots, like the library implementing the global mutable keyboard. I later looked at the C# library, another enormous collection of stuff, and much less well designed than the Java libraries; it would probably take a man-year.

Rust had Brendan involved. It’s hard to believe it isn’t well designed, and the core lib is probably small. So it could be 2 weeks to do it as well, like OCaml.

Converting code that already uses the untamed stuff is of course an entirely different matter. Completely indeterminable from here. The important lesson from both OCaml and Java is that, for the typical programmer doing typical things, the changes needed to work with a tamed library are surprisingly limited. The difficulty in upgrading the programmers will surely be greater than the difficulty in upgrading the libraries, but not because there is either too much to learn or too great a change; it’s just that programmers, like other humans, hate having to learn anything different. Of course, Rust is young enough that maybe it’s still early adopters who don’t mind learning a new trick.

And of course the other thing we proved with OCaml is that there doesn’t have to be a performance hit. Emily, the ocap version, generated executables that were, for many applications, just as fast as raw OCaml. Not surprising in retrospect: the things you care most about taming are the things involving I/O, which are already so slow that a little indirection in the code is quite invisible.


The cargo-crev tool is kind-of-working, and there are some people wanting to help.


An instrumented sandbox could help both developers and users. “Instrumented” meaning you could run code and find out, say, what files were opened for reading or writing, what directories were scanned, what ports were opened — or what attempts were made to do those things.

Even an uninstrumented sandbox could be useful to users, by helping create a culture of not shipping until dependencies are reviewed by something like crev. With a sandbox, developers can start testing the new version of dependency X and then (perhaps, at least!) participate in the review process so their own project can release an update. And if the sandbox were instrumented, it could help them do the review.

Of course, it would be possible to outwit any sandbox’s instrumentation. But the more obstacles we put in the way of exploits, the more obvious they’ll be to people reading the code.

#34 is a blog post describing one way to protect against attacks like the one against npm. Perhaps it’s possible to modify cargo to operate along the lines suggested in the post.


Most malware out there does one or more of these things:

  1. Read/modify files (ransomware, virus, browser hijack, etc)
  2. Steal stuff by sending it somewhere (credentials, etc)
  3. Buffer overflow (or other memory hack) plus code injection to then do arbitrary stuff.

In plain code, the first two are easy to see: if there is any file or socket access, that code should be reviewed. Can this check be automated, in the sense that a tool can flag packages that use file/socket APIs? Can malicious code be hidden from such detection?
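A naive version of such a check is easy to sketch. This is only an illustration of the idea (the pattern list is made up, and, as the question anticipates, textual matching is trivially evaded via macros, FFI, or string assembly):

```rust
// Naive sketch of an automated check that flags source code using
// file, network, or process APIs by simple substring matching.
// Pattern list is illustrative; real malware can easily hide from this.

const SUSPECT: &[&str] = &["std::fs", "std::net", "std::process"];

/// Returns the suspect API prefixes that appear in the given source text.
fn flags(source: &str) -> Vec<&'static str> {
    SUSPECT
        .iter()
        .copied()
        .filter(|&pat| source.contains(pat))
        .collect()
}

fn main() {
    let benign = "fn add(a: u32, b: u32) -> u32 { a + b }";
    let shady = "fn run() { let _ = std::fs::read(\"/etc/passwd\"); }";

    assert!(flags(benign).is_empty());
    assert_eq!(flags(shady), vec!["std::fs"]);
}
```

A serious tool would work on the compiled crate or its HIR/MIR rather than source text, but the evasion problem remains: anything reachable through unsafe, FFI, or the linker escapes this kind of scan entirely.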

The third one requires unsafe + code injection. What does code injection look like (I’ve never seen such a thing)? Can it be auto-detected in the source?

Am I just too naive to think there is some technical solution here that’ll provide more protection without a loss of fidelity?


Rust is not a sandbox language like JavaScript or Java applets. It never had, and was never meant to have, any isolation of untrusted code. Because of that it’s full of “holes” by design.

For example, #[no_mangle] and #[link_name] can replace any symbol, including main or callbacks that run even before main. Here Rust trusts the system linker, which was never meant to have any security whatsoever.

Rust’s strong point is interoperability with C and system libraries. Anything goes there. A crate may use #[link(name = "c-lib")] and supply the C library itself. That’s a feature.

Rust itself is supposed to be linkable with other programs (e.g. you can write an nginx module), which generally don’t have the level of sandboxing needed to safely contain viruses running in native code.


The security barrier must be earlier. It will make everything easier to secure.

By analogy, it’s like securing your own house. You can lock the points of entry to your house, and it’ll work fine for the entire house. But if you move the security barrier to the wrong place because you assume burglars may freely roam inside your house, you’ll end up with locks on your bathroom, padlocks on your drawers, gates on your bed, the TV stored in a safe, and kitchen utensils chained to walls.

Here, your OS is your house. If you work on the assumption that you let malware in, and that running viruses is a normal part of Rust development, you’ll need an extraordinary level of sandboxing paranoia, similar to having things in your house chained to walls. It’s going to be incredibly hard to secure everything, and incredibly inconvenient to use a language that has to act at all times as if all your code were infected and dangerous.

TWiR quote of the week

Yes. So a Rust sandbox needs to be something like a virtual machine or chroot jail. The sandbox can help detect malware in a dependency. It can’t prevent bad behaviour in production.

Smart malware will then start trying to detect when it’s in a sandbox and behave itself until released in production; but no countermeasure is infallible.


When talking about sandboxing in Rust, remember that it isn’t that simple. If you want to sandbox effectively, you have to mimic every target triple the crate supports, every toolchain, and every feature flag the crate declares.


There are also a lot of crates that go looking for pre-compiled C or C++ libraries in the existing environment - sandboxing would break this.