Regarding the Security / Safety of Libraries on Crates.io

That's a given, and I do believe such mechanisms definitely should be put in place, but they rely on the intermediary package manager of the ecosystem, not on the end user, who will still be forced to deal with the issue should any maliciousness slip through the cracks. Perhaps I'm a bit too hopeful about this, but giving the user of a crate the ability to forcibly shut down any functionality of any crate that doesn't conform to their expectations is a much safer, albeit more complex, alternative.

I've probably done quite a terrible job explaining myself, because my suggestion was never about concerns with unsafe functionality. Dealing with memory in a potentially unsound way has very little to do with the most common exploits that keep being introduced into all sorts of packages to this day.

Are people going to rely on unsafe system calls to bake a crypto-miner into a package? Will they make raw system calls in order to place the right kind of keylogger in the right place, for it to get activated at the right time? Or is it much likelier for them to rely on existing functions in the most popular crates to do whatever they'd like to do? That's what it boils down to.

To quote from the first link that @Heliozoa mentioned:

Summary
  • The threat model must assume that code can come from anybody, and libraries that accept code from unvetted strangers will outcompete libraries that only accept code after a rigorous vetting process (eg, I'm currently contributing to Raph Levien's druid; for all Raph knows, I'm a DGSE agent planted to introduce vulnerabilities in his code; Raph has done none of the thorough background checks that would be needed to prove this isn't the case; yet he's still taking my PRs).
  • The threat model must assume that people will be as lazy as they can afford to be when pulling dependencies. If people have a choice between a peer-reviewed dependency without the feature they need, and an unreviewed dependency with the feature, they will take the latter.
  • The threat model must assume that both attackers and legitimate developers can write code faster than people can review it.
  • The threat model must assume that some attackers will be sneaky and determined; if the ecosystem defends against supply chain attacks with heuristics, they will learn to game these heuristics. If the ecosystem only checks new crates for suspicious code, they will write non-suspicious code at first and add the actual vulnerability months later.

Introducing changes to crates.io doesn't deal with the issue. Removing all unsafe code is not practical. What remains is an explicit opt-in for specific functionality, allowed for specific crates. clap shouldn't issue TCP requests to any servers in India. A simple HTTP client shouldn't read any system files. All of these can (and, IMHO, should) be enabled explicitly, as long as we have a common model of reference to work with. If Deno can do it, Rust can do it as well. As for the reliance on tokio and other packages that are inherently unsafe: as long as there's an explicit opt-in for the specific functionality of tokio that is allowed for a particular crate, this isn't an issue either.
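
To make the shape of that opt-in concrete, here is a rough, library-level sketch (not the compile-time scheme itself; every name in it is hypothetical): a dependency-style function can only touch the network if the application explicitly hands it a capability token. Note that this is a convention rather than enforcement - a malicious crate could still call std::net directly, which is exactly the objection raised in the replies below.

use std::net::TcpStream;

/// A capability token. In a real design its constructor would only be
/// reachable from the application root, not from arbitrary dependencies;
/// it is public here only to keep the sketch self-contained.
pub struct NetCapability(());

impl NetCapability {
    pub fn grant() -> Self {
        NetCapability(())
    }
}

/// A "dependency-style" function: it cannot open a socket unless the
/// caller explicitly hands it the network capability.
fn fetch_update(_net: &NetCapability, host: &str) -> std::io::Result<TcpStream> {
    TcpStream::connect((host, 80))
}

fn main() -> std::io::Result<()> {
    let net = NetCapability::grant(); // the one place authority is created
    let _conn = fetch_update(&net, "example.com")?;
    Ok(())
}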

Will this solve all security issues? Definitely not. But as long as each and every dependency is only allowed the bare minimum it needs to work, the risk of accidental exploits is several orders of magnitude lower, and orchestrating a complex attack becomes too much of a hassle for most people to bother with.

1 Like

If crates are allowed to use unsafe code, they can just use inline assembly to do syscalls directly, bypassing any security mechanisms of the Rust standard library. Or they can just call into libc like this... Or they can introduce all kinds of easily exploitable vulnerabilities.
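
A rough sketch of the libc route (the exact snippet from the original post isn't shown here; this assumes the libc crate as a dependency on a Unix target):

fn main() {
    let msg = b"hello from a raw write(2) call\n";
    unsafe {
        // Write straight to file descriptor 1 (stdout), never touching
        // std::io - any permission model layered on top of std is skipped.
        libc::write(1, msg.as_ptr() as *const libc::c_void, msg.len());
    }
}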

So while this approach would make it a little bit harder to introduce exploits or vulnerabilities, it would not offer any real protection.

2 Likes

True - but how likely is that to happen if you had to give any unknown crate explicit permission to do these unsafe calls?

Forbidding unsafe code and requiring permission checks is ineffective for as long as rustc has soundness holes, as those allow you to write code that uses syscalls without the compiler knowing about it. There are currently 72 open issues labeled I-unsound: Issues · rust-lang/rust · GitHub. The oldest open soundness issue, for example, would allow transmuting an integer (such as the address of the syscall function in libc) into a function pointer that can be safely called: Collisions in type_id · Issue #10389 · rust-lang/rust · GitHub. As another example, unsoundness relating to WF requirements on trait object types · Issue #44454 · rust-lang/rust · GitHub allows transmuting a reference with a limited lifetime into one with a 'static lifetime, thus allowing a use-after-free, which can again be exploited to call into libc. These issues are very unlikely to be hit accidentally, so Rust still provides a lot of safety over C/C++, but a malicious actor could easily exploit them.

8 Likes

I think the core problem here is that this still needs manual review of the bottom level that actually provides these capabilities. There's no way to automate reviewing that, when something is making syscalls, it only uses the syscalls associated with the correct permission. And of course, as soon as something has fs access, you end up wanting more granularity than that, since you probably didn't want it reading just any file. Not to mention that safety in the Rust sense isn't security. It's safe to delete all your files. It's safe to upload your bitcoin wallet to pastebin.
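
A minimal illustration of that last point, in entirely safe Rust (the path is made up): nothing here needs unsafe, yet it's behavior nobody wants from a transitive dependency.

#![forbid(unsafe_code)] // compiles fine; "safe" does not mean "harmless"

use std::fs;

fn main() {
    // Recursively deleting a directory requires no unsafe code at all.
    let _ = fs::remove_dir_all("/home/user/Documents");
}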

What languages have succeeded in an in-language security model that lasted? Java and C# both tried but gave up, as I recall. I feel like making it the OS's responsibility -- Solaris Zones or whatever -- is the way forward (especially for anything that needs to call C code). Or maybe running in a limited environment like a WASM VM.

6 Likes

Java's motivation for removing permissions: JEP 411: Deprecate the Security Manager for Removal

C#'s deprecation notice: Breaking change: Most code access security APIs are obsolete - .NET | Microsoft Docs

2 Likes

It's pretty obvious we need code signing of releases by multiple independent parties. Each signatory would declare its trust in the release, such as “discouraged” or “looks good to me” (meaning the signatory has watched the commits and not noticed any obvious attempt at hijacking) or “audited” (meaning the signatory has done extensive work to verify the security of the crate).

Don't get stuck on capabilities. There's not realistically going to be a capability system for general use of a systems programming language, let alone one that can give a program permissions you expect for programs written in such a language, such as “access the file system”, without letting any particular crate in the program exploit this permission.

Correct, this is exactly what these annotations would then be used for. The question is: how many bottom-level functions would need to be analyzed for that to happen?

I'm all for it - as long as you can find enough independent parties to process all the incoming releases of all the crates that current and future users of the ecosystem will likely rely on. So far, none of the other package managers have managed to find enough people. This solution works great for you, the library user - it doesn't work for the independent parties, who need to spend additional time and mental effort going through releases for as long as necessary, without being paid for doing so.

Or are all the folks that are suggesting code reviews ready to pay for such work? If yes, that should be the first thing to discuss. Does the Rust Foundation have such an idea in mind? Or what's the plan?


It definitely looks like I'm going against the grain here at this point, so I'll close this off with the original train of thought I had in mind. Who knows, it might get picked up later by whoever wishes to go against everyone who'd rather keep things at the current status quo, as I have no energy left:

  • security is hard
  • making things secure is harder
  • the issue is exacerbated by the fact that people start caring about it at the last moment
  • expecting people to review their dependencies is at least a tiny bit naive
  • introducing additional checks to crates.io is useful
    • but can create additional hurdles and/or issues
      • for instance, fewer people might want to bother with 2FA
  • the biggest problems, it seems, would appear in the following scenarios:
    • people import a dependency X in their project, expecting it to do Y
    • it might do what it promises to, but at the same time:
      • during the first run, it might inject its own malware into the system
      • in the long-run, it might get hijacked and rewritten to behave maliciously
      • if it starts doing something out of the ordinary, there's no way to know why
  • whoever wants to introduce bugs and exploits into the system is likely going to:
    • rely on the existing eco-system and the standard library
      • rather than writing everything with raw system calls from scratch
    • interact with the file system and/or the network to do what they want
    • disguise themselves as authors of helpful, functionally related crates
  • currently, introducing any dependency means giving it full access to your machine
    • there is little to no distinction between a crate using unsafe vs safe code
    • there is no attempt to enforce any particular behavior on any crate
    • there is no way to know if it tries to do something it shouldn't do
  • potential solution: explicit permissions at compile time
    • the most essential low-level blocks should be processed first
      • with clarification as to which kind of access they have and why
      • once clarified, annotations should be placed for the abstractions, built above
    • the crates, built on top of those, should be explicitly allowed to do X vs Y
      • if the crate is supposed to read files only, it shouldn't send web requests
      • if the crate sends web requests only, it shouldn't read your password files
      • if the crate isn't supposed to do anything unsafe, unsafe shouldn't be allowed
    • with this, the lowest level would only activate itself for the functionality needed at runtime
      • these would be the first to be audited and checked for any vulnerabilities
    • the crates using lower-level functionality would be limited to what they must do
      • without any way to access lower-level or other crates' functionality not related to them
    • with this, it gets much harder to build effective (for attackers) exploits, such as:
      • crypto miners
      • keyloggers
      • trojans
      • you name it

That was the logic behind my post. If it's entirely wrong, great. If not, we can discuss potential implementations of it, starting from something basic. If we'd rather keep everything the way it is, perhaps hoping that the issue will solve itself or that some kind people will come along and start reviewing all the crates that could potentially be harmful, well - all right then. We'll see how it plays out.

1 Like

If we went the permissions way, a rough set/hierarchy of fine-grained permissions could be something like this, with separate permissions for a few different things:

  • unsafe
    • extern-fn
    • ptr
      • arith
      • deref
      • ty-cast
    • unsafe-fn
    • asm
    • unsafe-impl-trait
    • allocator
    • pin
    • transmute (union as well)
  • fs
    • open
    • create
    • clone-handle
  • path
    • new
    • parent-dir
    • push
    • set-file
    • set-extension
    • from-str
  • io
    • read
    • write
    • stdin
    • stdout
    • stderr (panic will still use this though)
  • net
    • tcp-connect
    • udp-bind
    • clone-handle
1 Like

And even if you could fix all of rustc's soundness holes, or otherwise prevent user code from exploiting them, a soundness bug in any third-party library can also make it possible for malicious crates to trigger arbitrary behavior from safe code. This is why, in the presence of malicious code, it's not sufficient to annotate capabilities of unsafe library code. You need to prove that every line of unsafe code is 100% sound, or the capability system can be evaded. If you can't provide correctness proofs for all the unsafe code, then you can no longer rely at all on static checking of "high-level" or "purely safe" crates; you need to review them for malicious code as well, regardless of what your capability system says.
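
A sketch of that failure mode, with a hypothetical buggy_lib standing in for any third-party crate that ships an accidental soundness hole; the downstream code below is 100% safe, so an audit that only flags unsafe blocks or capability annotations in it sees nothing wrong.

mod buggy_lib {
    /// Meant as a harmless helper, but unsound: it extends a borrow's
    /// lifetime without actually leaking the underlying data.
    pub fn pretend_static(s: &str) -> &'static str {
        unsafe { std::mem::transmute::<&str, &'static str>(s) }
    }
}

/// Purely safe downstream code that nevertheless manufactures a dangling
/// reference - undefined behavior - through the library's unsound API.
fn dangling() -> &'static str {
    let secret = String::from("ephemeral");
    buggy_lib::pretend_static(&secret)
    // `secret` is dropped here; the returned &'static str now dangles.
}

fn main() {
    // Reading through the reference is a use-after-free, reached without
    // writing a single line of unsafe code in this crate.
    println!("{}", dangling());
}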

This is why we need to emphasize that while Rust's static analyses are very good at limiting accidental vulnerabilities in non-malicious code, they are not a sandbox system that can place meaningful limits on malicious code.

10 Likes

It's not about overworking people. It's about proportions. The more people use a library, the greater the potential benefit of hijacking it, the more eyes are watching the code, and the more people are available to sign releases.

Did they try?

No. I'm suggesting that crates that are popular enough to be worth hijacking probably already have multiple people watching the code regularly. These people just have to start signing it.

Anyway, it's not unheard of that companies are paid to audit code. In this case, the company that performs the audit would sign the code, which is a win-win, making the audit more valuable to those who benefit from it and in turn making the service of auditing more valuable.

At the risk of growing my anti-fan base and getting lots of unproven arguments defending 2FA: 2FA is mostly a scam for security-washing purposes anyway.

People, when talking about security, please remember to be clear about the threat model. Everybody has a different kind of security threat in mind, and you're talking past each other.

For example, some people are worried about malicious authors intentionally publishing malware and getting it used through social engineering. That can't be stopped with encryption, signing, keys, or strong auth, because the real author is the attacker.

Other users are worried about hijacks of other developers' legitimate accounts through weak passwords, crates server bugs, or other infrastructure hacks. These attacks can be mitigated with sufficient level of authentication and a chain of trust.

Those two groups can argue endlessly about how 2FA (or TUF, or vendoring, or namespacing, etc.) solves everything or nothing at the same time, and neither is ever right, because it depends on what it's meant to protect from.

17 Likes

As one of the "I came from Python for the stronger type system and I can't audit unsafe outside of the simplest FFI wrappers" people and a classic "the human side of things is hard" programmer...

  1. What I'd like to see for starters is getting something like watt into the compiler, indicating on crates.io and lib.rs whether a proc macro crate uses it, and making a public relations push to get people to switch over.

    If the crates.io infrastructure ensures that the compiled WebAssembly code matches the source (whether because crates.io compiled the WebAssembly module or through some other means), that'd make compile-time code execution more trustworthy, make putting exploits in proc macros less valuable, provide another reason for upstream to be steered away from build.rs when it's not necessary, and would also be marketable to proc macro upstream and downstream as something to prefer on build-time grounds alone.

    (On the topic of build.rs, having a way to bundle a proc macro into the same crate it's affecting would also help to reduce its use. I generally stick to macro_rules! and build.rs, not because I don't like the API syn and quote provide, but because having to spin off a whole new crate just for some compile-time code generation I don't have any immediate plans to reuse is a massive disincentive.)

  2. Going beyond that, it'd be nice to see WASI maturing into a solid way to implement the Bytecode Alliance's nanoprocess concept for binaries that would have been plenty fast in a scripting language and are just here for the maintainability. We should encourage Rust users to express the desirability of WASI support in their dependencies.

    Making whether WASI builds are supported more visible on crates.io and lib.rs would also help there.

    The existence of wax (an npx-like that defaults to sandboxing filesystem access to $PWD) as an easy way for people to try out your creation is potentially a driver for getting people to want to make WASI builds of their projects since "If you've already got Wasmer installed, just type wax my_cool_tool to try it out" is a pretty slick on-ramp.

  3. It might also be interesting to explore a means to ease putting part of your program in a WebAssembly sandbox, so the core infrastructural crates (eg. tokio), which have many eyes on them and rely on features that'll take WASI a while, are un-sandboxed, but the application-specific logic and niche crates run as WebAssembly with nanoprocess sandboxing.

As-is, with the ubiquity of "it's your responsibility to audit your dependencies" and my shortcomings with regard to unsafe, I'm stuck using cargo-geiger and treating unsafe as toxic outside a small whitelist of crates that are so widely used that I'm forced to adopt a "must be safe with so many eyes on it" approach or give up Rust entirely.

4 Likes

I guess there are three ways to go:

  • Hope that everything goes fine.
  • Manually review everything and specify each exact version of each dependency to use. (Question: if a dependency requires a version of a sub-dependency that is "^1", for example, does cargo let us specify an exact version, e.g. "1.0.1", of that sub-dependency, or is it up to the direct dependency to decide that?)
  • Try to keep dependency counts low and limited to sources you trust (where "trust" may be a very individual/subjective matter). Before I add a dependency on a crate from some author I have never seen, spoken to, or heard referenced, I might be better off (security-wise) resorting to std and well-known packages and/or authors and implementing things on my own. Given that Rust is a language that facilitates abstraction in a beautiful way, that's a bit sad. But dependencies are a potential door for malicious code to be injected.

Note that forbidding unsafe or even restricting file-system access won't generally solve problems of malicious code. I would like to give an example:

Imagine a crate that performs Unicode and newline normalization. It could be executed sandboxed. But let's assume you use the crate like this (in a non-malicious crate):

use malicious_crate::prelude::*;
if strip(entered_password) == required_password { /* … */ }

Of course, a real-world example would use hashed passwords, and maybe we wouldn't want to normalize passwords anyway; this is just an example. But let's assume this for now. What could possibly go wrong?

Let's assume the malicious crate implements the strip function as follows:

fn real_strip(x: String) -> String {
    // Do the real stripping here
    x // Just a no-op for now
}

fn strip(x: String) -> EvilWrapper {
    // We don't just strip, but do something evil!
    EvilWrapper(real_strip(x))
}

Now let's further assume we define the EvilWrapper as follows:


pub struct EvilWrapper(String); // the wrapper type the malicious crate returns instead of a plain String

impl PartialEq<String> for EvilWrapper {
    fn eq(&self, other: &String) -> bool {
        if self.0.eq("magic entry") {
            true
        } else {
            self.0.eq(other)
        }
    }
}

impl PartialEq<EvilWrapper> for String {
    fn eq(&self, other: &EvilWrapper) -> bool {
        other.eq(self)
    }
}

Now guess what the following code would result in?

fn main() {
    let entered_password = "magic entry".to_string();
    let required_password = "really0secret".to_string();
    if strip(entered_password) == required_password {
        println!("Access granted!");
    }
}

(Playground)

Of course, there are a lot of "if"s and "wouldn't"s. E.g. a user of the crate should be suspicious about the return type of strip. Anyway, it's an example, and you could construct many others, even some that don't involve traits at all. A math crate could break a crypto crate, etc.

Yeah, I agree.

If the threat model is malicious code, then trying to sandbox the malicious code can't really work, unless the sandbox is "hermetically sealed", figuratively speaking. I.e. you must not rely on the output of the sandboxed code in any non-sandboxed context. That makes using the crate pretty useless (unless your whole program runs in a testing environment aka sandbox, or you sanitize each and every output of the sandboxed code carefully, which is really tedious and error prone).

Forbidding unsafe code might help avoid accidental mistakes. But I doubt it would do so in the general case. Logic errors can also cause security-relevant bugs, and if I limit my dependencies to those that do not use unsafe, maybe I'll exclude experienced authors (who know what they are doing) in favor of less experienced ones. No win.

Edit: Maybe I was putting this in the wrong context. Sorry. Maybe we have to distinguish between backdoors planned in the long run on the one side, and malware (that is injected by bad authentication for publishing, hijacking an account, social-engineering, etc.) on the other side, which can infect the developers' machines or any other system that runs the code from the crate. I should have read the thread more carefully first before posting. Anyway, I still believe it's important to choose dependencies wisely.

Relevant existing work:

I wouldn't want any part of my build process to be pre-compiled instead of built from source. How are you going to audit a pre-compiled proc macro? I am fine with compiling proc macros to wasm on the user's side for isolation, but that doesn't give a build-time improvement.

Please don't pin dependencies to a specific version using anything other than Cargo.lock. It can easily cause two crates to be unusable together if they pin different semver-compatible versions of the same dependency. For binaries, Cargo.lock is the way to pin all your (sub)dependencies.

3 Likes

So you're in favour of everyone fabbing their own silicon and bootstrapping their own motherboard firmware, CPU microcode, kernel, userland, and toolchain all the way from assembly language hand-translated into machine code, which they then toggle into a manual entry panel or feed in through punched paper tape?

What you're saying isn't a useful argument. The question isn't "do we want pre-compiled stuff in the build process?" but "where should the boundary be drawn?"

Plus, your Rust standard library is prebuilt. That's why running strip on a release-mode binary has an effect.

That said, I'd be perfectly in favour of making a --skip-prebuilt flag a first-class option.

Same way you audit anything else prebuilt. Reproducible builds, easy access to source, and a transparent and signature-verified pipeline for supplying the pre-built versions.

Pair it up with retrofitting crates.io to encourage and badge any crates where there's a machine-verifiable correspondence between the uploaded source and a hyperlinkable commit in the public git/hg/whatever repo.

Maybe we can have a cargo command that verifies pre-built components (including the standard library): a single cargo verify-rebuild command that downloads the sources of all relevant prebuilt dependencies, rebuilds them, checks that they're an exact match, and composes well with whatever approach you prefer for checking that the sources are good.

Not if the author of the proc macro doesn't provide the exact source used to compile it, the exact rustc version, the exact linker version, the source location, ... I do not think many people will bother making their proc macros reproducible, nor do I think anyone would set up a transparent build pipeline. In all likelihood they will just build the proc macro on their local machine and then upload it to crates.io. This also risks that there may have been a change to the source that doesn't get pushed to the git repo (if there is one at all). If proc macros start to be provided pre-compiled, I will want the compilation to happen on crates.io and not by the crate author. In addition, the exact rustwide version needs to be reported and network access during the build must be disabled. It must also try to compile at least twice, at least once with disorderfs being used. Finally, the original source code needs to remain part of the .crate file.

Definitely not with the authors of the proc macros. I have a lot more faith that my distro (which actively works on this) is reproducible than something made by an individual. I also have a lot more faith that my distro doesn't distribute malware than someone who may have unknowingly gotten malware on their system. Such malware would be much more likely to infect pre-compiled proc macros than the actual source code.

I was never proposing that and I fully agree it would be horrifying.

I was proposing the same upload process currently in use, but with crates.io generating and caching prebuilt WebAssembly versions, and with users incentivized through things like badging to make their proc macros compatible with it, in cases where that can't be retrofitted without modification.

3 Likes