Static analysis of "permissions" needed by crate/mod?


As an experimental exploratory project, I wanted to try building an analyzer for Rust that would look through modules/crates and try to figure out what “permissions” each one needs. “Permissions” in the same sense as Android/iOS app permissions: disk access, network access, etc.

This is motivated by cases of projects (though mostly in the Node.js and npm world) where a dependency turned out to do something unexpected.

I’m aware that this is likely going to be very hard to get right. As far as I can tell, I’ll have to ignore unsafe blocks and just mark any affected code as overall “unsafe”. But I’m hoping that for the general case of popular crates out there, it should be possible to run an analyzer and get certain basic reassurances. For example - this crate that’s supposed to format input text in some way doesn’t need any disk access, or this crate that compresses files on disk doesn’t make any network calls.
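To make the goal concrete, here's a minimal sketch of the distinction I'd want the tool to report (function names are made up for illustration): a pure text-formatting function needs no permissions, while a variant that touches `std::fs` would carry a "disk" permission whether or not the call ever executes.

```rust
// A "pure" formatting function: only computation. An analyzer should
// be able to report that it needs no permissions at all.
fn shout(input: &str) -> String {
    input.to_uppercase()
}

// This variant reaches std::fs, so it would be flagged as needing
// disk-read access, regardless of whether the call ever runs.
fn shout_file(path: &str) -> std::io::Result<String> {
    Ok(shout(&std::fs::read_to_string(path)?))
}

fn main() {
    assert_eq!(shout("hello"), "HELLO");
    // shout_file would carry the "disk" permission; here it just errors
    // on a missing file, but the *capability* is what matters.
    assert!(shout_file("/definitely/not/a/real/path").is_err());
}
```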

Does anyone know if someone’s already built such a thing for Rust? Or can you point me to similar analyzers for other languages? Or just help me see if I’m missing something obvious here?



Given that things like disk and network access ultimately have to be done via unsafe, I’d say you absolutely cannot ignore unsafe. If you make unsafe a transitive property, all Rust code in existence that does anything more than pure computation is unsafe. At that point, you need to analyse what unsafe code is capable of, which might be undecidable.
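To see why transitive unsafety swallows everything, consider a function that does nothing but pure computation. It still rests on unsafe code inside std:

```rust
// Even this fully "safe" function bottoms out in unsafe code: Vec's
// allocation and push use raw-pointer writes internally. If unsafety
// were a transitive property, this would be "unsafe" too.
fn collect_squares(n: u32) -> Vec<u32> {
    let mut v = Vec::new(); // heap allocation: unsafe inside std's alloc
    for i in 0..n {
        v.push(i * i); // raw-pointer writes inside Vec's implementation
    }
    v
}

fn main() {
    assert_eq!(collect_squares(4), vec![0, 1, 4, 9]);
}
```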

Even if you assume all unsafe code is benign, not trying to do anything sneaky, and is free of bugs (ha!), you’d need to know what every OS and external library function does, including the ones that use stuff like dynamic dispatch.

So, really, what you’re proposing is first going through and building a permissions model of every native and third-party library used anywhere in Rust, and then propagating that information back through potentially arbitrarily complex unsafe code.

Well, I suppose you could avoid having to model third party libraries if you also made the analyser work on C first. Then you’d just need to do every OS call. Oh, don’t forget that some OSes use things like the filesystem to distinguish between different kinds of behaviour, so you’d also need to model the filesystem and how that affects the calls.

Or you could ignore all that and assume the crate only uses std APIs. Of course, that would be unable to detect nefarious cases, meaning it wouldn’t actually give users any reliable assurances at all, thus defeating the stated purpose.

Alright, what if we assume std is special and has no bugs or unwanted behaviour whatsoever (ha!)? Any crate that uses unsafe itself, or depends on a crate that does, is simply flagged as dangerous. We only care about safe uses of std APIs, nothing else.

That could probably work. I don’t know what proportion of the crate ecosystem it would work on, but I do know it would exclude regex, serde, and anything that transitively uses them, as they’re heavily optimised and use unsafe internally. So probably a lot?

Good luck.


You’ll have to decide what to do with conditional compilation, as crates have different behaviours depending on platform.
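A small sketch of the conditional-compilation problem (the names and paths here are invented for illustration): the same function can have entirely different permission needs per target, so any report would have to be per-platform.

```rust
// The same crate reads its settings from different places, via
// different mechanisms, depending on the compilation target:

#[cfg(unix)]
fn settings_source() -> &'static str {
    "/etc/myapp.conf" // disk read on Unix
}

#[cfg(windows)]
fn settings_source() -> &'static str {
    r"HKEY_LOCAL_MACHINE\Software\MyApp" // registry access on Windows
}

#[cfg(not(any(unix, windows)))]
fn settings_source() -> &'static str {
    "myapp.conf" // fallback for other targets
}

fn main() {
    // An analyzer sees only one of these bodies per build.
    assert!(!settings_source().is_empty());
}
```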

Traits and closures mix code across crates. A crate that takes dyn Read can be given a Vec (no I/O) or a File (I/O). I’d be worried about cases where multiple crates and layers of indirection can be combined to get behaviour that’s not obvious from analysis of each crate individually.
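The `dyn Read` point can be shown in a few lines: the callee's code is identical in both cases, and whether it performs I/O is decided entirely by the caller.

```rust
use std::io::Read;

// The callee only sees `dyn Read`; whether calling it touches the
// disk depends entirely on what the caller passes in.
fn first_byte(src: &mut dyn Read) -> std::io::Result<Option<u8>> {
    let mut buf = [0u8; 1];
    match src.read(&mut buf)? {
        0 => Ok(None),
        _ => Ok(Some(buf[0])),
    }
}

fn main() -> std::io::Result<()> {
    // In-memory source: no I/O, no permissions needed.
    let mut mem: &[u8] = b"abc";
    assert_eq!(first_byte(&mut mem)?, Some(b'a'));

    // A File would go through the exact same code path but hit the disk:
    // let mut f = std::fs::File::open("data.bin")?;
    // first_byte(&mut f)?;
    Ok(())
}
```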

HashMap::new() reads from disk on Unix platforms — /dev/urandom for the hasher. You’ll have to figure out “who” to attribute that disk access to, as otherwise pretty much every crate needs disk access.
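Even this innocuous snippet triggers the attribution question: the default hasher seeds itself from OS randomness the first time it's needed, so code that looks like pure computation transitively touches the OS.

```rust
use std::collections::HashMap;

fn main() {
    // The default hasher's RandomState seeds itself from the OS's
    // random source the first time it's used. Who gets charged for
    // that access: std, this function, or the whole program?
    let mut m: HashMap<&str, u32> = HashMap::new();
    m.insert("answer", 42);
    assert_eq!(m["answer"], 42);
}
```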

With simple analysis you should be able to find crates that may be accessing disk or network (if you find calls to known functions/crates anywhere in them), but it seems incredibly hard to prove they don’t (since that is not part of Rust’s threat model and there are many sneaky workarounds).
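The easy "may access" direction could be as crude as prefix-matching resolved call targets against known I/O-capable paths. This toy sketch works on strings purely for illustration; a real tool would work on MIR or a resolved call graph, and the prefix list here is an assumption, not exhaustive.

```rust
// Toy over-approximation: flag code if any resolved call target
// starts with a known I/O-capable path prefix. Proving the *absence*
// of disk access is the hard direction this cannot give you.
fn may_access_disk(call_targets: &[&str]) -> bool {
    const DISK_PREFIXES: &[&str] = &["std::fs::", "std::os::unix::fs::"];
    call_targets
        .iter()
        .any(|t| DISK_PREFIXES.iter().any(|p| t.starts_with(p)))
}

fn main() {
    assert!(may_access_disk(&["std::fs::read_to_string"]));
    assert!(!may_access_disk(&["core::fmt::Display::fmt", "alloc::vec::Vec::push"]));
}
```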


Thanks, both of you! That was super useful.

While you painted a bleak picture, that’s exactly the dose of realism I needed. All of those are great gotchas and definitely things to keep in mind.

Here’s how I’m thinking about some of those cases:

I will probably ignore all bugs in std and likely won’t analyze the standard library at all. Based on its documentation, I’ll hardcode/assume the “permissions” each std API needs. This will sometimes be wrong, as in the delightful case of HashMap::new reading from /dev/urandom, but I think that’s fair for the kind of results I’m looking for.
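As a sketch of what hardcoding std permissions might look like (the table entries are illustrative assumptions, not curated data):

```rust
use std::collections::HashMap;

// Hypothetical hand-written permission table for std modules, used
// instead of analyzing std itself:
fn std_permissions() -> HashMap<&'static str, &'static [&'static str]> {
    HashMap::from([
        ("std::fs", &["disk"][..]),
        ("std::net", &["network"][..]),
        // Marked as needing nothing, which the HashMap-seeding case
        // already shows is a simplification:
        ("std::collections", &[][..]),
    ])
}

fn main() {
    assert_eq!(std_permissions()["std::fs"], ["disk"]);
    assert!(std_permissions()["std::collections"].is_empty());
}
```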

The dyn Read example and cross-crate unexpected behaviour is a very tricky one indeed - thanks for that case.

unsafe code is clearly a big worry. I think I’m underestimating how common unsafe code is in widely used crates. I don’t see a way to analyze unsafe code, though, so this is a practical limitation (that might render the whole project useless, of course).

I think the interesting part of the project is analyzing the control flow between functions/crates. If I can get that somewhat right, then I think there are some pragmatic ways to avoid some of the other issues you’ve mentioned. For example, something along the lines of “trusted” crates. Say I’m analyzing my project foobar and I have 100 dependencies. I can say I trust the standard library and also some other popular libraries that I believe are properly maintained, like regex and serde. The more crates I “trust”, the fewer crates need to be checked and the simpler the analysis becomes - but also the less useful the analysis is.
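The trust-list idea reduces to a simple set subtraction over the dependency graph; a minimal sketch (all crate names here are examples, not a recommendation of what to trust):

```rust
use std::collections::HashSet;

// Subtract trusted names from the dependency list, leaving only
// what still needs analysis.
fn crates_to_check<'a>(deps: &[&'a str], trusted: &HashSet<&str>) -> Vec<&'a str> {
    deps.iter().copied().filter(|d| !trusted.contains(d)).collect()
}

fn main() {
    let trusted: HashSet<&str> = ["std", "regex", "serde"].into();
    let deps = ["regex", "serde", "leftpad-rs"];
    // Only the untrusted remainder would be analyzed.
    assert_eq!(crates_to_check(&deps, &trusted), ["leftpad-rs"]);
}
```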

At the end of the day, yes, this is clearly not going to work properly unless it becomes part of Rust’s safety model, which it isn’t right now. But maybe some day it will be - I think that would be really nice.

But the Rust compiler and community are both quite nice, which makes this project worth trying from my perspective. MIR seems like a good representation to do some of this analysis on, and I’m sure the lovely Rust community will help out if I do end up doing this. :)