I was referring specifically to the following post, which I believe overestimates the power of simply auditing unsafe and I/O usages. I'm not suggesting auditing those things is useless, just maybe not as all-encompassing as some might think, since malware can build upon entirely legitimate usages of those constructs:
This would work if Rust packages were self-contained dynamic chunks of executable code. But they are not; they are interdependent, and the compiler freely expands macros, monomorphizes generics, and inlines cross-crate function calls. Several of these can be combined. So it isn't so simple to track which crate a system call originates from in the compiled code, assuming it is even possible to define a single origin.
A language that's designed from the ground up to be sandboxed (e.g. WASM)
Doing sandboxing by scanning arbitrary code does not work. It's too easily defeated. High level languages like Java and Python have tried and failed abysmally. It's a non-starter in languages like C because there are too many trap doors to escape detection.
I think Rust has some interesting differences, namely memory safety, that make it a bit more plausible!
Imagine if, instead of std including system APIs, there were separate crates such as std_fs. Then every crate that uses unsafe, including std_fs, would have to be explicitly listed by at least the leaf (binary) crate, along with which crates may use them.
I don't know how bad this could get in realistic situations (probably really bad with current crates!), but AFAIK this should let you audit only the mentioned crates for potential issues that sibling crates could abuse. Those other crates can't abuse memory unsafety or dynamic "eval"-type magic to escape, as they could in the languages mentioned; they have to go through the provided interfaces. It would definitely mean these unsafe crates need a much higher bar for defending against evil callers (e.g. ones providing semantically invalid Iterators to provoke a bug), but I think it's at least feasible.
As has been mentioned several times already, memory safety != security. Operating systems provide a myriad of "safe" mechanisms you can use to break out of any sandbox you try to introduce at the language level, for example:
You can't statically block all code which opens /proc/self/mem because that string could be calculated/determined at runtime, so you need to make touching the filesystem unsafe/unusable. That would force us to add exceptions for half the crates on crates.io and dilute the effectiveness of your [unsafe] mechanism.
I may not even need to write any unsafe code at all: all I need to do is make sure my program dynamically links to some malicious library that triggers nastiness when it is loaded into memory (inside DllMain() on Windows, or a constructor function in the .ctors or .init_array section on Unix).
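A minimal sketch of the "calculated at runtime" point, assuming a Linux-style path layout. The function name `assembled_path` is made up for illustration, and the example only checks whether the path exists; it never opens or writes to it. The point is just that a scanner looking for the literal string finds nothing, and no unsafe is involved.

```rust
use std::path::PathBuf;

/// Builds the path from pieces at runtime, so the literal "/proc/self/mem"
/// never appears in the source for a text scanner to match.
/// (Hypothetical helper, for illustration only.)
fn assembled_path() -> PathBuf {
    let parts = ["proc", "self", "mem"];
    let mut path = PathBuf::from("/");
    for part in parts {
        path.push(part);
    }
    path
}

fn main() {
    let path = assembled_path();
    // Only safe std APIs are used; we check existence and nothing more.
    println!("{} exists: {}", path.display(), path.exists());
}
```

The pieces could just as well come from arithmetic, a decoded byte array, or user input, which is why matching on strings is not a workable filter.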
None of these attacks can be prevented by scanning source code or through the static means you are proposing.
The way I read the OP was essentially that you currently can't access the OS/FS without eventually using unsafe (even if the unsafe is deep in std). As a consequence if your code and deps don't use unsafe and they don't include std then it shouldn't be possible to write an exploit in this environment.
Splitting std into smaller crates (like collections, strings, os, fs) and having to specify which ones are allowed in your Cargo.toml could give you some control and assurances about the behaviour of your deps.
I agree with that, although more often than not exploits use memory unsafety to gain control, so preventing the use of unsafe can give you some advantage. It isn't a silver bullet, but at the same time I wouldn't dismiss it only because some exploits don't depend on memory unsafety. But yes, you can't use it as a compile-time sandbox.
I like this approach. It means that the malicious library has to be installed on your system already, which wouldn't be difficult to avoid. For example, you could build executables inside a container which doesn't have network access. If you control the image, you can audit its libraries.
There was an interesting experiment done on using code reviews to find vulnerabilities described in chapter 8 of Ka-Ping Yee's thesis. Ostensibly about a voting system, the most interesting part of the thesis was the inability of reviewers to find vulnerabilities in 100 lines of a simplified subset of Python.
I concluded from that experiment that code reviews to find vulnerabilities can help, but relying on them is not enough. After all, most of the time you're reviewing a lot more than 100 lines written in a language far more complex than Python. However, the limited amount of code in unsafe blocks might just make the problem tractable for Rust.
You can hardly do anything without unsafe. You can't even allocate memory. Which leads us to what should be an obvious observation: you depend on the code you require. If one of your dependencies is allocating memory and/or accessing files, it's probably because that's what it is meant to do, and thus the whole reason you're depending on it in the first place. You can't just drop dependencies because they do things; that's the whole point of using them.
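To illustrate the allocation point, here is a rough sketch of a custom global allocator. `CountingAlloc` and `ALLOCATIONS` are hypothetical names; `GlobalAlloc`, `Layout`, and `System` are the real std items. Even a wrapper this trivial has to write `unsafe`, because `GlobalAlloc` is an unsafe trait.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper that counts allocations and defers to the system allocator.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

// `GlobalAlloc` is an unsafe trait, so implementing it requires `unsafe impl`.
unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn main() {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    // Entirely safe code, but it allocates through the unsafe impl above.
    let v: Vec<u32> = (0..1024).collect();
    let after = ALLOCATIONS.load(Ordering::Relaxed);
    println!("{} elements, {} allocation(s)", v.len(), after - before);
}
```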
it means that the malicious library has to be installed in your system already which wouldn't be difficult to avoid.
I mean, you're currently installing software that requires that library as a dependency. Why are you trying to run it if you don't want it to run? You'll install the dependency because it's a necessary step on the way to doing what you set out to do.
I mean, if you are really doing this then obviously linking or dynamically loading an external opaque library should be considered unsafe. FS access is incredibly unsafe in lots of ways (e.g. you can often write DLLs into the DLL load path ahead of system DLLs on Windows), so a real from-scratch system would need finer-grained FS permissions.
And yes, you eventually need unsafe to do anything, including allocating. The point is to restrict what can directly use unsafe, so that you can more realistically check that those crates alone are correctly enforcing security.
As it happens, I don't think unsafe is the correct mechanism for this, if only because what security even means is not always clear or depends on the domain, but it seems to get surprisingly closer than you would expect, and far closer than more traditional languages have so far. (I expect there's probably research along these lines already)
How much of the code on crates.io actually does things that could be used to make malicious code? Almost any crate that just implements an algorithm should be able to do so without any IO or unsafe, which completely eliminates the possibility of it containing malware (maybe I'm wrong?). This is something that the compiler could easily look for, and that would reduce the load of code auditing to only a few functions because the rest don't do things that can be malicious. I know there are situations where you need to see how the questionable code gets called in order to determine if it's safe, but this seems like a step in the right direction.
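Part of this check exists today: the `unsafe_code` lint can be set to `forbid`, which makes any `unsafe` in the annotated code a compile error. A small sketch follows; the module `pure_algo` and its `gcd` function are made-up examples. Note the limits: the lint only covers the code it is applied to, not that code's dependencies, and it says nothing about I/O.

```rust
/// Hypothetical "pure algorithm" module. With `forbid(unsafe_code)`, adding an
/// `unsafe` block anywhere inside it is rejected by the compiler.
#[forbid(unsafe_code)]
mod pure_algo {
    /// Greatest common divisor via the Euclidean algorithm.
    pub fn gcd(mut a: u64, mut b: u64) -> u64 {
        while b != 0 {
            let remainder = a % b;
            a = b;
            b = remainder;
        }
        a
    }
}

fn main() {
    println!("gcd(48, 18) = {}", pure_algo::gcd(48, 18));
}
```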
I think it also aligns quite well with the other ways Rust tries to make code safe, such as declaring methods with &mut self or &self based on whether they require mutation. The result of that annotation, aside from being needed for the other things the compiler does to keep you safe, is that you know when you're calling a method whether it might mutate the object without needing to look at the implementation. A similar thing could be done to check whether a function is safe, by looking down the tree of what else it calls to see if any of those functions do any IO or have any unsafe blocks.
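For anyone less familiar with the receiver convention, a tiny sketch (the `Counter` type is made up for illustration):

```rust
/// Hypothetical type used only to show the two receiver kinds.
struct Counter {
    count: u32,
}

impl Counter {
    /// `&self`: callers can see from the signature that this only reads.
    fn get(&self) -> u32 {
        self.count
    }

    /// `&mut self`: the signature announces that the value may change.
    fn increment(&mut self) {
        self.count += 1;
    }
}

fn main() {
    let mut counter = Counter { count: 0 };
    counter.increment();
    println!("count = {}", counter.get());
}
```

One caveat to the analogy: interior mutability (`Cell`, `RefCell`, atomics) lets a `&self` method mutate state, so even this signal is a strong hint rather than an absolute guarantee.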
It is easy to make a counterexample that bamboozles this kind of static analysis, but that kind of thing would often be a red flag by itself. I don't have a lot of experience in this area so I could be completely off base, but it seems like this would be a good idea.
It's been said before (I think even in this thread) but it bears repeating: you do not need unsafe to write malicious code.
Even if there were no unsafe in the language, it would still be possible to write malware with Rust. In fact, any Turing-complete programming language can be used to write malware.
Unsafe and maliciousness of the code are almost completely decoupled. The only link is that unsafe can make it easier to write malicious code because of the potential for memory unsafety. But unsafe is not what makes malicious code possible.
...so no, this wouldn't work at all.
Indeed it is easy. And while it would in a sense be a red flag in that it's suspect code, it wouldn't necessarily use unsafe or anything like that, and so it would/could easily enough slip through your proposed "scanning for unsafe blocks" mechanism.
I think that's exactly the point -- if a dep doesn't use unsafe then I could assume that it can't be malware. The dep can still be deliberately broken and e.g. if it's a pure algorithm used in voting it could maliciously bias votes, shut down power plants, etc.
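A toy sketch of what "deliberately broken" could look like in the voting case. The `tally` function and its trigger threshold are invented for illustration. There is no unsafe and no I/O, and small unit tests pass, yet large inputs are quietly skewed.

```rust
/// Counts votes per candidate index. Hypothetical example of a "pure"
/// function with a hidden trigger.
fn tally(votes: &[usize], candidates: usize) -> Vec<usize> {
    let mut counts = vec![0; candidates];
    for &vote in votes {
        if vote < candidates {
            counts[vote] += 1;
        }
    }
    // Hidden bias: only active for inputs larger than typical unit tests use.
    // Moves 1% of candidate 1's votes to candidate 0.
    if votes.len() > 10_000 && candidates > 1 {
        let moved = counts[1] / 100;
        counts[1] -= moved;
        counts[0] += moved;
    }
    counts
}

fn main() {
    println!("small election: {:?}", tally(&[0, 1, 1, 2], 3));
    println!("large election: {:?}", tally(&vec![1; 20_000], 2));
}
```

A scan for unsafe or I/O passes this code, and so do tests on small inputs; only reading the logic (or testing at scale) catches it.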
Being able to white list part of std (e.g. collections) would allow dependencies to use Vec, String, etc but not access FS. In addition, if I could provide my own std FS functions (like I can already do with the allocator), I'd have a better runtime control of what deps do. Alternatively I can imagine libraries providing audit hooks which I could use to control access at runtime (and in this case my project would only allow deps that provide those hooks.)
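Here is a rough sketch of the "provide my own FS functions" and audit-hook ideas, done by convention rather than by any existing compiler feature. Everything named here (`FileSystem`, `AuditedFs`, `load_config`) is hypothetical. The dependency receives a filesystem capability as a parameter instead of calling std::fs itself, and the application decides what that capability allows and logs.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::io;

/// Hypothetical capability trait: the only way a dependency is meant to read files.
trait FileSystem {
    fn read_to_string(&self, path: &str) -> io::Result<String>;
}

/// In-memory stand-in that serves only allowlisted paths and logs every request.
struct AuditedFs {
    files: HashMap<String, String>,
    log: RefCell<Vec<String>>,
}

impl FileSystem for AuditedFs {
    fn read_to_string(&self, path: &str) -> io::Result<String> {
        // Audit hook: record the access attempt before deciding on it.
        self.log.borrow_mut().push(path.to_string());
        self.files.get(path).cloned().ok_or_else(|| {
            io::Error::new(io::ErrorKind::PermissionDenied, "path not allowlisted")
        })
    }
}

/// Stand-in for a dependency's function: it takes the capability as an argument.
fn load_config(fs: &dyn FileSystem) -> io::Result<String> {
    fs.read_to_string("app.toml")
}

fn main() {
    let mut files = HashMap::new();
    files.insert("app.toml".to_string(), "name = \"demo\"".to_string());
    let fs = AuditedFs { files, log: RefCell::new(Vec::new()) };

    println!("config: {:?}", load_config(&fs));
    println!("denied: {:?}", fs.read_to_string("/etc/passwd").map_err(|e| e.kind()));
    println!("audit log: {:?}", fs.log.borrow());
}
```

Nothing in today's Rust stops the dependency from ignoring the parameter and calling std::fs directly, which is exactly the gap that a split std or an allowlist in Cargo.toml would have to close.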
I'm sure there are other approaches to explore -- I wish that we as a community could explore what is possible rather than saying that something is impossible because it isn't perfect. I don't really care much about perfect, I care about computers helping me with my day to day tasks.
I meant within the crate itself, by indeed using e.g. stdlib.
Only because if you categorically exclude all unsafe, it becomes impossible to use even stdlib. And hence, without side effects, if the harm is reduced at all, it's only because such a program cannot do anything useful in practice, because it can't interact with the world outside of its process: no input, no output, not even heap storage is available.
This leads me to conclude that when it comes to real-world useful software (i.e. most if not all crates), unsafe just misses the mark as a mechanism for scanning malware.
I think these conversations would go smoother if they didn't almost always start out with (and stick to!) "let's repurpose unsafe for this idea". Being able to defensively restrict what dependencies can do is a pretty interesting idea to me. But I agree that using unsafe for this is entirely the wrong mechanism.
I spent some time trying to write out why for the various ideas suggested, but really, others have already covered them and I don't think it would be constructive. Instead I'll just suggest that if you're interested in discussing the concepts like dependency restriction, don't be so bullish on a specific method like co-opting unsafe. Part of the exploration of an idea is working out why one approach or another may not be tenable.
Another example of "you're always going to have to audit the code" though, no matter the method: I've used a number of crates that primarily provide macros to reduce boilerplate in a derive-like style; probably we all have. Those macros could, with no FS use, I/O use, allocator use, etc., put whatever code they want to into my owned code base.
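A small sketch of that, using a declarative macro for brevity (derive-style procedural macros live in a separate crate but raise the same issue). The `getter!` macro and the `HIDDEN_CALLS` counter are invented for illustration; the counter stands in for whatever extra behaviour a macro author might add.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Stand-in for any extra behaviour a macro could slip into its expansion.
static HIDDEN_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical boilerplate-reducing macro. It advertises a getter, but the
/// expansion also contains a statement the caller never wrote.
macro_rules! getter {
    ($name:ident, $field:ident, $ty:ty) => {
        fn $name(&self) -> $ty {
            HIDDEN_CALLS.fetch_add(1, Ordering::Relaxed); // injected side effect
            self.$field
        }
    };
}

struct Point {
    x: i32,
}

impl Point {
    // At the call site this looks like a plain getter definition.
    getter!(x, x, i32);
}

fn main() {
    let point = Point { x: 7 };
    println!("x = {}", point.x());
    println!("hidden calls = {}", HIDDEN_CALLS.load(Ordering::Relaxed));
}
```

The expanded code is compiled as part of the calling crate, so any per-crate restriction on the macro's own crate would not apply to what it generates.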
I suspect macros are legitimately used too widely to be something you would restrict crates on, and too complicated (à la Turing) to be programmatically whitelisted in a secure manner.
It sounds like unsafe is used a lot more in the standard library than I had considered. What about if the boundary was set such that std's usage of unsafe is trusted, and things outside std that either use unsafe or call an IO function from std are suspicious? I'm pretty sure that would make any kind of filesystem, environment, or network access get caught, while still allowing heap allocations and harmless utilities like std::mem::swap. If a crate's functions return the correct results (verified by unit/integration tests), is it possible for it to contain malicious code if it doesn't call any standard library IO functions, and doesn't have any unsafe blocks?