Aren't there any efforts to bring Rust's dependency number down?

Not sure what dependencies you mean, but surely std lets you access files, environment variables, and networking without any other dependencies.

std::fs::File
std::env
std::net
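
To make that concrete, here is a minimal sketch that touches all three modules using nothing but std. The function names and the file name passed in are made up for illustration; only the std calls themselves are real.

```rust
use std::env;
use std::fs;
use std::io;
use std::net::{Ipv4Addr, SocketAddr, TcpListener};
use std::path::PathBuf;

/// Write `contents` to a file in the OS temp directory, read it back,
/// then delete it. `name` is a hypothetical file name chosen by the caller.
fn roundtrip_temp_file(name: &str, contents: &str) -> io::Result<String> {
    let path: PathBuf = env::temp_dir().join(name);
    fs::write(&path, contents)?;
    let read_back = fs::read_to_string(&path)?;
    fs::remove_file(&path)?;
    Ok(read_back)
}

/// Look up an environment variable, falling back to `default` if it is
/// unset or not valid Unicode.
fn env_or(key: &str, default: &str) -> String {
    env::var(key).unwrap_or_else(|_| default.to_string())
}

/// Bind a TCP listener on the loopback interface and return the address
/// the OS assigned. Port 0 asks the OS to pick any free port.
fn bind_loopback() -> io::Result<SocketAddr> {
    let listener = TcpListener::bind((Ipv4Addr::LOCALHOST, 0))?;
    listener.local_addr()
}
```

None of this needs an entry in Cargo.toml, which is why a scheme that only looks at third-party crates cannot see it.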

Good point. I guess my scheme would require a means of drawing a circle around all such dependencies in std that have any connection to the outside world. I have no idea if that is feasible.

I don't know why you would think that. As indicated above, file I/O is kind of trivial using only stdlib. Now how that's implemented in stdlib, I don't know.

But unless you're willing to block any unsafe code in stdlib (which also means you likely won't have access to heap-allocating types like Box and Vec), you'll have unsafe code somewhere.

Also note that auditing direct dependencies is not enough if one wishes to do an audit. The entire dependency tree (including all transitive dependencies) needs to be audited. The work required for that scales superlinearly with the number of imports.

Clearly I had neglected thinking about std in my scheme.

But all is not lost. If your suspicious_function() makes any calls to anything outside of itself (or in general your suspicious crate does so) it must be possible to follow that call tree down and discover if it ever hits anything that does any kind of I/O.

Of course we expect many crates to read files and such, but being able to rule out the hundreds of dependencies that don't already whittles down the workload.

I'm prepared to trust std. It's normally the foundation of everything, and it comes with the same provenance as the compiler itself, so I'm prepared to accept any unsafe code in there. If we can't trust that, all is lost.

There's also that thing where just because you don't have any unsafe blocks doesn't mean you can't do unsound things. For example, you can use file IO to transmute things.

As I noted above I expect to follow things further than the direct dependencies. All the way down to root out anything that does I/O.

Does it? That is not so clear to me.

The point of my scheme is not that I audit all of that. The point is to have a tool that scans all of that and reports "interesting" things for me to take a look at, like use of "unsafe" or any I/O. That reduces the audit work greatly.
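
A very rough sketch of what such a scan could look like. It is a naive substring match over source text, so it will flag comments and string literals and will miss anything hidden behind macros or re-exports. The pattern list and the `Finding` type are my own assumptions, not an existing tool.

```rust
/// Patterns treated as "interesting". This list is an illustrative
/// assumption and is far from exhaustive.
const INTERESTING: &[&str] = &[
    "unsafe",
    "std::fs",
    "std::net",
    "std::process",
    "extern \"C\"",
];

/// One reported hit: the 1-based line number and the pattern that matched.
#[derive(Debug, PartialEq)]
struct Finding {
    line: usize,
    pattern: &'static str,
}

/// Naive line-by-line substring scan. It does not parse Rust, so it can
/// produce both false positives and false negatives.
fn scan_source(source: &str) -> Vec<Finding> {
    let mut findings = Vec::new();
    for (idx, text) in source.lines().enumerate() {
        for &pattern in INTERESTING {
            if text.contains(pattern) {
                findings.push(Finding { line: idx + 1, pattern });
            }
        }
    }
    findings
}
```

Calling `scan_source` on each file of each dependency would give a list of lines for a human to look at first. A serious version would have to work on the compiler's view of the code rather than on raw text.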

If that turns out to be the case, let me hereby name it the Hoare constant.

[/off]

Cool idea.

I find it a curious coincidence that two of the few compilers with a focus on correctness, useful error messages and not falling over at random, ALGOL and Rust, were largely created/inspired by guys with the surname "Hoare". Is there some kind of lineage here?

Except Tony Hoare succumbed to pressure for performance and allowed NULL references, which he describes as his "Billion Dollar Mistake": https://www.youtube.com/watch?v=YYkOWzrO3xg

Wow, I did not realize until now that Tony Hoare was the null guy.

Oh yes.

I'm pretty sure Tony did not invent the null idea. I gather from his presentation that he was dead set against allowing it in ALGOL, fully aware of the havoc it would wreak. But it was demanded in order to allow ALGOL to compete with FORTRAN in performance.

I presume from that that a lot of ALGOL's correctness checking was done at run time.

Think of it this way: for the amount of work to scale linearly, each added dependency must add 0 "new" dependencies to the set S of crates (directly or transitively) used by the root crate, so that only the work done within the added dependency has to be audited.

So how many nontrivial crates would fit that criterion? In practice I'm guessing not many.

Of course, when a dependency D has dependencies of its own that weren't already in set S, now you have to audit at least 2 crates (D + at least one direct dependency), plus whatever other new crates are added to S (i.e. the transitive dependencies of D).
And that's a recursive process, so... :boom:
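
A small sketch of that bookkeeping, assuming the dependency graph is available as a plain map from crate name to direct dependencies. The crate names used with it are hypothetical and the function names are mine.

```rust
use std::collections::{BTreeSet, HashMap};

/// Direct dependencies per crate. Any crate names put in here are
/// hypothetical placeholders.
type DepGraph = HashMap<&'static str, Vec<&'static str>>;

/// All crates reachable from `root`, directly or transitively.
fn transitive_deps(graph: &DepGraph, root: &str) -> BTreeSet<&'static str> {
    let mut seen = BTreeSet::new();
    let mut stack: Vec<&str> = vec![root];
    while let Some(name) = stack.pop() {
        if let Some(deps) = graph.get(name) {
            for &dep in deps {
                // `insert` returns false for crates we have already visited.
                if seen.insert(dep) {
                    stack.push(dep);
                }
            }
        }
    }
    seen
}

/// Crates that would be new to the audit set `s` if `dep` were added:
/// `dep` itself plus every transitive dependency not already in `s`.
fn newly_added(
    graph: &DepGraph,
    s: &BTreeSet<&'static str>,
    dep: &'static str,
) -> BTreeSet<&'static str> {
    let mut candidate = transitive_deps(graph, dep);
    candidate.insert(dep);
    candidate.difference(s).copied().collect()
}
```

For the work to grow linearly, `newly_added` would have to return a set of size 1 (just D) for every dependency you add. Anything larger is extra audit work you did not ask for.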

It's perfectly possible. For example, most stdlib functions can be called without using unsafe yourself.

You do need unsafe to call C/C++ APIs.

OK. Clearly there is some non-linear growth going on. Every external function called can in turn call many other functions, which in turn ....

But I believe it's manageable. There are only a finite number of functions in an entire built program. Hundreds, thousands, whatever. It's a tree that can be scanned in reasonable time.

Must be so, the compiler already does that when it builds one's program.
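
Roughly what I have in mind, as a sketch. It assumes the call graph has already been extracted somehow and is handed over as a map from caller to callees. Getting that graph out of a real build is the hard part and is not shown, and all the function names fed into it are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// Caller -> callees. A hand-written stand-in for a call graph that a
/// real tool would have to extract from the compiler.
type CallGraph = HashMap<&'static str, Vec<&'static str>>;

/// Depth-first search from `start`. Returns the first call path found
/// that ends in a function listed in `io_sinks`, or `None` if no such
/// function is reachable.
fn path_to_io(
    graph: &CallGraph,
    io_sinks: &HashSet<&'static str>,
    start: &'static str,
) -> Option<Vec<&'static str>> {
    fn visit(
        graph: &CallGraph,
        io_sinks: &HashSet<&'static str>,
        current: &'static str,
        visited: &mut HashSet<&'static str>,
        path: &mut Vec<&'static str>,
    ) -> bool {
        // Skip functions already explored; this also stops on recursion cycles.
        if !visited.insert(current) {
            return false;
        }
        path.push(current);
        if io_sinks.contains(current) {
            return true;
        }
        if let Some(callees) = graph.get(current) {
            for &callee in callees {
                if visit(graph, io_sinks, callee, visited, path) {
                    return true;
                }
            }
        }
        // Dead end: remove this function from the candidate path.
        path.pop();
        false
    }

    let mut visited = HashSet::new();
    let mut path = Vec::new();
    if visit(graph, io_sinks, start, &mut visited, &mut path) {
        Some(path)
    } else {
        None
    }
}
```

Each function is visited at most once, so the scan is linear in the size of the graph. Dynamic dispatch, function pointers and FFI would all need extra handling that this sketch ignores.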

Indeed. FFI is another escape hatch. If something pulls in a million lines of C or whatever we have a problem...

Keep in mind that auditing is a manual process. Can't farm it out to a machine, at least not if the results are to be trusted. What we can do is have the tools perform some work, but even then those results need to be scanned manually for false positives and false negatives.

Even if we created a neural net for code auditing (which I'm not sure can be done with the state of the art of AI in 2021) we'd have to check its results in order to have any faith in the result.

Yes. That is the whole point of my suggestion. A tool can search for "interesting" code in that untrusted crate. Code that includes, or transitively includes, unsafe blocks or I/O. A human can then focus auditing time on those features.

The point being that very many crates have no reason to include those things, so if they do, it is something suspicious to look at. If they don't, then Rust's memory protections don't allow rogue code to reach out and do harm.

Certainly if a new version of a previously inspected crate suddenly sprouted such interesting features that would trigger alarm bells.
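
A sketch of that alarm bell, assuming you have the source text of the old and new versions at hand. The detection is again a naive substring match with a watch list I made up, so treat the output as a hint and not as proof of anything.

```rust
use std::collections::BTreeSet;

/// Substrings treated as markers of "interesting" features. This list is
/// an illustrative assumption.
const WATCHED: &[&str] = &["unsafe", "std::fs", "std::net", "std::process"];

/// The watched markers that occur anywhere in `source`.
fn capabilities(source: &str) -> BTreeSet<&'static str> {
    WATCHED
        .iter()
        .copied()
        .filter(|p| source.contains(*p))
        .collect()
}

/// Markers present in the new version but absent from the old one.
fn newly_sprouted(old_source: &str, new_source: &str) -> Vec<&'static str> {
    let old = capabilities(old_source);
    let new = capabilities(new_source);
    new.difference(&old).copied().collect()
}
```

An empty result from `newly_sprouted` says nothing about whether the crate is safe. A non-empty one tells you where to start reading the diff.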

Of course the result of all this falls far short of a proper audit. But the idea is to direct what effort is available to likely points of failure.

An example of malware in a supply chain attack:

You have some "harmless" application, maybe you even wrote it.
It depends on some crates for middleware.
Both your application and the middleware crates depend on a variety of crates that do various useful low level things, some with unsafe usages, some without.
Finally, you have the std lib depended on by pretty much everything.

You've audited for unsafe usages and found nothing unusual.

Except, unknown to you, one of those middleware crates is stealing CPU cycles to mine cryptocurrency, performing DDoS attacks, exfiltrating data from your system, acting as a command-and-control relay, or subtly sabotaging your encryption. It does so right alongside, and possibly inside, legitimate I/O usage, and all without unsafe.

Conclusion:
Unsafe is there to protect you from memory corruption. It is not really a security barrier. It can help, but certainly isn't a panacea. If you only look at unsafe and crates that directly do I/O, you will miss the forest for the trees and have a dangerously false sense of security.

Exactly. Hence my suggestion above: Aren't there any efforts to bring Rust's dependency number down? - #63 by ZiCog and the discussion that followed.

I don't believe anyone was claiming that unsafe is a security barrier. Rust programmers understand that it's not even a memory safety barrier. (Code that uses unsafe need not have its eventual memory management errors manifest lexically within an unsafe block.)

It's a tool of abstraction meant for human readers to know which – hopefully narrow – parts of the code to focus on first when debugging a memory error. It is also a way to make sure the type system is sound, and one can't possibly write memory management errors without having to type unsafe somewhere.
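
A small illustration of the "not lexically within an unsafe block" point. The `Cursor` type is invented for this post. The code as written is sound; the interesting part is which lines keep it that way.

```rust
/// A byte buffer with a read position. The `unsafe` block in `current`
/// relies on the invariant `pos < data.len()`.
pub struct Cursor {
    data: Vec<u8>,
    pos: usize,
}

impl Cursor {
    /// Rejects empty input, so that `pos = 0` is always in bounds.
    pub fn new(data: Vec<u8>) -> Option<Self> {
        if data.is_empty() {
            None
        } else {
            Some(Cursor { data, pos: 0 })
        }
    }

    /// Entirely safe code, yet it is what upholds the invariant. Removing
    /// the bounds check here would make `current` unsound without anyone
    /// touching an `unsafe` block.
    pub fn advance(&mut self) -> bool {
        if self.pos + 1 < self.data.len() {
            self.pos += 1;
            true
        } else {
            false
        }
    }

    /// Read the byte at the current position without a bounds check.
    pub fn current(&self) -> u8 {
        // SAFETY: `new` and `advance` guarantee `pos < data.len()`.
        unsafe { *self.data.get_unchecked(self.pos) }
    }
}
```

So a grep for unsafe points you at `current`, but the code you actually have to review is the whole module that can touch `pos` and `data`.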

Suggesting that auditing unsafe is missing the forest for the trees therefore sounds highly dishonest to me, or at least a strawman argument. The same goes for suggesting that people assumed unsafe is the only place to audit.

I don't think anyone has talked about / proposed a potentially simpler solution yet (in this thread):

Imagine an extension to cargo called cargo-sandboxed where every dependency is scanned for accesses to fs, network, etc. Then every dependency you bring in would have to be tagged with allow_net, allow_fs etc. in Cargo.toml.

I guess it would be hard to verify at the source code level (asm support and whatnot) but maybe relevant syscalls could be found in the compiled code.

It would have cascading breakage if a deeply nested crate needed new permissions, but that's kind of the point.
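
To sketch the check itself: cargo-sandboxed does not exist, and neither do the allow_net / allow_fs keys, so everything below is hypothetical. It assumes some scanner has already produced the detected capabilities per crate and that the granted ones were read from Cargo.toml.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Capabilities the imagined tool would track. `Fs` and `Net` stand in
/// for the hypothetical allow_fs and allow_net tags.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Capability {
    Fs,
    Net,
}

/// Per-crate capability sets, keyed by crate name.
type CapMap = BTreeMap<&'static str, BTreeSet<Capability>>;

/// Every (crate, capability) pair that was detected but not granted.
/// A crate missing from `granted` is treated as having no permissions.
fn violations(detected: &CapMap, granted: &CapMap) -> Vec<(&'static str, Capability)> {
    let mut out = Vec::new();
    for (&name, caps) in detected {
        for &cap in caps {
            let allowed = granted.get(name).map_or(false, |g| g.contains(&cap));
            if !allowed {
                out.push((name, cap));
            }
        }
    }
    out
}
```

The build would fail whenever `violations` returns anything, which is exactly the cascading breakage mentioned above: a deeply nested crate that starts using the network shows up here until someone explicitly grants it.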