In my opinion it would be great to have on crates.io something like a badge that tells you whether a crate is pure Rust, meaning its code doesn't use external libraries (so it doesn't need to be linked against anything that isn't written in Rust). Maybe cargo could also show this information while compiling.
A similar idea could also be implemented for safe/unsafe crates.
I don't really care much about purity. I do, however, think safety guarantees (i.e. "this crate does not use any escape hatches") are a fantastic idea. And I think they belong at the compiler directive level: #![forbid(unsafe_code)]. In the past I've imagined similar pragmas for guaranteeing no panics (although unreachable! would still be allowed, since it should only be used in match arms).
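For concreteness, here's a minimal sketch of the crate-level version of that guarantee, using the unsafe_code lint that rustc already ships:

```rust
// Crate-level directive: any use of `unsafe` anywhere in this
// crate becomes a hard error that downstream code cannot override.
#![forbid(unsafe_code)]

fn main() {
    // Uncommenting the next line makes the whole crate fail to build:
    // unsafe { std::hint::unreachable_unchecked() }
    println!("no escape hatches in this crate");
}
```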
I think a Safe / Unsafe badge could be a nice idea.
It's not 100% meaningful. Will dependencies be taken into account? Dependencies like libc and libstd would not earn an "Only Safe Code" badge, so who decides which crates are exempt?
If you don't take dependencies into account, that could encourage splitting out the few unsafe blocks you have into their own crates. That actually sounds like a sensible goal to me.
I think it should be two badges, and every crate gets one or the other: “Only Safe Code” and “Some Unsafe Code”. It shouldn't be that you either have the badge or you don't; it should be a categorization, to encourage more objective evaluation.
Oh and remember that at any time there will usually be a couple of memory safety bugs in flight in unstable-marked features. Which brings us to the next badges! “Stable” and “Nightly”.
I'd be careful of stigmatizing unsafe. The community currently avoids it, which is good, but if it gets stigmatized too much it might be avoided in situations that genuinely need it (e.g. writing libraries). A "Safe Rust" badge doesn't really say much about a crate, except perhaps that it isn't an FFI or abstraction crate. Everything bottoms out in unsafe code at some point anyway, so I don't see how this would help track down unsafety bugs.
There's an argument to be had that the standard library (what I assume you're referring to) is in some sense part of the language, and bugs in it are effectively language bugs. You can treat its unsafety as a black box.
Maybe you're right: the safe badge could create a kind of stigma around unsafe code.
Anyway, my original idea of a "pure" badge, meaning completely written in Rust (no FFI, no external dependencies, whether the code is safe or unsafe), was actually meant to promote projects that rely only on Rust code and to make it clearer when a crate cannot be compiled and run using cargo alone.
Perhaps a service that counts LOC and ULOC (unsafe lines of code) and provides statistics on how much unsafe code there is would be more relevant, at least for auditing purposes.
I believe this "semantics" to be mistaken for two reasons:
unsafe demarcates regions of code that warrant special care. Clearly, for safety reasons, it is best to have as few lines of unsafe code as possible within a module, yet you count a module written without a single line of safe code the same as a module that has one unsafe line.
In that latter example, one could reasonably move the unsafe line into another crate and import it from there. That other crate would then be unsafe, but either the remaining lines of code would now magically be safe, or there is no longer any distinction, since, as we know, all code rests on something unsafe if you dig deep enough.
Again looking from a purely utilitarian perspective, it makes sense to minimize the use of unsafe code. And what gets measured, gets optimized. So let's measure.
Usually by cheating. Think of all the scientific papers that publish fragmented results with hundreds of co-authors to raise their h-index.
The same goes for safe/unsafe code: if you start judging code on that criterion, people will move the parts that need unsafe into separate crates, but the unsafety can still escape. For example:
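Here is a hedged sketch of how the burden can leak through a safe-looking API (all names are invented for illustration):

```rust
// A tiny "unsafe-in-its-own-crate" library. The unsafe block lives
// here, but the soundness obligation escapes through a safe signature.
pub struct RawBuf {
    ptr: *mut u8,
    len: usize,
}

impl RawBuf {
    pub fn len(&self) -> usize {
        self.len
    }

    // Declared safe, but only sound if `index < self.len`, an
    // invariant this signature does nothing to enforce.
    pub fn get_unchecked_ish(&self, index: usize) -> u8 {
        unsafe { *self.ptr.add(index) } // UB if index is out of bounds
    }
}
```

A downstream crate that only ever calls this function would earn any automated “Only Safe Code” badge, yet it can trigger undefined behaviour without writing a single unsafe block itself.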
Furthermore, safe code is not safe: Rust does not need unsafe blocks to invoke a shell with unescaped strings from untrusted sources (does Rust have anything like Perl's “taint” system?), or to overwrite precious files with nonsensical data.
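For instance, this is 100% badge-worthy “safe” Rust, and still a textbook injection hole (the function is hypothetical):

```rust
use std::process::Command;

// No unsafe block anywhere, yet untrusted input such as
// "; rm -rf ~" is passed to the shell completely unescaped.
fn list_dir(untrusted: &str) {
    Command::new("sh")
        .arg("-c")
        .arg(format!("ls {}", untrusted))
        .status()
        .expect("failed to run shell");
}

fn main() {
    list_dir("/tmp"); // harmless
    // list_dir("/tmp; echo pwned"); // equally "safe" as far as rustc knows
}
```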
A peer-review system would IMHO be much more useful than automatic flags.
I wasn't going into the "what is safety" discussion here, so let's stay on track. But I have to admit that lines of code is too blunt a sword to slice the unsafe beast.
So what other options do we have? We want a metric that can be calculated automatically, that reflects our sensibilities about unsafe code, and that cannot easily be cheated. Perhaps count the code paths through the module, and report both the total number of code paths and the number that are completely safe.
To keep the calculation fast enough, we might want to stop at the crate boundary, so it would still be possible to cheat. Perhaps a crate annotation that lets us count all calls into that crate as unsafe would alleviate this issue, if used correctly.
Unsafe is a lot about trust. The programmer writes unsafe, but what they mean is “trust me, rustc, I know this is safe”. If you depend on a package, you need to decide whether you in turn trust the author, or their proof methods, or their testing methods.
Further, we have a coding style that doesn't encourage minimizing unsafe blocks: there is no point in bracketing just a single function call or dereference if the whole algorithm in a function is critical for the code to be correct. I personally call this kind of code unsafe-critical.
The usual example is the code that modifies Vec's struct fields inside its implementation. That code is not unsafe, but it is unsafe-critical: it's code you are trusting. Static analysis could help uncover exactly how much of that code there is in each crate.
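A rough sketch of the idea, with a toy Vec lookalike (illustrative only, not the real standard library code; allocation and Drop are omitted for brevity):

```rust
// Invariants: `len <= cap`, and elements `0..len` are initialized.
pub struct MyVec<T> {
    ptr: *mut T,
    cap: usize,
    len: usize,
}

impl<T> MyVec<T> {
    // Contains no unsafe keyword, yet it is unsafe-critical: if it
    // ever set `len` too large, `get` below would become unsound.
    pub fn truncate(&mut self, new_len: usize) {
        if new_len < self.len {
            self.len = new_len; // a "safe" line carrying the invariant
        }
    }

    pub fn get(&self, index: usize) -> Option<&T> {
        debug_assert!(self.len <= self.cap);
        if index < self.len {
            // Sound only because every safe method in this module
            // keeps the invariants above intact.
            Some(unsafe { &*self.ptr.add(index) })
        } else {
            None
        }
    }
}
```

rustc never demands the unsafe keyword in truncate, but every line that can write len or cap is part of the code you are trusting.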
unsafe demarcates regions of code that warrant special care. Clearly, for safety reasons, it is best to have as few lines of unsafe code as possible within a module, yet you count a module written without a single line of safe code the same as a module that has one unsafe line.
Actually, no. Or not sufficiently.
An unsafe block taints the module because it is based on assumptions (invariants) that can be violated by safe code. As a result, not only should unsafe code be audited, but also any line in the module that is somehow related to this code.
As a result:

- it is a good idea to extract the unsafe bits into as small a module as possible, as this is the only way to reduce the amount of code to audit;
- and therefore counting a whole module as tainted whenever it contains unsafe code is realistic, as that really is the amount of code to audit.
I do not agree. The taint does not stop at the module boundary; it stops where the API makes it stop, and manual audit must continue until automatic checking can take over. For some code, that is just the unsafe block itself, but code where the taint escapes all the way out is possible too.
The book and various other sources recommend sticking to the first case: do not let “unsafe” APIs escape from unsafe blocks. But nothing will tell us whether a given program adheres to that recommendation until the audit has been done.
If the audit has not been done, the unsafe taint must be assumed to propagate all the way to the final binary.
This is the reason I believe this kind of automated badge is essentially useless. IMHO, the only kind of badge that would matter is “audited by Insert Name Here”.
Perhaps having a tool that finds all instances of unsafe blocks inside a crate (and optionally its dependencies) would be useful[1]? It would allow people to check which crates have unsafe code in them, and how much, without being as blunt as a badge. Encouraging people who care to take both the content and context of the unsafe blocks into account sounds like a good halfway house between quickly seeing whether a crate is “safe”[2] and having to do a full audit of the code.

[1] unless there is one? [2] for given definitions thereof
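As a sketch of what such a tool could look like (hypothetical, and assuming the syn crate with its “full” and “visit” features enabled), one could parse each source file and count unsafe blocks and unsafe free functions:

```rust
use syn::visit::Visit;

#[derive(Default)]
struct UnsafeCounter {
    unsafe_blocks: usize,
    unsafe_fns: usize,
}

impl<'ast> Visit<'ast> for UnsafeCounter {
    fn visit_expr_unsafe(&mut self, node: &'ast syn::ExprUnsafe) {
        self.unsafe_blocks += 1;
        // Keep walking; unsafe blocks can nest.
        syn::visit::visit_expr_unsafe(self, node);
    }

    // Free functions only, for brevity; methods in impl blocks
    // would need a similar visit_impl_item_fn override.
    fn visit_item_fn(&mut self, node: &'ast syn::ItemFn) {
        if node.sig.unsafety.is_some() {
            self.unsafe_fns += 1;
        }
        syn::visit::visit_item_fn(self, node);
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // A real tool would walk every .rs file of the crate (and,
    // optionally, of its dependencies); one file keeps this short.
    let source = std::fs::read_to_string("src/lib.rs")?;
    let ast = syn::parse_file(&source)?;
    let mut counter = UnsafeCounter::default();
    counter.visit_file(&ast);
    println!("unsafe blocks: {}", counter.unsafe_blocks);
    println!("unsafe fns:    {}", counter.unsafe_fns);
    Ok(())
}
```

Reporting the span of each hit, rather than just the totals, would give reviewers the “content and context” part almost for free.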
Having a badge that indicates a library is pure Rust and doesn't depend on external code (aside from system libraries) would be fantastic on Windows, where getting external libraries built and available can sometimes be a serious hassle.
I agree that usually it's safe code that has to uphold the invariants of unsafe code.
I do not buy the "unsafe taints the whole module" business, though. Does the mere existence of unsafe code somewhere in a module make all calls into that module unsafe, even if the unsafe code in question is not even invoked?
(But I do agree that it doesn't matter if one or many unsafe code paths were invoked. One invoked code path suffices to taint the caller)