Pure Rust and Safe badges

Usually by cheating. Think of all the scientific papers that publish fragmented results with hundreds of co-authors to raise their h-index.

The same goes for safe/unsafe code: if you start judging code on that criterion, people will move the parts that need unsafe into separate crates, but the unsafety can escape:

// Safe to call, yet it hands out a 'static mutable reference conjured
// from a raw pointer: the unsafety escapes the unsafe block.
fn cheat_deref(p: *mut u32) -> &'static mut u32 {
    unsafe { &mut *p }
}
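
To see how the unsafety escapes, here is a hypothetical safe caller (the code below is only illustrative): it never writes unsafe itself, yet through cheat_deref it fabricates two aliasing &'static mut references, which is undefined behaviour.

fn main() {
    // Turn a heap allocation into a raw pointer, just for illustration.
    let p: *mut u32 = Box::into_raw(Box::new(1));
    // No `unsafe` in sight, yet `a` and `b` are two live mutable
    // references to the same u32 -- undefined behaviour.
    let a = cheat_deref(p);
    let b = cheat_deref(p);
    *a = 2;
    *b = 3;
}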

Furthermore, safe code is not necessarily safe: Rust does not need unsafe blocks to invoke a shell with unescaped strings from untrusted sources (does Rust have anything like Perl's “taint” system?), or to overwrite precious files with nonsensical data.

A peer-review system would IMHO be much more useful than automatic flags.

1 Like

I wasn't going into the "what is safety" discussion here, so let's stay on track. But I have to admit that lines of code is too blunt a sword to slice the unsafe beast.

So what other options do we have? We want a metric that can be calculated automatically, that reflects our sensibilities about unsafe code, and that cannot easily be cheated. Perhaps count the code paths through the module, and report the total number of code paths alongside the number of code paths that are completely safe.

To keep the calculation fast enough, we might want to stop at the crate boundary, which means it would still be possible to cheat. Perhaps a crate annotation that counts all calls into that crate as unsafe would alleviate this issue, if used correctly.

Unsafe is a lot about trust. The programmer writes unsafe, but what they mean is "trust me, rustc, I know this is safe". If you depend on a package, you need to decide whether you in turn trust the author, or their proof methods, or their testing methods.

Further, we have a coding style where we don't encourage minimizing unsafe blocks. There is no point in bracketing just single function calls or dereferences, if the whole algorithm in a function is critical for the code to be correct. I personally call this kind of code unsafe-critical.

The usual example is any code that modifies the struct fields of the Vec in its implementation. That code is not unsafe, but it is unsafe-critical. That's code you are trusting. Static analysis could help uncover exactly how much of that code there is in each crate.
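
To make "unsafe-critical" concrete, here is a minimal sketch; it is a toy, not the real Vec, and all the names are made up. The setter contains no unsafe block, yet it can break the invariant that the unchecked indexing relies on.

pub struct TinyVec {
    ptr: *mut u32,
    len: usize, // invariant: the first `len` elements behind `ptr` are initialized
}

impl TinyVec {
    // Entirely safe code, no `unsafe` anywhere, but if it ever sets `len`
    // beyond the initialized region, the getter below becomes undefined behaviour.
    pub fn set_len_wrong(&mut self, new_len: usize) {
        self.len = new_len; // unsafe-critical, yet not `unsafe`
    }

    pub fn get(&self, i: usize) -> Option<u32> {
        if i < self.len {
            // Relies on the invariant maintained by all the safe code above.
            Some(unsafe { *self.ptr.add(i) })
        } else {
            None
        }
    }
}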

unsafe demarcates regions of code that warrant special care. Clearly, for safety reasons, it is best to have as few lines of unsafe code as possible within a module, yet you count a module that is written without a line of safe code the same as a module that has one unsafe line.

Actually, no. Or not sufficiently.

An unsafe block taints the module because it is based on assumptions (invariants) that can be violated by safe code. As a result, not only should unsafe code be audited, but also any line in the module that is somehow related to this code.

As a result:

  • it is a good idea to isolate the unsafe bits into modules that are as small as possible, as this is the only way to reduce the amount of code to audit
  • and therefore counting a whole module as tainted whenever it contains unsafe code is realistic, as that really represents the amount of code to audit

I do not agree. The taint does not stop at the module boundary; it stops where the API makes it stop, and manual audit must continue until automatic checking can take over. For some code, that is just the unsafe block itself, but code where the taint escapes all the way out is possible too.

The book and various other sources recommend sticking to the first case: do not let an “unsafe” API escape from unsafe blocks. But nothing will tell us whether a given program adheres to that recommendation until the audit has been done.

If the audit has not been done, the unsafe taint must be assumed to propagate all the way to the final binary.

This is the reason I believe this kind of automated badge is essentially useless. IMHO, the only kind of badge that would matter is “audited by Insert Name Here”.

Perhaps having a tool that finds all instances of unsafe blocks inside a crate (and optionally its dependencies) would be useful [1]? It would allow people to check which crates have unsafe code in them, and how much, without being as blunt as a badge. Encouraging people who care to take both the content and context of the unsafe blocks into account sounds like a good halfway house between quickly seeing whether a crate is "safe" [2] and having to do a full audit of the code. A rough sketch of what such a tool might look like is below.


[1] unless there is one?
[2] for given definitions thereof
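
For what it's worth, a rough sketch of how such a tool might start out, using the syn crate to parse a single source file and count unsafe blocks and unsafe fns. Everything here is an assumption on my part (crate choice, feature flags, file path), and crate traversal, dependencies, and macro expansion are all left out.

// Assumed dependency: syn = { version = "1", features = ["full", "visit"] }
use syn::visit::{self, Visit};

#[derive(Default)]
struct UnsafeCounter {
    unsafe_blocks: usize,
    unsafe_fns: usize,
}

impl<'ast> Visit<'ast> for UnsafeCounter {
    fn visit_expr_unsafe(&mut self, node: &'ast syn::ExprUnsafe) {
        self.unsafe_blocks += 1;
        visit::visit_expr_unsafe(self, node); // keep walking nested code
    }

    fn visit_item_fn(&mut self, node: &'ast syn::ItemFn) {
        if node.sig.unsafety.is_some() {
            self.unsafe_fns += 1;
        }
        visit::visit_item_fn(self, node);
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical path; a real tool would walk every .rs file of a crate.
    let source = std::fs::read_to_string("src/lib.rs")?;
    let file = syn::parse_file(&source)?;

    let mut counter = UnsafeCounter::default();
    counter.visit_file(&file);

    println!("unsafe blocks: {}, unsafe fns: {}", counter.unsafe_blocks, counter.unsafe_fns);
    Ok(())
}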

Having a badge that indicates a library is pure Rust and doesn't depend on external code (aside from system libraries) would be fantastic on Windows, where getting external libraries built and available can sometimes be a serious hassle.

5 Likes

I agree that usually it's safe code that has to uphold the invariants of unsafe code.

I do not buy the "unsafe taints the whole module" business, though. Does the mere existence of unsafe code somewhere in a module make all calls into that module unsafe, even if the unsafe code in question is not even invoked?

(But I do agree that it doesn't matter if one or many unsafe code paths were invoked. One invoked code path suffices to taint the caller)

I agree with you; in any case it could be useful for other OSes too. It would be a quick way to inform users of a crate whether the build process relies only on Rust (and thus cargo) or whether they might need to pre-install some external libraries. It is also a hint about safety: if a crate uses external libraries, it could be unsafe.

I think it would be better if authors were asked this question before publishing a crate, so that there is some sort of code review.

This is useful, I agree, but

it would be better if there were a system to track external dependencies and display them somewhere, instead of having a badge for that.

To my mind, a Safe/Unsafe badge ought to be accompanied by some sort of Audited badge (which would record who claims to have done the auditing). The point of this distinction ought to be to encourage unsafe only where it is both auditable and audited.

We are getting closer and closer to a web of trust with every post in this thread :smiley:

2 Likes

I do not buy the "unsafe taints the whole module" business, though. Does the mere existence of unsafe code somewhere in a module make all calls into that module unsafe, even if the unsafe code in question is not even invoked?

Well, if the unsafe code is not invoked by the final binary, even through indirect paths, then indeed there is no issue; but that is in general impossible to assess when looking at a library on its own. And if a function containing unsafe code cannot be reached via the public API of the library, then it is unused (which is another lint), so it is safe to assume that all functions containing unsafe code in a library can be invoked.

As for breaking invariants: imagine an unsafe function that assumes the length of a buffer is even and uses unchecked indexing... a safe function can easily change the buffer to an odd length and trigger an issue for the next call to the unsafe function, without ever invoking it (even indirectly).
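
A minimal sketch of that scenario; the struct, the method names, and the even-length invariant are all made up for illustration:

pub struct Pairs {
    data: Vec<u32>, // invariant assumed by `sum_pairs`: data.len() is even
}

impl Pairs {
    // Safe code, no `unsafe` block, but it can leave `data` with an odd
    // length, silently breaking the invariant the method below relies on.
    pub fn push_one(&mut self, x: u32) {
        self.data.push(x);
    }

    // Uses unchecked indexing on the assumption that the length is even.
    pub fn sum_pairs(&self) -> u32 {
        let mut total = 0;
        let mut i = 0;
        while i < self.data.len() {
            // Undefined behaviour if the invariant was broken and `i + 1`
            // is out of bounds.
            total += unsafe { *self.data.get_unchecked(i) + *self.data.get_unchecked(i + 1) };
            i += 2;
        }
        total
    }
}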

If only there were some sort of static check that could recursively taint variables and fields as "unsafe", with some provision to let users say "not actually unsafe", to help people writing unsafe code make sure they don't forget about implicit invariants.

Perhaps implemented as some kind of... plugin to the compiler. I dunno; sounds like a pipe dream.

Now where did I put those paperclips...

1 Like

This is already how it works with the recommended use of unsafe blocks: the code inside them is allowed to do unsafe things, but it is supposed to leave the result in a safe state.

On the other hand, this kind of static check for properties other than safe/unsafe would be awesome, for example trusted/untrusted. Imagine if the compiler could tell you that you are injecting a string from the network directly into a shell command.

That doesn't work when it depends on, say, a len field of a struct not being tampered with, yet literally every method that takes &mut self can potentially change it. I mean, self.len = !self.len isn't unsafe as far as the language is concerned; it doesn't require the unsafe keyword, but OH MY, just you try doing that in a container implementation and see what happens. :smiley:

unsafe is as viral and pernicious as pop music, though obviously not as dangerous.

1 Like

Well, the problem is that clippy doesn't have that kind of information about external crates, though it may be possible to extend rustc to emit an 'includes unsafe code' flag for rlibs. Using that would make it possible to get at least a coarse idea of whether unsafe code may actually be invoked.

I like the idea of having some indicator for external libraries. It would be useful for figuring out whether something might not be buildable on your machine.

1 Like

I just put together a cargo subcommand (cargo-count) which has a flag --unsafe-statistics for counting lines of "unsafe" code. Granted, this is super naive :stuck_out_tongue:

And it doesn't traverse into dependencies... but it wouldn't be too difficult to script something that does, and know instantly whether there are any unsafe lines (or rather how many, and what percentage).

I like the idea of having some indicator for external libraries. It would be useful for figuring out whether something might not be buildable on your machine.

Exactly... this is the basic idea I have in mind. I would love to have a quick indication (via a cargo option) that lets me know whether a crate is buildable relying only on Rust/cargo or whether I need to install some external libraries.
As an extra, it would also be nice to have some information about safety, like how many unsafe LOC are used in the source code (as proposed and implemented by @kbknapp).