How (not) to use unsafe code

Are you suggesting that calls to FFI should not have justifying comments explaining why the call is in fact safe? I would think that would be good to have. For example, if you are calling an FFI function that takes two pointers to some memory, and you are turning an immutable reference to a boxed array into those pointers before passing them to the foreign function, is that "safe"? Does it result in UB? Shouldn't you justify why it is OK? For example, shouldn't you have a comment along the lines of, "OK to just pass a pointer to an immutable array because this foreign function does not attempt to deallocate the pointers or to write to/modify the memory they point to. This foreign function's contract is that it will only read the contents of the memory the pointers point to." (perhaps simplified, but something like that)?

Wouldn't having some kind of documentation like this on every unsafe block be useful? NOTE: Documentation like that would go on the unsafe block inside the safe wrapper around the call to the unsafe FFI function.
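For illustration, a minimal sketch of the kind of wrapper and comment I have in mind (compare_buffers is a made-up foreign function, not a real API):

```rust
use std::os::raw::c_int;

// Hypothetical C function: it only reads `len` bytes from each buffer and
// never writes through or frees the pointers.
extern "C" {
    fn compare_buffers(a: *const u8, b: *const u8, len: usize) -> c_int;
}

/// Safe wrapper around the hypothetical foreign function.
pub fn buffers_equal(a: &[u8], b: &[u8]) -> bool {
    assert_eq!(a.len(), b.len(), "buffers must be the same length");
    // SAFETY: `compare_buffers` only reads `len` bytes from each pointer and
    // neither writes through them nor deallocates them, so pointers derived
    // from shared references are fine. Both slices outlive the call and are
    // at least `len` bytes long (checked by the assert above).
    unsafe { compare_buffers(a.as_ptr(), b.as_ptr(), a.len()) == 0 }
}
```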

1 Like

Yes, I agree, as mentioned above: How (not) to use unsafe code - #16 by gbutler69

1 Like

To play devil's advocate, should we also add #[tricky_but_safe_code(...)]? I think we'd all agree the answer is no - that code should have normal code comments explaining it. In a lot of ways, unsafe is just a particular class of tricky code. Yes, it also has an effect on Rust's core tenets but it's otherwise no different, in principle at least.

This is why, personally, I'm more interested in understanding why unsafe code was used in the first place, as mentioned. I believe there's more meat on that bone than in debating the different ways to write code comments.

8 Likes

As always, it depends, and there are exceptions. But in lots of cases it'd feel like an x++ // add 1 to x comment.

Often functions don't accept null pointers, want strings NUL-terminated, and so on. These requirements feel somewhat obvious, and a comment just repeats the documentation.
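For illustration, a comment that only restates what the types already guarantee (log_message is a made-up foreign function):

```rust
use std::ffi::CString;
use std::os::raw::c_char;

extern "C" {
    // Hypothetical C function that expects a NUL-terminated string.
    fn log_message(msg: *const c_char);
}

pub fn log(msg: &str) {
    let c_msg = CString::new(msg).expect("message contains an interior NUL");
    // SAFETY: `c_msg` is a valid, NUL-terminated C string.
    // About as informative as `x += 1; // add 1 to x` -- the CString type
    // already guarantees exactly this.
    unsafe { log_message(c_msg.as_ptr()) }
}
```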

And OTOH there's always a risk that I misunderstood the requirements, so my comment may state insufficient assumptions, misleading the code reviewer into believing the code satisfies the required criteria.

In other cases the unsafety isn't even anywhere near the unsafe block. For example, init(&struct)-style functions may have very complex safety requirements that depend on the contents of the struct (e.g. error-handler callbacks for libjpeg), so it'd be better to comment at the place the struct is built.
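A rough sketch of what I mean, with a made-up C API loosely shaped like the libjpeg pattern (DecoderConfig, decoder_init, and abort_on_error are all hypothetical):

```rust
use std::os::raw::{c_int, c_void};

// Hypothetical C API: a config struct whose fields (e.g. an error-handler
// callback) carry the real safety requirements.
#[repr(C)]
struct DecoderConfig {
    error_handler: unsafe extern "C" fn(code: c_int),
    user_data: *mut c_void,
}

extern "C" {
    fn decoder_init(cfg: *mut DecoderConfig) -> c_int;
}

unsafe extern "C" fn abort_on_error(_code: c_int) {
    // Must not unwind across the FFI boundary, so abort rather than panic.
    std::process::abort();
}

pub fn init_decoder() -> Result<(), ()> {
    // The real safety argument lives here, where the struct is built:
    // `decoder_init` requires a non-unwinding `error_handler` and a
    // `user_data` pointer that outlives the call (null satisfies that trivially).
    let cfg = &mut DecoderConfig {
        error_handler: abort_on_error,
        user_data: std::ptr::null_mut(),
    };
    // SAFETY: see the invariants established where `cfg` is constructed above.
    let rc = unsafe { decoder_init(cfg) };
    if rc == 0 { Ok(()) } else { Err(()) }
}
```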

5 Likes

You're correct here (IMHO). It is much more important to understand why the author felt so much use of "unsafe" was justified/necessary and try to address those concerns.

Yes, that makes sense and I definitely see your point more clearly now.

With this, though, the way I would use your comments would be to see whether you thought about the issues at all reasonably. I could check your comments against the underlying FFI definition, and if they didn't line up very well, I'd know I need to examine things much more deeply to understand whether you messed up or I'm just misunderstanding. But if your comments line up with what the FFI function's contract actually seems to be (from documentation or from reading the foreign function's code), then I could have a certain degree of confidence that you were thoughtful about the usage.

That being said, I definitely agree with @vitalyd that the important thing in this discussion is not how best to document/annotate unsafe usage, but, to understand why certain authors of Rust code are tending to over-use unsafe and how that can be prevented/mitigated. Solutions could include any of the following:

* Educational Materials
* Language Changes (Missing Language Features?)
* Standard Library Additions (Missing stdlib Features?) 

One thing that could usefully come out of this "discovery" that Actix uses too much (?) unsafe inappropriately (?) would be a Nomicon chapter or appendix that explains why each of these usages is inappropriate and what the appropriate solution would be (along with patches/PR's for Actix correcting the issues).

3 Likes

Hmm... in a sense, we kind of have that. Lints attempt to warn against "tricky" code in many cases. You can optionally disable lints (which is kind of the equivalent of #[tricky_but_safe_code(...)] if you turn your head the right way). So, maybe overriding a lint should require a "reason" for ignoring it?

In fact, as some have pointed out, the Actix author had to purposefully disable some lints for the code to compile (or at least not warn) on many of these unsafe usages. If the annotation that disables a lint had required a reason, perhaps (and that's a BIG perhaps) that would have given enough additional pause that the author would have avoided so many uses of unsafe?
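For illustration, a sketch borrowing the reason = "..." syntax proposed in RFC 2383 ("lint reasons"); the lint choice and the justification here are hypothetical:

```rust
// Hypothetical sketch: requiring a justification when silencing a lint, using
// the `reason = "..."` syntax proposed in RFC 2383 ("lint reasons").
#[allow(clippy::mut_from_ref, reason = "exclusive access is guaranteed by an external lock; see the SAFETY comment inside")]
fn shared_state_mut() {
    // ... the tricky code the lint would otherwise flag ...
}
```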

2 Likes

I think lints mostly (if not entirely) work at an extremely local scope - i.e. these one or two lines are tricky. But that's not as interesting as tricky code (safe or unsafe) that requires some "remote" knowledge.

2 Likes

My impression was that most Rust programmers were quite concerned about unsafe being unsafe, and were sometimes even too cautious about using it. E.g. from time to time people ask how to make a doubly linked list and seem unsatisfied with the answer that it's best done with unsafe. Actix is the first case where I've seen unsafe used so boldly to bypass the borrow checker.

How can we know how big the problem is? Are we blowing a few quick hacks by one author out of proportion, or is this a systemic problem that could cause trouble in the Rust ecosystem?

5 Likes

Given the popularity of Actix, such a precedent by a single author might become a systemic problem in the future.

3 Likes

1,025 unsafe expressions out of roughly 37,000 total doesn't seem like a few quick hacks. It seems more like he thought the best way to get performance (and perhaps convenience of implementation) was to use unsafe liberally, without much in the way of justification for the usages. That seems like it might be a more systemic problem.

2 Likes

To me, this looks to be a case of "tall trees catch a lot of wind" (a Dutch idiom).
Actix has seen quite the meteoric rise to fame in the community (e.g. most of the RustFest webdev workshop revolved around actix in some form or another), and with such fame also comes increased scrutiny.

To suddenly discover that the "up-and-coming poster child" of Rusty Async Web is built on so much unsoundness seems to be a bit of an "Emperor has no clothes" moment for all of us.

The important part to remember about that fairy tale is that the blame doesn't rest on the tailors, it rests with everyone who "saw" the clothes.
That's why I appreciate that this thread focuses on learning from the experience, and improving how we, as a community, deal with unsafe foundations.

Improved tooling, better education and active participation in foundational crates have all been suggested, and seem like the way to go to me.

It is good that we discovered this so early in both Rust's and Actix's lifecycle. Just look at OpenSSL's Heartbleed to see how dramatic it could have been if this had slumbered for a few more years...

10 Likes

Also, I'd argue that the fact that Rust has unsafe to isolate potentially memory-unsafe code is a huge factor in discovering this problem so early, and is in fact a testament to Rust's overall safety story being better than its predecessors'.

9 Likes

That is a good thing to highlight. Safety problems don't always manifest directly at the site of the unsafe call (that is, a change in safe code can break assumptions that unsafe code was relying on), but imagine how much harder it would have been to dig up the issues without the unsafe blocks pointing directly at the trouble spots.
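A tiny, hypothetical illustration of that kind of "remote" breakage, where the offending method contains no unsafe at all, yet the audit still starts from the unsafe block:

```rust
pub struct Buf {
    data: Vec<u8>,
    // Invariant the unsafe block below relies on: `len <= data.len()`.
    len: usize,
}

impl Buf {
    pub fn get(&self, i: usize) -> Option<u8> {
        if i < self.len {
            // SAFETY: `i < self.len`, and by the struct invariant
            // `self.len <= self.data.len()`, so the index is in bounds.
            Some(unsafe { *self.data.get_unchecked(i) })
        } else {
            None
        }
    }

    // No `unsafe` appears here, yet this ordinary-looking safe method breaks
    // the invariant the unsafe block depends on: it can push `len` past
    // `data.len()`, turning `get` into an out-of-bounds read.
    pub fn grow(&mut self, extra: usize) {
        self.len += extra;
    }
}
```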

3 Likes

Exactly. Just think of C or C++ where the entirety of the program is basically one giant unsafe block.

Given the community around Rust, actix's misuse of unsafe is still bad... but the project is hopefully still safer and more easily debugged and refactored than it would be in other languages.

I think this thread has also evolved into a discussion about auditing unsafe blocks and best practices - which is a great conversation to be having. I have no idea how to move forward, but I'm glad we're having it now instead of later.

4 Likes

PROPOSAL: How about everyone interested claims one or more unsafe usages from Actix and posts here how the unsafe usage is incorrect/results in UB, along with a stab at proposing an alternative safe solution; or, in the case where the unsafe is justified and correct, posts what explanation would be appropriate to include with the code to explain the usage.

Then, when enough of these have been gathered, 2 things can happen:

  • Nomicon entries can be made regarding the particular class of inappropriate/appropriate "unsafe" usage (PR's made there)
  • PR's can be created/submitted to Actix (or the Actix maintainers can just use the comments here to do the work)

Also, if someone comes across an "unsafe" usage that could be safe code if either the language or standard library had some particular missing feature, a sub-thread could be spawned to discuss that particular feature (Pre-RFC etc).

11 Likes

That's a great idea! I would propose you create a separate thread, though, so replies aren't buried under unrelated stuff / for better visibility.

I wonder if most blocks can be fixed without bigger refactoring?

2 Likes

I like that idea as well. The only thing I'd amend is to require a working PR against actix that actually shows the full change: either code and/or comments explaining away the unsafety. Otherwise, it's all too easy to criticize, particularly code in isolation, without adding any real value in the process. The added bonus is that actix will be better off and may gain some contributors in the process.

7 Likes

On that note, Actix devs are already removing unsafe code (see the last 3 commits in the commit log).

meta: it's very heartening to see so much discussion and action focused on making things better in a practical way, consistently in so many threads

14 Likes

It'll be interesting to know whether they found any bugs at compile time after changing all those lines from unsafe to safe.