Add #[must_comment] attribute for `*_unchecked` functions

Some crates (even std) provide APIs with *_unchecked() functions. Those usually omit validity checks for the sake of performance. However, those functions can corrupt the data model and cause errors later if not used correctly.

Wouldn't it be nice to have a #[must_comment] attribute for such functions? The attribute should work similar to #[must_use]: The compiler throws a warning (or error) if a tagged function is called (outside the defining crate?) without providing a comment (//). This way a crate could enforce to comment on the use of such potentially dangerous (not necessarily unsafe :wink: ) functions.

I don't think I've ever seen an *_unchecked function that wasn't unsafe.

You could always add a new lint to clippy which looks for all usages of *_unchecked() functions and validates that it has a comment.

That way instead of giving the function an explicit #[must_comment] attribute, you give developers the choice to opt in/out of the lint using normal lint attributes (e.g. #[allow(clippy::unchecked_without_comment)]).

2 Likes

As an example this is a function to create a new IRI but it is not checked if the given 'string' is a valid IRI according to the spec. However, it is not unsafe as an invalid IRI does not penetrate memory guarantees. I'm sure there are more examples of _unchecked-parsing that violate contracts but not memory safety.


Sry for the example from an WIP version of a crate but it's the project I'm currently contributing to and the reason for this topic in the first place.

I am likely going to add unchecked methods to construct values in the time crate at some point. Invalid data doesn't violate memory safety, so marking it unsafe would be unidiomatic.

2 Likes

There are many examples of invalid data being considered as undefined behaviour. The best-known example is String, which is literally defined like this:

pub struct String {
    vec: Vec<u8>,
}

Yet it is still considered undefined behaviour for it to contain data that is not valid utf-8.

1 Like

I think @jhpratt meant to say that in the time crate and in the functions he has in mind, invalid data doesn't violate memory safety.

Sure, but my point is that it wouldn't be unidiomatic for invalid data to be considered UB. It's about what kind of guarantees you give the user, and whether you want the user to be able to make those assumptions in their unsafe code.

I disagree. UB has a very specific meaning within rustc; it means that the compiler is permitted to miscompile the program in its quest for space- or time-performance optimization. Invalid data may be a program logic bug, but in most cases (other than the UTF-8 requirement for strings) it does not give the compiler that permission. In your own words, if you have UB sometimes you just get lucky.

Declaring on the documentation for a type that invalid data causes UB, can cause miscompilation if unsafe code in external crates relies on that documented guarantee. This is exactly the situation with strings, where the act of putting invalid utf-8 into a String doesn't immediately cause miscompilation, but various unicode algorithms might have e.g. unreachable_unchecked() in branches that are not reachable if the provided data really is valid utf-8.

Strings aren't some sort of exception to the rule here. If you didn't run into any such unicode algorithms, then you just got lucky.

1 Like

You're correct. I was considering removing the reference to UTF-8 from my post. UB in code triggers potential miscompilation, code without UB does not. Linking code modules that do not uphold the same invariants may cause incorrect program behavior, but if none of the modules contain UB then such linking, even if with LTO, should not result in any UB within the aggregate assemblage.

I definitely agree that potential violations of memory safety are required for it to be idiomatic to mark something as unsafe. The purpose of my post was to point out that the mere act of declaring it UB for the value to contain invalid data causes potential violations of memory safety, so it isn't unidiomatic to do so.

Of course, it would also be idiomatic to not mark it unsafe. It's all about the guarantees you make to other crates.

In fact, the reason why I pointed out that *_unchecked are not necessarily unsafe was a thread I opened before:

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.