Is adding faster, `unsafe` API to a crate a benefit?

I have this PR to once_cell, that adds a unsafe fn get_unchecked(&self) -> &T method, which skips initialization check. Note that, as of now, API of once_cell is fully safe.

Looking at the assembly proves that this get_unchecked method indeed can help with eliminating an atomic load, which can't be eliminated by using get_unchecked at the call site. So there's no doubt that there are cases where get_unchecked will be helpful.

So, adding this method can help some use-cases.

On the other hand, if I have this method, it could be misued. Roughly, if there are 1000 users of the crate, and one of them really needs get_unchecked, there's some chance than one of the rest of the users will incorrectly use unsafe. The math here is that the number of people for whom this method would be useful is much smaller than the total number of users of once_cell.

Should I, as a crate author, police the users and just don't provide the unsafe method? Or should I consider misuse of unsafe to be of the marginal probability, and lean towards providing maximal performance?

3 Likes

Slice's get_unchecked could also be misused, but of course that's why these things are unsafe. If there's a legitimate use case, I think it's fine to provide this.

9 Likes

This is just my opinions and I'm no API expert :smiley:

I don't think that's a very good idea. If some users really want the feature they'll just use a fork.

As @cuviper already said, that's what unsafe is for. As long as you define your function unsafe and clearly explain the contract, it's not your responsibility if users misuse your library.

3 Likes

One minor thing I'd like to add: the addition of an unsafe API should always come from necessity (i.e. the surface is too low level to be used completely safely) or measured expressive or performance benefit.

If the unsafe surface area doesn't add performance or expressiveness, but is rather just for snake oil premature optimization (or even pessimization, in some cases!), then it probably shouldn't be added, rather directing users to the safe API.

In this case, though, there is a clear and measurable benefit to the unchecked version of the API, so it's worth adding.

2 Likes

If you are concerned about giving this "for free", you can always feature-gate them:

[dependencies]
once-cell = { version = "...", features = "unsafe_API" } # or "expert_mode"

This way those who really want these "optimizations" can opt into having them, whereas by default an "unconcious" user will get a compilation error :woman_shrugging:


Imho the best policy is good documentation about when using this function is sound, and when it may seem to be despite it not being the case. Add some warning sign emojis if you need :stuck_out_tongue:

1 Like

This is approximately my litmus test:

  • A valid use case in real code (not a hypothetical) needs to be presented.
  • There should be no way to achieve the use case with the use of safe code.
  • There exists some benchmark somewhere that justifies the use of unsafe.

If the API addition is a single unsafe function, then I think the above is generally enough for me. The docs for the function should carefully explain the safety invariants that the caller needs to uphold, and perhaps provide an example (from the use case you were given) where using the unsafe API is appropriate.

For a simple addition like this, I wouldn't worry too much about unsafe misuses. People are certainly going to abuse/misuse unsafe, and keeping it out of the API of a crate or two isn't going to change that. Instead, I think we should be pushing back against unsafe misuses with polite but relentless education and code review. That is, make correct use of unsafe part of our ecosystem's culture.

5 Likes

I’d love to see real code where eliminating an acquire load + very predictable branch shows non-trivial perf improvement :slight_smile:. Even with a plain load, if it’s a cache miss, the atomic is in the noise compared to the cache miss. If it’s cache hitting, then the atomic load will also hit the cache. There may be some potential optimizations lost due to limited code motion across the barrier, but again, would love to see real code experiencing that.

If it were me, I wouldn’t add the unsafe api, not yet at least. You can always do it later, when a real case presents itself (as others mentioned); taking the API away later is much harder as the cat’s out of the bag.

1 Like

Is this rationale true even on CPUs where Acquire requires a memory barrier like Power and ARM?

(Not a trick question, I never had to optimize code for those architectures yet.)

Me neither so hard for me to say. Certainly cache misses dominating would likely be the bigger issue on them as well. In terms of the cpu fences themselves contributing, I’m not sure - maybe someone with experience tuning on those archs can shed some light.

Across all the archs, however, I’d expect possible degradation only in tight loop situations. But if one has a tight loop, why not read the value out of the cell once, take whatever hit, and then use a normal reference inside the loop? That’s sort of my reasoning here.