I suppose that is correct; I was more focused on the out-of-range behavior. I suppose you could make a weak argument for `saturating_shl`, but that name's certainly not much better than `wrapping_shl`. In fact, it's relying on the same sleight of hand, in that the shift amount is "saturating" to a full shift-out, rather than talking about a quality of the shift itself.
Perhaps `exhausting_shl`, since the bits are "exhausted" when you shift by N or more? Or `flushing_shl`. I don't dislike `unbounded_shl` either. `any_shl` feels specifically like what the operators do, where you get a compile-time switch between behaviors (`-Coverflow-checks`).
But on the other hand, `truncating_shl` isn't wrong, either. It consistently truncates the resultant shifted bitstring, no matter how much it's been shifted by. On the other other hand, the fact that `wrapping_shl` and `overflowing_shl` are talking about the wrapping/overflowing of the shift *amount* makes a `_ing_shl` name that talks about the operation rather than the amount questionable, even though that would make it match the other ops more closely.
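For comparison, `u32::overflowing_shl` really is reporting on the amount, not the operation; bits shifted out of the value don't count as "overflow":

```rust
fn main() {
    // The bool flags an overlong *amount*, which gets wrapped mod 32.
    assert_eq!(1u32.overflowing_shl(1), (2, false));
    assert_eq!(1u32.overflowing_shl(33), (2, true)); // 33 % 32 == 1
    // The top bit is shifted out of the value, yet no "overflow" is reported.
    assert_eq!(0x8000_0000u32.overflowing_shl(1), (0, false));
}
```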
Tongue-in-cheek suggestion: `euler_shl`, for no reason other than in analogy to `euler_div` being the "mathematically pure" choice of integer division semantics. I don't think Euler actually defined anything related to shifting fixed-width binary numbers. (Though with how much Euler did, it's at least possible.)
The counter-reason I presumed it might not be is that the release-mode semantics for the operators were, IMHO, intended to be a "what the hardware makes fast" choice, with any occurrence of the overflow condition being a bug. (If it were a "reasonable default", overflowing shifts would flush to zero.) The problem is that there isn't just one answer for the "fast" shift, as you've noted, since the normal shift instruction truncates/wraps the shift amount first, but the SSE shift flushes to zero.
**Aside**
To that point, I suspect that if Rust ever gets a target for a machine which saturates by default instead of wrapping, while it probably won't become the default on that target, there's a somewhat low but possible chance that saturating will become a second option for the overflow-checks=off behavior of operators, alongside today's choice of wrapping. The same goes for if we get a target where overflow traps, giving us an abort option for overflow-checks. The likelihood of such a target becoming popular enough to motivate extensions to core semantics is on its own rather low, thanks to the mathematical conveniences of wrapping arithmetic on two's complement integers, but the growing support of hardware-level sanitizing machines like CHERI makes it a decidedly nonzero chance.
The choice does seem to matter for nonvectorized shifts, but essentially not matter for vectorized shifts in current LLVM [godbolt].
**Worked example**
Currently, for

```rust
pub fn wrapping_shl(lhs: u32, rhs: u32) -> u32 {
    lhs.checked_shl(rhs % 32).unwrap()
}

pub fn saturating_shl(lhs: u32, rhs: u32) -> u32 {
    lhs.checked_shl(rhs).unwrap_or(0)
}
```
LLVM uses `shl` for both when targeting x86-64-v2, and `shlx` for x86-64-v3. This results in `saturating_shl` being more machine code, requiring the `cmp` and `cmove`. This suggests that if overflow-checks=off means overflow should choose "what the hardware makes fast," then at least for LLVM emitting x86_64, the wrapping version is better.
**My attempt at giving autovectorization the best chance**

```rust
#![feature(portable_simd)] // nightly-only, for std::simd
use std::simd::*;

pub fn wrapping_shl_x4(lhs: u32x4, rhs: u32x4) -> u32x4 {
    let (lhs, rhs) = (lhs.to_array(), rhs.to_array());
    u32x4::from_array(std::array::from_fn(|i| wrapping_shl(lhs[i], rhs[i])))
}

pub fn saturating_shl_x4(lhs: u32x4, rhs: u32x4) -> u32x4 {
    let (lhs, rhs) = (lhs.to_array(), rhs.to_array());
    u32x4::from_array(std::array::from_fn(|i| saturating_shl(lhs[i], rhs[i])))
}

pub fn vector_shl_x4(lhs: u32x4, rhs: u32x4) -> u32x4 {
    lhs << rhs
}
```
results, for both, in the emitted code reading and shifting each element individually rather than using a vectorized operation. Only the version using `std::simd` results in a vectorized shift in LLVM, and AIUI that's because it bottoms out in an intrinsic stating to do so (and, currently, wrapping the shift amount).
LLVM's IR only supports shifts where the shift amount is less than the bit width of the shifted type; wider shifts create `poison` (deferred UB). This means that recognizing code that wraps or flushes the shift amount is left to backend instruction selection; it can't be done during the general optimization passes. However, in theory at least, this shouldn't prevent autovectorization, as the vectorized version just directly does the non-vectorized operations on vectorized types.
Also, FWIW, by my attempt, both vectorized wrapping and flushing shifts seem to generate essentially identical assembly, with the wrapping version shorter by a single instruction. This does seem to indicate that it's not the semantics of shifts getting in the way of vectorization at all, but rather just limitations of the autovectorization pass not handling shifts. It may also be an opportunity for x86-64-v3 instselect to improve a bit, to take advantage of `vpsllvd`'s overlong-shift semantics the way that the nonvectorized version takes advantage of `shlx`'s semantics to elide the source-level mask of the shift amount.
Yes, with some caveats.
std is distributed in release mode without overflow checks for most of the stdlib, so if it's internal, most people wouldn't see the panic. Additionally, std uses `#[rustc_inherit_overflow_checks]` a lot to avoid that and instead get the compilation's overflow-checks setting for the standard arithmetic operators; I'm not confident enough to say whether or not any stdlib functionality is doing so for shifts.
It's these caveats that make up the "are we willing to bet on that." std shouldn't be relying on the behavior of overlong shifts, but there's still a risk that changing the behavior of the shift operator over editions could result in std visibly behaving differently based on the caller's edition. On the other hand, if we can reliably ensure that std's code always gets the current-edition behavior, even when it's inheriting overflow checks, then that's sufficient to mix editions like with any other edition change.
I'm not saying that it would cause problems, just that there's more potential for it to cause problems than other language-only edition changes.