Does i32 / always round towards 0?

#[test]
pub fn test_00() {
    println!(
        "{:?} {:?} {:?} {:?}",
        7_i32 / 2,
        7_i32 / (-2),
        (-7_i32) / 2,
        (-7_i32) / (-2),
    );
}

prints 3 -3 -3 3

Question: is it guaranteed that i32 / always rounds towards 0? If so, where is this stated?

Yes. Integer division is truncating. This is standard practice (and how the CPU actually works).

2 Likes

And if you're wondering why, it's because it's the behaviour that means the usual -x / y ≡ -(x / y) ≡ x / -y holds.

(Well, it would for away-from-zero too, but that's a very uncommon mode to want -- for positive numerator and denominator, you want it to round toward zero so the usual "remainder" behaviour is what you get. People don't expect "eleven divided by ten" to have remainder negative-nine.)
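
A quick way to convince yourself of those identities (a small exhaustive test over a range that dodges the i32::MIN overflow case):

#[test]
fn sign_identities() {
    for x in -50_i32..=50 {
        for y in (-50_i32..=50).filter(|&y| y != 0) {
            // truncating division commutes with negating either operand
            assert_eq!(-x / y, -(x / y));
            assert_eq!(x / -y, -(x / y));
        }
    }
}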

4 Likes

It's documented in the reference here (first footnote under the table of operators).

6 Likes

It's standard practice in modern-day CPUs, but not really standard practice anywhere else. If you ask a mathematician what (-13) mod 10 is, they will almost universally say 7. Knuth also defines div and mod as flooring in all his books.

In Python, too, (-13) % 10 is 7, and (-13) // 10 is -2. So even in programming, truncation is not universally standard practice.

It looks like the reason it became standard in CPUs goes back to Fortran, and then C did it the same way to match Fortran and the CPUs.

However, for power-of-2 divisors, truncating division / remainder is actually slower even on current CPUs than flooring division / remainder, because flooring div / rem corresponds to simple bit shifts and bit masks:
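
As a sketch, here are hand-written equivalents for dividing by 8 (illustrative code, not actual compiler output):

// flooring div / rem by 8: a single shift or mask each
fn floor_div8(x: i32) -> i32 {
    x >> 3 // arithmetic shift rounds toward negative infinity
}

fn floor_rem8(x: i32) -> i32 {
    x & 7 // the low bits are the always-nonnegative remainder
}

// truncating division by 8 (what x / 8 must compute) needs a sign fixup:
// bias negative numerators by 7 so the shift rounds toward zero
fn trunc_div8(x: i32) -> i32 {
    (x + if x < 0 { 7 } else { 0 }) >> 3
}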

That's an argument, but basically in every other way truncating behavior is inferior to flooring or Euclidean division / modulo.

First of all, in 99% of cases what you really want from x % 10 is a number between 0 and 9. Imagine using this to index an array, or to load-balance requests, or to calculate the day of the week, etc. It works with flooring division, not with truncating division.
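
For example, picking the day of the week from a signed day offset (a toy sketch using the standard rem_euclid, discussed further below):

const DAYS: [&str; 7] = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"];

fn day_name(day_offset: i32) -> &'static str {
    // day_offset % 7 is negative for negative offsets; cast to usize it
    // wraps to a huge value and the indexing panics. rem_euclid always
    // lands in 0..=6.
    DAYS[day_offset.rem_euclid(7) as usize]
}

With this, day_name(-1) is "Sun"; written with plain %, it would panic.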

Now imagine you're scaling down an image by a factor of 10 using x / 10. Every output pixel corresponds to 10 input pixels. Except, with truncating division, the output pixel at coordinate 0 corresponds to 19 input pixels, because inputs -9 through 9 all divide to 0. This actually was a bug in some Adobe software, where fonts got distorted by 1 pixel below a certain line because of this. No such problem with flooring division.
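
That effect in miniature (hypothetical coordinates, scale factor 10):

#[test]
fn downscale_buckets() {
    // truncating: the 19 inputs -9..=9 all land on output pixel 0
    assert!((-9..=9).all(|x: i32| x / 10 == 0));
    // flooring (div_euclid; identical to flooring for a positive divisor):
    // every output gets exactly 10 inputs, e.g. -10..=-1 -> -1, 0..=9 -> 0
    assert!((-10..=-1).all(|x: i32| x.div_euclid(10) == -1));
    assert!((0..=9).all(|x: i32| x.div_euclid(10) == 0));
}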

6 Likes

Well, it's the obvious argument from a chip designer's perspective. Division is hard™, so taking that rule means you only have to implement unsigned division "for real": you do a comparatively trivial abs beforehand and conditionally negate afterwards. (And those fixups are completely trivial on sign-magnitude machines -- the output sign bit is just an XOR of the input signs, just like in multiplication.)
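
Roughly, in code (a sketch of that strategy; it ignores the i32::MIN edge cases, where the quotient may not fit):

fn div_via_unsigned(x: i32, y: i32) -> i32 {
    // one "real" divider, operating on magnitudes only
    let q = (x.unsigned_abs() / y.unsigned_abs()) as i32;
    // the sign of the quotient is the XOR of the input signs
    if (x < 0) != (y < 0) { -q } else { q }
}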

TBH, I think the real problem is that integer division with negative denominators was considered worth having. I've never heard of a single situation where people actually did that intentionally. But if it's going to exist, x / -y meaning -(x / y) seems about as reasonable a definition as anything else, since it's such a weird operation.

1 Like

Here is a paper comparing the three definitions: https://dl.acm.org/doi/pdf/10.1145/128861.128862

It basically argues Euclidean division is best, flooring division second best, and truncating is the worst. I agree with this.

I also agree negative divisors are not very useful, and the Euclidean and flooring definitions are the same for positive divisors, so the first two are almost equivalent in practice.

If a user actually cares about negative numerators, then what they get most of the time as a result of this design choice is a slightly faster operation that gives a "wrong" answer (i.e. not what they actually want). That's not a great trade-off. So they have to deal with it and patch it up in software, which ends up being both less efficient and error-prone.

I think the adjustments required are very easy compared to the cost of performing the whole division algorithm. Maybe it's not zero-cost, but it's small. I'm not even sure it would require any extra cycles in hardware: after all, it's a negation and sometimes an offset by one, and negating a number in two's complement is already a bitwise NOT followed by adding one. I suspect it could all be pretty streamlined in hardware, perhaps at roughly the same cost.
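
That negation identity, for the record:

#[test]
fn negation_is_not_plus_one() {
    for x in [0_i32, 1, -1, 42, -42, i32::MAX] {
        assert_eq!(-x, !x + 1); // two's complement: negate = NOT, then add 1
    }
}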

In any case, I'd much prefer a programming language did that, even if the CPU doesn't directly support it in hardware and it requires adding a couple assembly instructions after signed division. If I want the most efficient division, I can always use unsigned operands.

Also, like I said, the current situation is reversed if the denominator is known to be a power of 2: flooring is easier. If you do x / 2 (a very common scenario), the generated assembly first does a flooring division with a bit shift, and then patches it up to get truncating division, offsetting by 1 for odd negative numerators. That's pretty perverse.
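
What the compiler emits for x / 2 is roughly equivalent to this (a sketch of the usual pattern, not the literal instructions):

fn trunc_div2(x: i32) -> i32 {
    // add the sign bit before shifting, so that odd negative values
    // round toward zero instead of toward negative infinity
    let bias = ((x as u32) >> 31) as i32;
    (x + bias) >> 1
}

The flooring version would be the single shift x >> 1.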

Same thing if you do x % 2 for signed x. It will first mask off the bottom bit (which is what you really want 99.9% of the time), and then it will add assembly instructions to make sure you get -1 rather than 1 for negative x (which I have never seen any practical application for).
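
Side by side:

#[test]
fn rem2_vs_mask() {
    assert_eq!(-3_i32 % 2, -1); // what % is specified to produce
    assert_eq!(-3_i32 & 1, 1);  // what the bare bottom-bit mask gives
    assert_eq!(3_i32 % 2, 1);   // for nonnegative x the two agree
}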

Fortunately there is i32::div_euclid and i32::rem_euclid if you want better behavior for negative numbers.
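
For example, matching the mathematician's answer from earlier in the thread:

#[test]
fn euclid_vs_truncating() {
    assert_eq!(-13_i32 / 10, -1);             // truncating quotient
    assert_eq!(-13_i32 % 10, -3);             // truncating remainder
    assert_eq!((-13_i32).div_euclid(10), -2); // flooring/Euclidean quotient
    assert_eq!((-13_i32).rem_euclid(10), 7);  // the mathematician's 7
}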

However, these don't always get optimized very well. For instance, x.rem_euclid(2) is equivalent to x & 1, but the generated code is much more complicated for some reason:
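
The two forms being compared (they return identical results for every i32, since a Euclidean remainder by a power of two is exactly the low-bit mask):

pub fn with_rem_euclid(x: i32) -> i32 {
    x.rem_euclid(2)
}

pub fn with_mask(x: i32) -> i32 {
    x & 1
}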

That may well be true, but it's not uncommon in silicon. Rust has 1_u64 >> 64 ⇒ 1, which isn't what I ever wanted either, because just ignoring the high bits of the shift amount is the simplest implementation.

1 Like

Isn't that UB?

Edit: ah, technically not

I find this also wrong and not forward-looking. If there is some CPU (if not now, then in 10 years) that can actually shift properly by 64 bits, Rust will still have to emulate this wart on that CPU. Different CPUs already handle this case differently, so this is a very weird choice of behavior for Rust.

BTW is this documented? I can't find this in the reference.

On x86-64, implementing the correct behavior would cost the same in the majority of cases (when the shift amount is constant or statically known to be below 64), and at most 2 extra assembly instructions when it's not statically known.

In my own code I have already had to add that special case (for shifts by exactly 64 bits) manually at least twice, so I still pay the cost of those extra two instructions.
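
The manual patch-up is something like this (a sketch; the standard checked_shr returns None once the shift amount reaches the bit width):

fn shr_to_zero(x: u64, n: u32) -> u64 {
    // a shift by 64 or more pushes every bit out, so yield 0
    x.checked_shr(n).unwrap_or(0)
}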

1 Like

Some additional examples

It's not documented anywhere in the Reference or in the Shr trait, but the wrapping_shr() methods all note this:

Panic-free bitwise shift-right; yields self >> mask(rhs), where mask removes any high-order bits of rhs that would cause the shift to exceed the bitwidth of the type.

(Ditto for wrapping_shl().)
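
Concretely, for u64 the mask is rhs & 63:

#[test]
fn wrapping_shr_masks() {
    assert_eq!(1_u64.wrapping_shr(64), 1); // 64 & 63 == 0: no shift at all
    assert_eq!(1_u64.wrapping_shr(65), 0); // 65 & 63 == 1: shifts one bit
    assert_eq!(1_u64 >> 2, 0);             // in-range shifts are unaffected
}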

  1. This sounds really bad.
  2. What does this really mean?

I tried running the following test, and got a compile error:

#[test]
fn test_00() {
    println!("1 >> 2 = {:?}", 1_u64 >> 2);
    println!("1 >> 64 = {:?}", 1_u64 >> 64);
}

That's because the compiler evaluates trivial constant expressions at compile time, and it encountered an error doing so, so it fails the build (you get the same thing with let x = 1 / 0).

You can see the real effect when moving the shift to a function.

fn main() {
    shift(1, 2);
    shift(1, 64);
}

fn shift(lhs: u64, rhs: u64) {
    let shifted = lhs >> rhs;
    println!("{lhs} >> {rhs} = {shifted} ({shifted:#b})");
}

(playground)

This panics in debug mode:

--- Standard Error ---
   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 0.63s
     Running `target/debug/playground`
thread 'main' panicked at 'attempt to shift right with overflow', src/main.rs:7:19
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

--- Standard Output ---
1 >> 2 = 0 (0b0)

But it works as "expected" in release mode.

--- Standard Error ---
   Compiling playground v0.0.1 (/playground)
    Finished release [optimized] target(s) in 0.97s
     Running `target/release/playground`

--- Standard Output ---
1 >> 2 = 0 (0b0)
1 >> 64 = 1 (0b1)

If you do a shift-right on a u64, the CPU only reads the low log2(64) = 6 bits of your right-hand side. (For a u32, as in the example below, it's the low log2(32) = 5 bits.)

For example,

0xdeadbeef_u32 >> 8 = 0xdeadbe
0xdeadbeef_u32 >> (32 + 8) = 0xdeadbe
0xdeadbeef_u32 >> (40 & 0b11111)  = 0xdeadbe
4 Likes
