Trying to understand arithmetic overflow

I'm trying to understand how Rust handles int types and when arithmetic overflow occurs.

Here's code to do a powmod function as an example, to understand when overflow occurs.

fn powmod(b: u128, e: u128, m: u128) -> u128 {
  let (mut r, mut b, mut e) = (1, b % m, e);
  while e > 0 {
    if e & 1 == 1 { r = (r * b) % m }
    e >>= 1;
    b = (b * b) % m;
  }
  r
}

fn main() {
  let mut b = 329832983;
  let mut e = 4843;
  let mut m = 498422;
  println!("{}", powmod(b,e,m) );

  b = 329832983;
  e = 4433923223;
  m = 583472714045;
  println!("{}", powmod(b,e,m) );

  b = 3298329804304383423;
  e = 4433923224384234;
  m = 2372095723823498422;
  println!("{}", powmod(b,e,m) );

  b = 902892829485298529845229485244241;
  e = 543853278717814148138741341734140;
  m = 39685272471287524758245784582458245485;
  println!("{}", powmod(b,e,m) );
}

This code compiles with no errors, but the last group causes an overflow panic, even though all the values fit in u128.

thread 'main' panicked at 'attempt to multiply with overflow',
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Apparently it overflows when the (b * b) is performed.

I assumed that when that multiplication is performed, the result is stored in a tmp variable of twice the bit size and then brought back down to u128 size by the (% m) operation. Apparently that's not what's happening.

Could someone explain how to do this correctly?
(I'm trying to understand, so don't reference a crate that does this)

The result of multiplying two u128s isn't magically stored in a temporary variable of twice the size; it's stored in a u128, and if the multiplication overflows, it overflows.


OK, that is what seems to have happened.
So it's designed to only do single-precision multiplications.
Is there a native way to hold temp double-precision results when followed by a modulo?
(I suspect not).

You might want to use the rug crate for this.

Thanks for the suggestion, and I'll follow up.

I was just testing Rust to see if its internal design handled this.

Doing a modulo-multiplication (x * y) % m when all the variables are the same type is so common that I was hoping this case was natively accommodated.

I'm not doing arbitrary precision here. I just want to multiply values of the same type and derive a result modulo that type, as fast as possible. When the variables aren't BigInts, using arbitrary-precision libraries like GMP is slower than using native math for the given int sizes.

It would be nice, then, to design something like a *% primitive to do this internally for the built-in integer types.

What sizes do you care about? You can use Barrett or Montgomery reduction pretty easily for u64, less so for u128. I think stuff like this doesn't belong in the language because the implementation space is so broad.

No, no, I understand Rust is reluctant to add anything more than the bare minimum in its core.

I was just saying it would be a simple matter to just do double-precision mults in this case before a modulo, because you know the result must be within the range of the modulo int size.

So if you could do something like (x *% y) % m, the compiler would know to do a double-precision mult followed by the modulo reduction for whatever the integer type of m is. Again, allowing this would prevent having to use arbitrary math size libraries when you know your numbers don't get that large.

But my question was answered, so thanks for the responses.

Well, in a sense u128 is already double-precision compared to the native word size (on 64-bit) or even quad-precision (on 32-bit systems), so for 64-bit integers you can cast to u/i128, do the multiply and remainder, and then cast back to u/i64. But there's no way to go wider than 128 bits in the standard library.


There are currently-nightly-only methods being explored for adding to core to help with wider arithmetic.


If you don't want to use an external crate, you'll just have to write this yourself. I'm pretty sure this code is correct, but it might be overly slow:

type WORD = u8;
const HW: u32 = WORD::BITS / 2;

const fn lo(x: WORD) -> WORD { x & ((!0) >> HW) }
const fn hi(x: WORD) -> WORD { x >> HW }

// (a + b) % m without intermediate overflow
const fn mod_add(a: WORD, b: WORD, m: WORD) -> WORD {
    assert!(m > 0);
    let (mut lo, carry) = (a % m).overflowing_add(b % m);
    lo %= m;
    let carry = if carry { ((WORD::MAX % m) + 1) % m } else { 0 };
    match lo.overflowing_add(carry) {
        (x, false) => x % m,
        (x, true) => x.wrapping_sub(m),
    }
}

// out[i] == 2^i % m
const fn bit_mod_group<const N: usize>(m: WORD) -> [WORD; N] {
    let mut out = [0; N];
    out[0] = 1 % m;
    let mut i = 1;
    while i < N {
        out[i] = mod_add(out[i - 1], out[i - 1], m);
        i += 1;
    }
    out
}

const fn mod_mul(a: WORD, b: WORD, m: WORD) -> WORD {
    let mut out = 0;
    let l = lo(a) * lo(b);
    let m1 = lo(a) * hi(b);
    let m2 = hi(a) * lo(b);
    let h = hi(a) * hi(b);
    let groups = bit_mod_group::<{ 4 * HW as usize }>(m);
    let mut i = 0;
    while i < 2 * HW {
        if (l  >> i) & 1 != 0 { out = mod_add(out, groups[        i  as usize], m); }
        if (m1 >> i) & 1 != 0 { out = mod_add(out, groups[(  HW + i) as usize], m); }
        if (m2 >> i) & 1 != 0 { out = mod_add(out, groups[(  HW + i) as usize], m); }
        if (h  >> i) & 1 != 0 { out = mod_add(out, groups[(2*HW + i) as usize], m); }
        i += 1;
    }
    out
}

const PRIMES: [u8; 54] = [
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67,
    71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149,
    151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229,
    233, 239, 241, 251,
];

fn test_mod_group() {
    for m in PRIMES {
        for x in PRIMES {
            for y in PRIMES {
                assert_eq!((x, y, m, (x as u32 * y as u32) % m as u32),
                           (x, y, m, mod_mul(x, y, m) as u32));
            }
        }
    }
}

TBH, I thought this would be easier to implement than it actually was...


Following up on one of the suggestions: what I'd really like is a mulmod(x, y, m) = (x * y) % m native method that computes it accurately.

A generic version would compute (x * y) at double the precision of the largest int type of x|y.
Then it would be reduced modulo m into the int type of m.

As was stated in the above issue, this is a core operation that appears in the inner loops of many larger algorithms (like the Miller-Rabin test). Having it done optimally (and accurately) for the standard int types (not BigInts) would substantially speed up these algorithms by not always forcing a resort to BigInts, and would make Rust natively more conducive to number theory and other computationally intense math fields and algorithms.

In LLVM, Rust can't do anything better than what you can already do by casting to a larger type, doing the operation, and casting back to the smaller type.

(LLVM doesn't have a "wide multiply" operator -- it just encodes it as zero-extend, multiply, truncate.)

Just being able to do that natively in a core method would still be good.

If x|y are limited to u64 values, they can be cast to u128 in (x * y), and a single-precision u128 multiply is sufficient. That result is exact and can then be reduced modulo the int type of m. This would still be very useful over many finite fields, and again, it can be combined into hybrid operations that switch to arbitrary-precision math (BigInts) only when necessary.

If you're going to be performing lots of modular arithmetic and the modulus will be known at compile-time, I'd suggest defining a type that incorporates the modulus in its definition. Something along these lines:

use std::hint::unreachable_unchecked;

#[derive(Clone, Copy)]
struct ModI64<const M: u64>(u64);

impl<const M: u64> From<u128> for ModI64<M> {
    fn from(x: u128) -> Self {
        ModI64((x % (M as u128)) as u64)
    }
}

impl<const M: u64> From<u64> for ModI64<M> {
    fn from(x: u64) -> Self {
        ModI64(x % M)
    }
}

impl<const M: u64> std::ops::Mul for ModI64<M> {
    type Output = Self;
    fn mul(self, rhs: Self) -> Self {
        self.hint_range();
        rhs.hint_range();
        Self::from((self.0 as u128) * (rhs.0 as u128))
    }
}

impl<const M: u64> std::ops::MulAssign for ModI64<M> {
    fn mul_assign(&mut self, x: Self) { *self = *self * x; }
}

impl<const M: u64> std::ops::Add for ModI64<M> {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        self.hint_range();
        rhs.hint_range();
        Self::from((self.0 as u128) + (rhs.0 as u128))
    }
}

impl<const M: u64> std::ops::Sub for ModI64<M> {
    type Output = Self;
    fn sub(self, rhs: Self) -> Self {
        self.hint_range();
        rhs.hint_range();
        Self::from((self.0 as u128) + M as u128 - (rhs.0 as u128))
    }
}

impl<const M: u64> ModI64<M> {
    pub fn pow(mut self, mut e: u128) -> Self {
        let mut r = Self::from(1u64);
        while e > 0 {
            if e & 1 == 1 { r *= self; }
            e >>= 1;
            self *= self;
        }
        r
    }

    // Let the optimizer assume the wrapped value is always reduced below M.
    fn hint_range(self) {
        if self.0 >= M {
            unsafe { unreachable_unchecked() }
        }
    }
}

At the bottom of the Wikipedia article on modular arithmetic there are examples in C of fast modular multiplication. They show algorithms for u64 values, but they can probably also be adapted for u128. Maybe it will be helpful in conceptualizing an approach conducive to LLVM.

This argument sounds a tiny bit bizarre to me. It sounds as if we are discussing a BASIC interpreter or Python, where something implemented in the language itself is inherently slow and couldn't be made as fast as something implemented as a primitive.

But Rust is not like that at all! Literally everything implementable in code can be implemented in a user-provided program (using inline assembler if every other approach doesn't give enough flexibility).

Do you have any evidence that this mulmod operation is, for some reason, an exception to that rule and can not be implemented in a library instead?

That's certainly a noble goal but adding just one randomly picked primitive to the language is hardly a solution.

Doing computationally intense math fields and algorithms in Rust is a PITA right now, but the lack of proper mulmod syntax is just one tiny problem out of many.

If you are serious about fixing that, then IRLO is the right place; if your complaint is “Rust doesn't suit my use case and I want someone else to change it to make it better for me”, then the answer would be more like “sure, let's add it to the backlog; maybe in the next ten or twenty years someone will do that”.