Feedback requested on Shr for negative BigInt

I would like community feedback on this issue:
https://github.com/rust-num/num/issues/347

The gist is that Shr for primitive integers always rounds down (an arithmetic shift), but Shr for BigInt rounds toward zero, which means the two give different results when shifting negative values. This behavior has existed ever since BigInt was first added to the standard library in 2012!
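
To make the mismatch concrete, here is a minimal sketch (assuming the round-toward-zero behavior described above):

    extern crate num_bigint;
    use num_bigint::BigInt;

    fn main() {
        // Primitive Shr is an arithmetic shift: floor(-3 / 2) == -2.
        assert_eq!(-3i32 >> 1, -2);
        // BigInt currently shifts the magnitude, truncating toward zero.
        assert_eq!(BigInt::from(-3) >> 1usize, BigInt::from(-1));
    }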

The current implementation is just:

    BigInt::from_biguint(self.sign, self.data >> *rhs)

I doubt that was intended to behave differently than negative i32, for instance -- it's just the most obvious way to implement Shr in this case. Rounding down will require some conditional fixups.
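
For illustration, the fixup could look something like this (a hypothetical free function, not the actual patch; it builds on the existing truncating shift):

    extern crate num_bigint;
    extern crate num_traits;
    use num_bigint::BigInt;
    use num_traits::{One, Signed, Zero};

    // Shift right, rounding toward negative infinity like primitive integers:
    // do the truncating shift, then subtract 1 if the value was negative and
    // any one-bits were shifted out.
    fn shr_floor(n: &BigInt, k: usize) -> BigInt {
        // Current behavior: shift the magnitude, rounding toward zero.
        let truncated = n.clone() >> k;
        // Bits that fell off the end; nonzero iff the division was inexact.
        let lost = n - (truncated.clone() << k);
        if n.is_negative() && !lost.is_zero() {
            truncated - BigInt::one()
        } else {
            truncated
        }
    }

With this, shr_floor(&BigInt::from(-3), 1) gives -2, matching -3i32 >> 1.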

But I think matching primitive behavior makes the most sense across the board, so I'd like to just change the behavior and call it a bug fix, without a semver bump. Would anybody object to this?

3 Likes

I don't use num directly, so my input should be taken with a grain of salt, but this seems like a big enough change in behavior to warrant a major bump.

3 Likes

I'd prefer shr to follow the expected two's-complement behavior. Currently I'm emulating it myself using this code:

    // Round-trip through two's-complement bytes: encode both BigInts as
    // signed little-endian bytes, reinterpret those bytes as unsigned
    // values, apply the unsigned op, then read the result back as signed.
    fn signed_bitwise_op<Op>(&mut self, n1: BigInt, n2: BigInt, t: usize, f: Op)
        where Op: FnOnce(BigUint, BigUint) -> BigUint
    {
        let n1_b = n1.to_signed_bytes_le();
        let n2_b = n2.to_signed_bytes_le();

        let u_n1 = BigUint::from_bytes_le(&n1_b);
        let u_n2 = BigUint::from_bytes_le(&n2_b);

        let result = BigInt::from_signed_bytes_le(&f(u_n1, u_n2).to_bytes_le());

        self.interms[t - 1] = Number::Integer(result);
    }

I couldn't find a more efficient way of doing it. There's a lot of needless copying going on.

1 Like

If I understand correctly, a right-shift through that signed_bitwise_op will give you a logical shift, no? Or do you also preserve the sign bit yourself? Are there other ops that you emulate this way?

There is BigInt::from_biguint(Sign, BigUint) already, but we could probably add a 2's-complement-signed version of this. Then in the other direction, I'm surprised we don't have something that returns (Sign, BigUint), but we could add that and another that outputs a 2's-complement-signed BigUint. (All of this regardless of whether we change Shr.)
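
For the BigUint-to-signed direction, such a conversion might look like this sketch (hypothetical name and signature; it takes the bit width explicitly, since a BigUint has no inherent width):

    extern crate num_bigint;
    use num_bigint::{BigInt, BigUint};

    // Interpret `u` as a two's-complement value that is `width` bits wide.
    fn from_twos_complement(u: &BigUint, width: usize) -> BigInt {
        let one = BigUint::from(1u8);
        if ((u >> (width - 1)) & &one) == one {
            // Sign bit set: the encoded value is u - 2^width.
            BigInt::from(u.clone()) - (BigInt::from(1u8) << width)
        } else {
            BigInt::from(u.clone())
        }
    }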

It is a quiet breaking change, just saying. The point of semver is not to be pretty, but to be actually meaningful. By all means, feel free to release a major version.

Also, due to the way num is split into multiple crates, most types in num 0.1 and num 0.2 would actually be compatible; the only exception would be the num-bigint types. Just bump the major version of num and num-bigint; this should reduce the impact on other crates.

1 Like

Yes, you are right. I wanted an arithmetic shift, but just found that shifting -1 right by 1 gives 0, not -1, so I am getting a logical shift.

The thing that terrifies me about a semver bump for num is that it's often a public dependency, with its traits exposed in the public interfaces of other crates. I really don't want a "numpocalypse" akin to libc's 0.1-to-0.2 transition.

Maybe I could make a separate semver bump for just the num-bigint sub-crate, but the num::BigInt re-export wouldn't be able to follow it.

Once again, the impact of bumping num wouldn't be as big as it appears. Let's say that only num and num-bigint would be bumped.

Let's say that I'm using a crate that depends on num 0.1 and implements num::Zero for some of its types. My program uses num 0.2 and tries to use that trait...

And it works. Why? Because num 0.1 and num 0.2 both depend on num-traits 0.1. It's the exact same trait.

That said, ideally libraries would re-export any crates (or parts of them) that they consider fully part of their public API, but unfortunately that's not always the case :frowning:.

Hmm. You're probably right, and I may even be able to use the semver-trick to keep BigUint compatible. Although if I do bump num-bigint, I may reach for some other breaking changes too, like hiding big_digit as an implementation detail.

I should also take this time to say: I could really use help maintaining num! I maintain it very carefully, knowing it has been widely used since long before I had anything to do with it, so I would want someone who shares that attitude. But the reality is that I just don't make much time to actually improve it.

1 Like

I'd suggest doing whatever Python is doing. They have a native big integer type.

It should obviously behave like i32/i64.

This happened due to the poor choice of representing BigInt as (BigUint, Sign) rather than using the normal two's-complement representation (which, in addition to causing issues like this, is bad for performance, since you need to branch on the sign on every operation).

This is part of the bigger issue of num-bigint being implemented pretty terribly (it looks like it was written with the goal of providing bignums with minimal effort and code, rather than providing the fastest, lowest memory overhead and most expressive library).

Just to mention a few of the most critical issues:

  1. No bignum type with small bignum optimization
  2. Bignums use 32-bit units even on 64-bit machines and fail to take advantage of "add with carry" instructions, probably halving performance (see the limb-addition sketch after this list)
  3. No SIMD usage (another 2-4x performance loss at least...)
  4. No way to have "borrowed" bignums, only an owned vector of BigDigits
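
To illustrate point 2, here is a minimal portable sketch of u64-limb addition with carry propagation (illustration only, using std's overflowing_add rather than any dedicated carry intrinsic):

    // Add two little-endian u64-limb magnitudes, propagating the carry.
    fn add_limbs(a: &[u64], b: &[u64]) -> Vec<u64> {
        let (long, short) = if a.len() >= b.len() { (a, b) } else { (b, a) };
        let mut out = Vec::with_capacity(long.len() + 1);
        let mut carry = false;
        for (i, &x) in long.iter().enumerate() {
            let y = short.get(i).copied().unwrap_or(0);
            let (s1, c1) = x.overflowing_add(y);
            let (s2, c2) = s1.overflowing_add(carry as u64);
            out.push(s2);
            carry = c1 || c2;
        }
        if carry {
            out.push(1);
        }
        out
    }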

This also holds for the num crate as a whole: e.g. Ratio implements comparison by recursive division (!) rather than taking the time to add a trait for extending multiplication and using that like any sensible implementation would do (this is likely 10-100x slower in case it's not obvious).
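
For reference, a cross-multiplication comparison is a one-liner (a sketch assuming normalized ratios with positive denominators; with fixed-width integers this is exactly where a widening/extending multiplication trait would be needed to avoid overflow):

    extern crate num_bigint;
    use num_bigint::BigInt;
    use std::cmp::Ordering;

    // Compare a/b with c/d (b, d > 0) without dividing: a/b < c/d iff a*d < c*b.
    fn cmp_ratios(a: &BigInt, b: &BigInt, c: &BigInt, d: &BigInt) -> Ordering {
        (a * d).cmp(&(c * b))
    }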

Long-term, a rewrite from scratch is required.

5 Likes

What are you picturing for this? Ref/Mut (or Slice/SliceMut) types make sense to me for types like SoA vecs where it seems reasonable to want to have a type with the same ownership semantics as &mut [T] (contiguous subset, mutation without reallocation...), but it wouldn't seem so useful for BigInt where even AddAssign can reallocate.

Rather than just a list of grievances, I would be more receptive to PRs, especially those that don't break API. I am just one maintainer, who didn't even write most of this code in the first place.

Or if you want to do a whole rewrite that breaks API, you're welcome to create your own crates, of course.

4 Likes

If you go for two's-complement behavior, IMO you should go all the way and implement all bitwise operations for it, removing the need for a separate unsigned type. E.g. in this case, (n & -1) == n, because -1 is conceptually represented as an infinite string of one-bits. Using the two's-complement representation for one operation and sign-magnitude for all the rest would be inconsistent.
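
Concretely, these are the identities (shown here with i64 as a finite stand-in) that a fully two's-complement BigInt would be expected to satisfy:

    fn main() {
        let n: i64 = 42;
        assert_eq!(n & -1, n);      // -1 acts as an infinite string of one-bits
        assert_eq!(!n, -n - 1);     // complement identity
        assert_eq!(-1i64 >> 1, -1); // arithmetic shift preserves the sign
    }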

1 Like

Alternatively, to avoid semver problems, you could just add another, explicitly two's-complement-based type, and keep the old sign-magnitude types untouched, but deprecated.

Sharing a layout between the two would be nice (like Vec<u8> and String), but I think there's still value in two different types, for things like 4 - 10.

> for things like 4 - 10.

What would be the expected behavior in this case?

panic!() would make sense, like 4_u32 - 10.

It doesn't seem necessary to have two separate types just for the sake of a built-in assert!(x >= 0).