Floating Rationals Proposal

Prompted by people always running into problems with the old bugbear of numerical analysis, that is floats truncation, viz equality, I would like to propose a novel representation of floating rationals. This will much exceed the accuracy of the current IEEE 754 floats standard, as well as entirely avoid many of the floats truncation induced inequality errors.

Essentially, it is "the best of both worlds" of combined rationals and floats, for a modest cost in extra memory and cpu cycles.

A rational typically consists of i64 and u64 (only one sign needed). To turn it into a floating rational, we only need to add, say two bytes exponent (16 bits, again saving one exponent, while exceeding the 11 bits exponent of IEEE754). The calculations will take roughly twice as long as a fast dedicated floating processor, though hardware support may also be possible in medium to long term. On the plus side, some operations like taking the reciprocals come totally free and division is entirely eliminated (replaced by rationals multiplication). Also, all powers of two factors from both numerator and denominator can be subsumed in the exponent, thus making more room in the rational's fields that will now hold odd numbers only.

The biggest win, in keeping with the Rust philosophy of safety, is that all calculations and comparisons are integer based and the classical truncating floats become entirely unnecessary.

Rationals also open up nice possibilities of using repeating continued fractions to represent square roots exactly.


Denominators easily blow up in exact rational arithmetics. (A typical example is Newton-Raphson method applied to calculate 1/√2.) I guess that is why people don't use fixed length rational numbers except in very specific usecases, like sorting integer vectors by their angles.


What exactly do you mean by "blow up"? Taking the worst case 1/N, I do not understand how 53+11 bits can "blow up" any less than 64+16 bits of my denominator. I believe that my proposal of adding the exponent to the rationals solves this and other problems.

Obviously, there will also be some precision limits, although the precision will be greater but at least they do not kick in for common trivial numbers like 1/3 and for comparisons.

I suspect that the problem @qnighy refers to is that multiplication of factors in numerators and denominators often rapidly overflows any modest fixed-size representation, whether u32 or u64 or u128 and its signed brethren. Common factor elimination cannot prevent such overflow.

Once overflow does occur, either the "rational" representation fails or it must be reduced to an inexact approximation. At that point a user might be better off switching to some form of interval arithmetic that preserves error bounds. Ball arithmetic seems a likely candidate.


I believe that this'd need to be in its own crate as a struct, rather than a language item, given its complexity and niche.

Also, having one of these 64 + 16 bits items would waste the next 48 bits due to alignment.

struct FloatingRational {
    lhs: u64, //Align 8, size 8
    exp: u16, //Align 2, size 2
} //Largest align is 8, therefore size must be a
  //multiple of 8. Your 8 + 2 byte structure has
  //now become 8 + 8 bytes. >eek<

In the case of an actual lang item (Such as in python? I think they have built in rationals) there would probably be optimizations in the long term. I still think this would be better off in a crate like the following:

struct FloatingRational<T: Num> {
    numerator: T,
    denominator: T,
    exp: T,

Yes a crate is what I had in mind initially. @TomP I am not addressing exact BigNum arithmetic here, nor interval arithmetic, though that is cool too and might be worth combining with in future. For a start, it is perfectly possible to introduce approximating fractions to avoid overflow. They will still be better approximations than the normal floats.

@OptimisticPeach - thank you for that valid alignment comment, at this stage I am not wedded to any particular numbers of bits and the best approach will no doubt be generic for various widths, respecting the alignments.

Note that the extra bits that alignment would otherwise waste could be used for the "ball" interval radius, with a value of 0 indicating that the value (i.e., interval) is exact.

1 Like

Yes, at least some kind of flag to indicate the transition would definitely be useful, so that the comparison operators could examine it. However, it would be better to have two different types and an explicit conversion between them.

Given the eight byte alignments on the 64bit machines, it may be better to err on the generous side and to go for the total of 24 bytes for the exact type, allocating more to that troublesome denominator.

1 Like

Are you aware of Posits (Unums)? See also the Posits: the good, the bad and the ugly review paper by Florent de Dinechin et al. They have their own community hub and an implementation in Rust.


Is it floating part that is problematic or floating point numbers using 2 as a base?

IEEE 754-2008 (published in 2008) defines decimal floating point numbers, which, I think, solve lot of the issues that are typically attributed to floating point arithmetics.

I even started my own pure Rust impl (there is one crate that wraps IBM decNumber library), but didn't get too far :frowning:

1 Like

@idubrov 10 is only slightly better base than 2, inasmuch as it adds the factor of 5 and thus, for example, the ability to represent exactly 1/5 but it still can not cope with the modest 1/3. In general, superabundant numbers are the best, as the wise ancients knew and therefore used 60.

I am now thinking along the lines of how to best transition from the exact float rational to the approximate one, when (usually the denominator) exceeds its number of allocated significant digits. This will typically happen in only one of the three primitive operations: addition, subtraction, multiplication. As such it can be trapped and an approximate type returned instead of the exact one. The equality operator will be valid only for the exact type.

In the approximate type the best approximation to the rational will be computed, keeping the significant bits of the denominator down.

I'm really feeling that introducing a type with the contract of "I am exact until some hard-to-tell point but then I become inexact suddenly and without notice" would be a mistake and very much against the core principles of Rust.

If users see "exact" (or even "rational") in the documentation, they'll inevitably start using it as if it were always-exact. This is a huge footgun, even worse than regular floating-point, because it advertises itself as "this is so nice you don't need to care", and then breaks down unexpectedly.

If one wants rationals without overflowing a fixed-size representation, one should use rationals composed of bigints. If even those are not good enough for a niche use case such as representing continued fractions, then a specialized library should be written for that. In any case, it definitely does not warrant a new language builtin.

1 Like

I agree, that is why I proposed separating the exact and inexact types, while maintaining some kind of compactness and efficiency, unlike with bigints. The exact type explicitly failing its exactitude is imho better than the float, where this occurs all the time without notice, for example, every time you divide by 3.

If you care for exactness, then you will take notice of the explicit "loss of exactness error" if/when it occurs. If you don't care, as with the current floats, then you can just recover from that error and carry on with approximate rationals, or use the approximate type from the start. I do not see the problem and to me it seems in keeping with the Rust philosophy of recoverable errors.

I accept that the idea of approximate rationals is somewhat novel and may need some getting used to but in principle it should be no more difficult to understand than the approximate floats. Think of them both as being approximate but the approximate rationals as better approximations.

While I think we should use exact (big-integer) rational numbers whenever possible, fixed-size efficient rational numbers cannot be implemented reasonably.

You want to calculate gcd of the denominators to calculate arithmetics. But gcd is not a cheap operation and hardware implementation won't help much (the best known parallel algorithm is Ω(n/log n) time for n-bit gcd).

You can do arithmetics without computing gcd by just not doing reductions. However, this is not a good idea when rounding is also used. By adding 1/3 up many times, it has to round to prevent overflow. However, it is the kind of weirdness that wanted to prevent that rounding occurring when true value can exactly be represented!

My conclusion is: you get either 1. slow due to gcd, or 2. messy arithmetics with another (worse) weirdness.


Good point. I have an idea for doing the gcd reductions in "spare time" using async/await mode. Or leaving them out and going straight to the finding of the nearest simple rational only when needed (for the approximate case) to prevent overflow. However, none of that will help much when the calculations are being done in an intense loop. I guess it is a good representation but the implementation can be slow, you are right.

That's why I suggested extending this proposal to a rational ball arithmetic. That way a zero/non-zero ball radius serves as the exact/inexact flag and no error trapping is required. Rather than checking for loss-of-exactness error on every computation, simply check the ball radius of the end result.

1 Like

The problem with the ball radius is that it grows even faster than that troublesome denominator.

The problem of the loss of rational exactness is in principle the same as overflow on int multiplication. Both have to be dealt with, though this one is more likely to occur even on simple additions.
It does not seem unreasonable to expect the same kind of self discipline on users of exact fixed size rationals as that taken for granted by users of integer arithmetic in general.

However, I am thinking more and more on the lines of using the novel float rationals consistently and from the start as floats. After all, as has been rightly said, those who insist on exactness should perhaps pay the price of the ticket and use the bigints. There will be no contract of expected exactness but sometimes you get it for free.

When a new operation would result in denominator overflow, it will, then and only then, efficiently normalise by carrying out a division and shifting the powers of 2 into the exponent in order to reset the denominator to 1.

Furthermore, unlike with the fuzzy blowing out "ball radius", these divisions will be isolatable and the exact error introduced will be known (as the remainder) and can be stored as such.

I don't think this is true, but even if I take it at face value, I think you're going to face a far bigger problem than the bare growth of the denominator: rounding.

Leaving aside the "floating" part for the moment, let's take a smaller example. Suppose you have 3 bits of (signed) numerator and 3 bits of (unsigned) denominator, and you do 1/3 + 1/4. The ideal result is 7/12, which doesn't fit. What rational number is the closest to 7/12?

If you're doing this math in binary you might see that 7/12 is 0111/1100, and "round" both numerator and denominator by discarding both lsbs. That does get you something relatively close. But that result isn't correctly rounded for rational arithmetic: 011/110 is 3/6 or 1/2, but the actual closest representable fraction is 3/5. What algorithm are you going to use to turn 1/3 + 1/4 into 3/5? Adding a 2-bit exponent just makes this more complex, not easier -- now you can represent 4/7, which is even closer to 7/12, but how are you going to figure that out on the fly?

I think you're trying to solve a problem that doesn't really exist, at least not to the extent you say it does. True, floating point numbers aren't a mathematical ideal, but they still have a lot of nice pragmatic properties. They can be ordered quickly. They're easy to round (which has a lot to do with being easy to order). It's easy to go from one floating-point number to the next highest or next lowest. Rationals don't have any of these qualities.

Floating-point binary numbers are honestly pretty great when dealing with numbers that are not exact, like physical measurements. Applications requiring exactitude, like accounting, usually fit integer or fixed-point arithmetic better than rational arithmetic. And applications that demand rational arithmetic, such as computer algebra systems, often require arbitrary precision. Leaving "floating rationals" to fairly niche applications. And niche applications are the most likely to have specialized requirements that aren't met by a one-size-fits-all solution. Most languages don't even provide arbitrary-precision rationals because that's already pretty niche.

I don't want to just be discouraging. If you think I'm wrong, and "floating rationals" are indeed practical for everyday use, I encourage you to make a crate out of it. It might be an interesting exercise in hardware design, even -- I bet there's a lot of sparsely explored space in rational ALUs.

If you're interested in what's been done in this space in the past, note that Perl 6 uses rational arithmetic by default, and falls back to floating-point when the fraction becomes unrepresentable (IIRC). Perhaps there are some ideas you could borrow from there.


There is an algorithm using continued fractions convergents that will give you the "best" nearest simpler fraction, though I would rather avoid the expense of it, as well as of GCD simplifications. That is why I propose carrying out the division and the rounding and saving the remainder as an error. The improvement vis-a-vis normal floats consists in having to do this only from time to time when an overflow threatens, not every time you divide by, for example, 3. Thus the error accumulation will be smaller. Nor is my model going to abandon these benefits on the first possible overflow.

I agree about the practicality of the floats, that is why my stated aim from the start is "the best of both worlds". These floating rationals are convertible to floats "at the drop of a hat" by carrying out one floating division.

PS. Yes, I do intend to write a crate for this, I believe there is enough "meat" in this to make it worthwhile.
PPS. The "niche" of this will be all numerical analysis, numerical modelling, approximations, etc.