Correctness, integral overflows


#1

When I translate C/C++ code to D language, often D spots out-of-array-bounds errors in the original code, because in D I use slices and dynamic arrays, that in debug builds test indexes are within bounds.

When I translate C/C++/Java/D code to Rust, I keep finding integral overflow bugs, beside the out-of-array-bounds errors. So Rust is awesome.

I’m sure when even stricter languages/type systems appear in some easy to use language (like the Refinement types for TypeScript: Refinement types for TypeScript (and Typed Racket) [OT] ) I’ll find even more bugs while I port the Rust code to that language :slight_smile: (Unless those improvements get added to Rust itself).

One significant problem left in Rust related to integral operations comes from casts:

let x: u64 = ...;
let y = x as u32;

By default the “as” doesn’t perform a run-time overflow test, and this has caused some bugs in my Rust code. So I hope to eventually have some nice way to safely cast in Rust/Prelude/Rust Standard library.

Beside the safe casts, I’d also like a warning (active by default) that warns me when I am using an “as” when a safer “T::from()” is enough:

#[allow(unused_variables)]
fn main() {
    let x: u32 = 10;
    let y: u64 = x as u64; // Missing warning: use u64::from instead
    let y = u64::from(x);
}

#2

I seem to obsess a little over integral numbers because they are the base for a system language. If a system language doesn’t handle integral numbers well, then the foundations of the language will be fragile. Rust handles integral numbers well (and overall better than D) but D handles them better than Rust in some parts (Value Range Analysis and few other bits), and Ada language handles integral numbers better than both Rust and D in some other ways (static preconditions, ranged values, iterable enums, strongly typed array indexes, and more).

This is a small Java program (from http://d.hatena.ne.jp/jeneshicc/20090117 ), that solves Euler Probem #177:

public class P177{
    public static void main(String[] args){
        int count=0;
        double[] sin = new double[180],cos = new double[180];
        for(int i=0; i<180; sin[i]=Math.sin(Math.toRadians(i)),cos[i]=Math.cos(Math.toRadians(i++)));
        for(int alpha = 2;alpha<=90;alpha++)
            for(int diagB = 1; diagB< 180-alpha ; diagB++){
                double ad = sin[diagB]/sin[alpha+diagB];
                for(int beta = Math.max(alpha,diagB+1); beta <= 180-alpha ; beta++)
                    for(int diagA = beta==alpha?diagB:1 ; diagA < Math.min(alpha,180-beta); diagA++){
                        double ac = sin[beta]/sin[beta+diagA];
                        double x = Math.toDegrees(Math.atan2(sin[diagA]*ac-sin[alpha]*ad,cos[diagA]*ac-cos[alpha]*ad));
                        int rx = (int)Math.round(x);
                        int delta = 180-alpha+rx, gamma = 180-beta-rx;
                        if(gamma < alpha) break;
                        if(delta < beta) continue;
                        if((beta==delta)&&(diagA > alpha/2)) continue;
                        if((gamma==alpha)&&(diagB > beta/2)) continue;
                        if(Math.abs(x-rx) < 1.0E-10)count++;
                    }
            }
        System.out.println(count);
    }
}

This is my translation to Rust:

#![feature(inclusive_range_syntax)]

use std::cmp::{max, min};

fn main() {
    const N: i32 = 180;
    let mut sin = [0f64; N as usize];
    let mut cos = [0f64; N as usize];

    for i in 0 .. N as usize {
        let (s, c) = (i as f64).to_radians().sin_cos();
        sin[i] = s;
        cos[i] = c;
    }

    let mut result = 0;

    for a in 2 ... 90 {
        for b in 1 .. N - a {
            let c = sin[b as usize] / sin[(a + b) as usize];
            for d in max(a, b + 1) ... N - a {
                for f in if d == a { b } else { 1 } .. min(a, N - d) {
                    let e = sin[d as usize] / sin[(d + f) as usize];
                    let x = (sin[f as usize] * e - sin[a as usize] * c).atan2(
                             cos[f as usize] * e - cos[a as usize] * c).to_degrees();
                    let y = x.round() as i32;
                    let g = N - a + y;
                    let h = N - d - y;
                    if h < a { break; }
                    if g < d { continue; }
                    if (d == g) && (f > a / 2) { continue; }
                    if (h == a) && (b > d / 2) { continue; }
                    if (x - (y as f64)).abs() < 1.0E-10 {
                        result += 1;
                    }
                }
            }
        }
    }

    assert_eq!(result, 129_325);
}

The Rust version has some advantages over the Java code:

  • The cycles are more DRY and with less noise;
  • The two arrays are on the stack;
  • All the variables that don’t need to mutate are constant. In pratice only the two arrays are mutable, and they are mutable because they are fixed in size and initializing immutable fixed size arrays is less handy. If you want to build Vecs, then making them immutable is easy;
  • Conditionals are more clean, and the {} are required, this makes the code more regular and avoids some troubles and bugs;
  • This Rust code is probably faster than the Java code (but I have not done benchmarks on this computer);
  • Instead of using library functions like Math.sin(x), the Rust cose uses number methods, like x.sin(). I like this more;
  • sin_cos() could be more efficient than the Java code, unless the JavaVM does an optimization. But in this program the sin and cos functions are called only few hundred times, to it makes no difference;
  • The … and … syntax for loops is nice (but perhaps a more visible syntax like …> is better for closed intervals);
  • Rust doesn’t need the triple operator, its design is more DRY;
  • Beside the unsafe casts, this Rust code verifies that integral operations don’t overflow. This is very impotant to help make sure this kind of code is correct;
  • The Rust code is much more strict regarding types and their mixing, this avoids the risks you have in D language when you mix signed and unsigned types, and several other kind of troubles.

The Rust code has some disadvantages/problems too:

  • It contains 14 (!) hard casts, this is awful. Some hard casts can be dangerous. The Java code contains just one hard cast. The type strictness of Rust helps avoid some bugs, but having lots of cats could be less safe than relying on “safe casts are implicit, most unsafe casts can’t be implicit” as in D language.

  • The computation of the “g” variables requires signed integers (unless you modify the algorithm?), so I’ve used a i32. But of course such numbers can’t be used as array indexes in Rust. You can also do the opposite, defining N as usize, but this causes a cascade of other troubles in this code (and the computation of “g” requires more casts), so in the end I think my translation is the best, but other better translations with a different configuration of casts could be created.

  • I generally like the syntax of x.sin(), and I’d like the infix a.max(b), but I don’t like a lot x.atan2(y).

  • This code looks silly in a modern language:

    let (s, c) = (i as f64).to_radians().sin_cos();
    sin[i] = s;
    cos[i] = c;

I’d like to write something like this, as in Python, a single line:

(sin[i], cos[i]) = (i as f64).to_radians().sin_cos();

Or something similar with a mutable zipping of sin and cos arrays.

A very good language should offer the programmer a way to remove all of those casts.
An acceptable language should offer ways to remove most of them (all but one), and automatically implement the few remaining ones with safe run-time tests in debug builds. I am fine with such acceptable language.

To help face the problem the language should have both the concept of type (u8, u16, i32, u128, etc. Despite I still think names like u1, u2, s4, s16 are better, s = signed, u = unsigned, and the number is the number of bytes instead of the number of bits) and the concept of range of a value, so the code:

const N: i32 = 180;
let mut sin = [0f64; N as usize];
let mut cos = [0f64; N as usize];

Should require no hard casts. Because the range of N is statically known, it’s a single compile-time constant value, and it’s acceptable as array length. So no hard casts should be necessary, “safe casts” should suffice (a safe cast is like u64::from(), it’s usable where the compiler is statically certain a cast loses no information, so the value ranges are compatible, but still the programmer is required to state the desire to chage type):

const N: i32 = 180;
let mut sin = [0f64; from(N)];
let mut cos = [0f64; from(N)];

Here the range of a is [2, 91), so it’s still OK as array index, despite its type is i32 (and the compiler could also remove the bound tests even in debug builds, because the indexes are in bound. The LLVM is probably able to remove some of those tests in release builds):

for a in 2 ... 90 {
    for b in 1 .. N - a {
        let c = sin[b as usize] / sin[(a + b) as usize];

==>

for a in 2 ... 90 {
    for b in 1 .. N - a {
        let c = sin[from(b)] / sin[from(a + b)];

An exellent language of the future should prove this operation gives no run-time overflow, but this is past what I’m asking for Rust, so it’s OK to have a run-time overflow test in debug builds:

let g = N - a + y;

This should require no hard cast, because the range of the i values is fully representable in an f64:

let (s, c) = (i as f64).to_radians().sin_cos();

An excellent language from the future should be able to help the programmer prove this is safe, but this is more complex, and I can live with having this single hard cast in the code (as in the Java code), perhaps implemented with a run-time validity test:

let y = x.round() as i32;

This line is bad, and a better written program should test the significant bits of the FP number (the D standard library has a function for this, with the ugly name feqrel(): http://dlang.org/phobos/std_math.html#.feqrel , but I don’t remember if the Rust std lib has it):

if (x - (y as f64)).abs() < 1.0E-10 {

At this point you haven’t even started to prove that the code is actually correct, that there are no run-time overflows, that the code is a very efficient way to solve this Euler problem, and so on. But this is better left for other languages, proof assistants, etc.

In this code the length of sin is statically known:

const N: i32 = 180;
let mut sin = [0f64; N as usize];
let mut cos = [0f64; N as usize];

So D language allows me to write code equivalent to Rust code:

let mut sin = [0f64; 180];
let mut cos = [0f64; sin.len()];

In D language sin.len() is resolved as compile-time constant (well, in D language you can also define them in a single line: double[180] sin, cos;).

So Rust handling of integral numbers on average is an improvement over the D language, but I think there are some parts that could be improved. I am not asking for breaking changes in Rust. So how to improve the situation?

  • Add warnings for few situations, like when you use a hard cast that can be replaced by soft casts;
  • Introduce Value Range Analysis;
  • Introduce language or Prelude or standard library function/feature to perform more soft casts, and to perform run-time verified hard casts;
  • usize could be 32 or 64 bits, and this information should allow some more safe conversions;
  • len() should return a compile-time constant for fixed size arrays, just like size_of();
  • Structs probably should allow associated constants;
  • Slice length value range analysis should help spot some bugs statically, and to remove some run-time tests.

I think all such things can be done withour breaking changes in Rust, without requiring to add a SAT solver to the Rust type inferencer, without major additions of magic. This can’t solve all problems with handling of integral values (and slices) in Rust, but I think it should suffice avoiding many pratical problems in code. So when you review some Rust code, you have time to think about the actual significant potential integral value problems of your code, avoiding the need to think if a silly hard cast like this is actually safe and correct:

const N: i32 = 180;
let mut sin = [0f64; N as usize];