Bit level types?


#1

A language meant to replace C should offer more correctness guarantees, and sometimes people have found ways to do it that surprise me. This paper shows a (quite cute) type system that works on spans of bits of integral numbers, that I think is able to catch some of the bugs of the kind of low-level code you see in C/C++/Rust/Ada:

http://goto.ucsd.edu/~rjhala/papers/bit_level_types_for_high_level_reasoning.html

The paper first of all proposes a type system to manage bit ranges and bit operations in a safer way. The paper use this short code example:

mget(u32 p) {
  if (p & 0x1 == 0) {
    error("permission");
  }
  pte = (p & 0xFFFFF000) >> 12;
  b = tab[pte] & 0xFFFFFFFC;
  o = p & 0xFFC;
  return m[(b + o) >> 2];
}

To represent the type of those variables they use sequences of {name,size} pairs (a semicolon means field with no name, often filled with zeros), like:

p:   {idx,20}{addr,10}{wr,1}{rd,1}
pte: {;,12}{idx,20}
b:   {addr,30}{;,2}
o:   {;,20}{addr,10}{;,2}

Once you have given a name to those bit spans you can think of using a field access syntax:

mget (p) {
  if (p.rd == 0) {
    error("permission");
  }
  pte.idx = p.idx;
  b.addr = tab[pte.idx].addr;
  o.addr = p.addr;
  return m[b.addr + o.addr];
}

Their paper also uses some type inference, so you don’t need to specify the spans manually. An example of the output of the type inferencer:

typedef u32 BOT_t;
typedef u32 k_t; typedef u32 s_t;
typedef u32 b_t; typedef u32 r_t;
struct __b0 { k_t k_1; BOT_t BOT_2; };
struct __b1 { BOT_t BOT_1; k_t k_2; };
struct __b2 { b_t b_1; r_t r_2; };
struct __b3 { b_t b_1; BOT_t BOT_2; };
struct __b4 { BOT_t BOT_1; b_t b_2; BOT_t BOT_3; };
struct __b5 { k_t k_1; b_t b_2; s_t s_3; };
u32 mget(struct __b5 p ) {
    struct __b0 p1; struct __b1 pte;
    struct __b2 b1, tab[1000];
    struct __b3 base, a; struct __b4 off;
    if (p.s_3 == 0) {
        error("Permission failure");
    } else {
        p1.k_1 = p.k_1; p1.BOT_2 = 0;
        pte.BOT_1 = 0; pte.k_2 = p1.k_1;
        b1.b_1 = tab[__b1_to_u32(pte)].b_1;
        b1.r_2 = tab[__b1_to_u32(pte)].r_2;
        base.b_1 = b1.b_1; base.BOT_2 = 0;
        off.BOT_1 = 0; off.b_2 = p.b_2; off.BOT_3 = 0;
        a.BOT_2 = 0; a.b_1 = base.b_1 + off.b_2;
        return m[__b3_to_u32(a)];
    }
}

From the paper:

The type inference algorithm constructs bitvector types for the variables and the translate procedure constructs C structures corresponding to these types, where each field is an unsigned 32 bit entry. The field BOT_n refers to a zero block in the type. For example, the structure struct b5 encodes the bitvector type {k,20}{b,11}{s,1} and struct b1 encodes the type {;12}{k,20}. The assignments are translated to structure assignments. For array indices, we convert the bitvector structure to the corresponding integer by a system-defined function (to allow array indexing by the structures).

So they use unannotated or lightly annotated C code, run the bit type inferencer on it, and verify it’s “correct”.

I guess you can implement the same type system with a rustc plug-in, and use it for regular Rust code. Later that plug-in could become part of rustc. But are the safety advantages of this unusual kind of type system worth the implementation efforts?

(Writing rustc plugins that add relatively simple type system extensions (like some of the features from Ada language) is something I’d eventually like to do).


#2

We could probably use the bitflags crate as an example, and actually do the whole thing with a macro that generates wrapper structs around the underlying bit-packed data structure.


#3

That’s for the data representation, and I think that’s the easy part. The paper is about the type system that verifies that the code is also correct. And type inference to minimize the annotations needed.


#4

The paper’s scheme guarantees that bitfields are treated discreetly. For example, in a bistruct like this:

bitstruct!{
    struct CameraState {
        flash_mode: 2,
        exposure: 4,
    }
}

Which would compile to this:

/// CameraState: {;,2}{flash_mode,2}{exposure,4}
#[derive(Clone, Copy)]
struct CameraState(u8);

impl CameraState {
    pub fn new(flash_mode: BitVec<2>, exposure: BitVec<4>) -> StateTuple {
        StateTuple(flash_mode.get() << 4 | exposure.get())
    }
    pub fn set_flash_mode(&mut self, x: BitVec<2>) { self.0 ^= (!x) << 4 }
    pub fn set_exposure(&mut self, x: BitVec<4>) { self.0 ^= (!x) & 0x0F }
    // truncate() would be the fastest way to do this.
    pub fn flash_mode(self) -> BitVex<2> { BitVec::truncate(self.0 >> 4) }
    pub fn exposure(self) -> BitVex<4> { BitVec::truncate(self.0) }
}

The paper guarantees (as does this Rust code) that any operating is performed on a single discreet bitfield at once (as opposed to problematic things like performing addition on the whole structure and mutating flash_mode and exposure both in a single operation).

Beyond that, the paper’s most interesting parts are the use of whole-program analysis to mine these structures from existing C programs, and applying formal methods to the resulting program. Whole-program type inference is not going to be added to Rust itself, but it wouldn’t be too big of a problem: just write a separate app on top of librustc (look at clippy for an example) that automatically converts a program off of manual bit-twiddling onto a “bitstruct” macro library. Formal verification is the same: rustc is not going to incorporate Blast, but would be happy to see that sort of thing done externally.


#5

But perhaps you can add some manual annotations to avoid the need of whole-program type inference.