Design Question: How to handle lots of heterogeneous type parameters?


I’ve been working on a Rust implementation of the file command here: https://github.com/ahwatts/file-magic-rs, and have been having trouble figuring out how to represent its tests.

The quick summary of how file works is that it has a bunch of rules that boil down to hopping to an offset in the file, reading some data from it, and comparing that data to a “magic” value. The data types it needs to be able to read and compare essentially run the gamut of primitive types: 1- to 8-byte signed and unsigned integers, 32- and 64-bit floats, byte strings, and various date types which are mostly just differently-interpreted integers. Each rule also specifies the endianness of the data to read.
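
To make the "seek, read, compare" shape concrete, here is a minimal sketch of one such check for a single fixed case (a little-endian u32). The function name `matches_u32_le` is hypothetical and not taken from the repository; it only illustrates the three steps described above.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical helper: hop to `offset`, read four bytes as a
/// little-endian u32, and compare the result to `magic`.
/// A short read is reported as an error by `read_exact`.
fn matches_u32_le<R: Read + Seek>(file: &mut R, offset: u64, magic: u32) -> io::Result<bool> {
    // Step 1: hop to the offset.
    file.seek(SeekFrom::Start(offset))?;
    // Step 2: read exactly as many bytes as the data type needs.
    let mut buf = [0u8; 4];
    file.read_exact(&mut buf)?;
    // Step 3: interpret the bytes and compare to the magic value.
    Ok(u32::from_le_bytes(buf) == magic)
}
```

Anything that implements `Read + Seek` works as the input, so a `std::fs::File` and an in-memory `std::io::Cursor` can both be passed in. A real rule also has to vary the width, signedness, endianness and comparison operator, which is where the representation problem starts.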

The C implementation of file compiles these rule lists into a binary form that it can read quickly without re-parsing the rule files. I’ve been trying to make sure that all the structures that represent the tests can be serialized to disk to be comparable to that.

My struct representing one of these rules is:

#[derive(Clone, Debug, PartialEq, RustcEncodable, RustcDecodable)]
pub struct MagicEntry {
    pub filename: String,
    pub line_num: usize,
    pub level: u32,
    pub offset: Offset,
    pub test: Test,
    pub message: String,
}

Everything’s public for the moment for convenience. I don’t necessarily plan on keeping it that way. The part I’m having trouble with is the Test structure:

#[derive(Clone, Debug, PartialEq, RustcEncodable, RustcDecodable)]
pub enum Test {
    AlwaysTrue,
    Number(NumericTest),
}

#[derive(Clone, Debug, PartialEq, RustcEncodable, RustcDecodable)]
pub struct NumericTest {
    pub endian: Endian,
    pub logic_op: NumOp,
    pub test_value: NumericValue,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, RustcEncodable, RustcDecodable)]
pub enum Endian {
    Little,
    Big,
    Native,
    Pdp11,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, RustcEncodable, RustcDecodable)]
pub enum NumOp {
    Equal,
    LessThan,
    GreaterThan,
    NotEqual,
}

#[derive(Clone, Debug, PartialEq, RustcEncodable, RustcDecodable)]
pub enum NumericValue {
    UByte(u8), UShort(u16), ULong(u32), UQuad(u64),
    SByte(i8), SShort(i16), SLong(i32), SQuad(i64),
}

This sort of works, but all the enums make for a lot of awkward and repetitive match statements. It would be nice to be able to turn Test into a trait and store a Box<Test> on the MagicEntry. Then the enum variants could be structs in their own right: I could make NumericTest generic over its value type and implement the other test types as their own structures. This would essentially turn all those match constructs into dynamic dispatch.
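
A minimal sketch of that trait-based layout, ignoring serialization entirely and handling only little-endian data. The `matches` method and the `FromLeBytes` helper trait are assumptions made for illustration, and the trait object is written `Box<dyn Test>` as current Rust spells it.

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// Hypothetical trait: anything that can judge the bytes found at a rule's offset.
trait Test: Debug {
    fn matches(&self, data: &[u8]) -> bool;
}

#[derive(Debug)]
struct AlwaysTrue;

impl Test for AlwaysTrue {
    fn matches(&self, _data: &[u8]) -> bool {
        true
    }
}

/// Hypothetical helper trait: decode a value from the front of a byte slice.
trait FromLeBytes: Sized {
    fn from_le(data: &[u8]) -> Option<Self>;
}

// One macro invocation replaces eight hand-written impls.
macro_rules! impl_from_le {
    ($($t:ty),*) => {$(
        impl FromLeBytes for $t {
            fn from_le(data: &[u8]) -> Option<Self> {
                let mut buf = [0u8; size_of::<$t>()];
                let n = buf.len();
                // `get` returns None when the slice is too short.
                buf.copy_from_slice(data.get(..n)?);
                Some(<$t>::from_le_bytes(buf))
            }
        }
    )*};
}
impl_from_le!(u8, u16, u32, u64, i8, i16, i32, i64);

#[derive(Debug)]
enum NumOp {
    Equal,
    LessThan,
    GreaterThan,
    NotEqual,
}

/// Generic over the value type instead of wrapping a NumericValue enum.
#[derive(Debug)]
struct NumericTest<T> {
    logic_op: NumOp,
    test_value: T,
}

impl<T: FromLeBytes + PartialOrd + Debug> Test for NumericTest<T> {
    fn matches(&self, data: &[u8]) -> bool {
        match T::from_le(data) {
            Some(v) => match self.logic_op {
                NumOp::Equal => v == self.test_value,
                NumOp::LessThan => v < self.test_value,
                NumOp::GreaterThan => v > self.test_value,
                NumOp::NotEqual => v != self.test_value,
            },
            // Not enough data to read the value: the test fails.
            None => false,
        }
    }
}
```

With this, a `Vec<Box<dyn Test>>` can hold `AlwaysTrue` next to `NumericTest<u16>` and `NumericTest<i32>`, and the only remaining match is the one over `NumOp`.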

But I can’t seem to find a way of doing that that still allows the traitified Test to be object-safe. In particular, requiring Test to implement rustc_serialize's Decodable and Encodable traits makes Test no longer object-safe.
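
For context, Encodable's encode method is generic over the encoder type and Decodable's decode returns Self without taking a receiver, and either of those is enough to rule out a trait object. One possible way around it, sketched here with a made-up byte format rather than rustc_serialize, is to keep the trait's own methods non-generic and move decoding into a free function that dispatches on a tag. The names `UByteEq` and `decode_test` and the tag values are hypothetical.

```rust
/// Hypothetical object-safe trait: no generic methods and no `Self`
/// in return position.
trait Test {
    fn matches(&self, data: &[u8]) -> bool;
    /// Append a tag byte plus any payload to `out`.
    fn encode(&self, out: &mut Vec<u8>);
}

struct AlwaysTrue;

/// Stand-in for one numeric test: "first byte equals this value".
struct UByteEq(u8);

impl Test for AlwaysTrue {
    fn matches(&self, _data: &[u8]) -> bool {
        true
    }
    fn encode(&self, out: &mut Vec<u8>) {
        out.push(0); // tag 0, no payload
    }
}

impl Test for UByteEq {
    fn matches(&self, data: &[u8]) -> bool {
        data.first() == Some(&self.0)
    }
    fn encode(&self, out: &mut Vec<u8>) {
        out.push(1); // tag 1, followed by the test value
        out.push(self.0);
    }
}

/// Decoding cannot be a method on the trait object, because no object
/// exists yet, so a free function picks the concrete type from the tag.
fn decode_test(bytes: &[u8]) -> Option<Box<dyn Test>> {
    match bytes {
        [0] => Some(Box::new(AlwaysTrue)),
        [1, v] => Some(Box::new(UByteEq(*v))),
        _ => None,
    }
}
```

The cost is that the match over variants does not disappear; it moves into `decode_test`, so this trades one central enum for one central decoding function.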

Does anyone have a suggestion on how I can represent this situation in a DRY-er way?

Some things I’ve thought about:

  • For the NumericTest case, storing the test value as a num::BigInt and then casting it to do the comparison.
  • Pulling the Endian and data-reading code out into a separate structure. Then I have to make sure when doing the comparison that the data type read from the file is the same as what we’re comparing it to.
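
The two ideas above can be combined. This is a rough sketch, assuming i128 as a stand-in for num::BigInt (every 1- to 8-byte integer, signed or unsigned, fits in it) and covering only little and big endian. The `Width` enum and the `read_widened` function are hypothetical names.

```rust
#[derive(Clone, Copy)]
enum Endian {
    Little,
    Big,
}

/// Hypothetical width tag; the discriminant is the byte count.
#[derive(Clone, Copy)]
enum Width {
    One = 1,
    Two = 2,
    Four = 4,
    Eight = 8,
}

/// Read `width` bytes from the front of `data` and widen the result to
/// i128, so that one comparison routine can serve every integer type.
fn read_widened(data: &[u8], width: Width, signed: bool, endian: Endian) -> Option<i128> {
    let n = width as usize;
    let bytes = data.get(..n)?; // None if the data is too short

    // Fold the bytes into a u64, most significant byte first.
    let raw = match endian {
        Endian::Big => bytes.iter().fold(0u64, |acc, &b| (acc << 8) | u64::from(b)),
        Endian::Little => bytes.iter().rev().fold(0u64, |acc, &b| (acc << 8) | u64::from(b)),
    };

    if signed {
        // Sign-extend from n bytes: shift the value's top bit into bit 63,
        // then shift back arithmetically.
        let shift = 64 - 8 * n;
        Some(i128::from(((raw << shift) as i64) >> shift))
    } else {
        Some(i128::from(raw))
    }
}
```

The test value would then be stored as a plain i128 next to its `Width`, signedness and `Endian`, and the type-mismatch concern from the second bullet goes away because both sides of the comparison have the same type. Floats, strings and dates would still need their own handling.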