Basically I can't picture a scenario in which you have to match over each individual value.
To me your code example is highly contrived, because I cannot imagine a case where I've ever needed to look at the `n`th element of a vector of unknown type read from a file. It also clearly doesn't reflect your own usage, because it doesn't typecheck. (`x` has different types in different branches!)
Basically, if different files have different types in them, then I can only imagine that I would want to do different things to them! (There are some exceptions, which I'll cover at the end.)
Just this Thursday, I had to write a text-based parser for a similar format, and I had no trouble parsing it into an enum of `Vec`s up front. Here is an adaptation of my code to your binary format. (Note: this requires nom 3.2.1 or earlier, because I have no idea how to use `many0!` in nom 4.)
Data types:
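```rust
#[macro_use]
extern crate nom; // Rust 2015 edition: brings nom's parser macros into scope

use nom::*; // IResult, Endianness, ...

#[derive(Debug, Copy, Clone, PartialEq, Eq, Hash)]
enum TypeTag { Integer, Real, Complex }

#[derive(Debug, Clone, PartialEq)]
pub enum Data {
    Integer(Vec<i32>),
    Real(Vec<f64>),
    Complex(Vec<(f64, f64)>),
}
```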
Helper parsers:
If you're not familiar with nom, `named!{fn_name<I, O>, ...}` defines a parsing function that takes `I` (either `&[u8]` or `&str`) as input and produces an `IResult<I, O>`, which on success holds the unparsed remainder and the parsed value.
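To make that shape concrete without depending on nom, here is a rough hand-written sketch of what the `integer` and `real` parsers below amount to. `ParseError` and `ParseResult` are hypothetical stand-ins for nom's `IResult`, and `from_be_bytes` needs a newer Rust than the nom 3 era, so treat this as an illustration only.

```rust
/// Hypothetical error type for this sketch (nom has its own `IResult`).
#[derive(Debug, PartialEq)]
enum ParseError {
    /// Not enough bytes left to parse the value.
    Incomplete,
}

/// Same shape as a `named!` parser: on success, the unparsed
/// remainder together with the parsed value.
type ParseResult<'a, O> = Result<(&'a [u8], O), ParseError>;

/// Parse a big-endian `i32`, like `i32!(Endianness::Big)`.
fn integer(input: &[u8]) -> ParseResult<'_, i32> {
    if input.len() < 4 {
        return Err(ParseError::Incomplete);
    }
    let (head, rest) = input.split_at(4);
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(head);
    Ok((rest, i32::from_be_bytes(bytes)))
}

/// Parse a big-endian `u64` and reinterpret its bits as an `f64`,
/// like `map!(u64!(Endianness::Big), f64::from_bits)`.
fn real(input: &[u8]) -> ParseResult<'_, f64> {
    if input.len() < 8 {
        return Err(ParseError::Incomplete);
    }
    let (head, rest) = input.split_at(8);
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(head);
    Ok((rest, f64::from_bits(u64::from_be_bytes(bytes))))
}
```

Each parser only looks at the front of its input and hands back the rest, which is what lets nom chain them.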
```rust
named!{integer<&[u8], i32>, i32!(Endianness::Big)}
named!{real<&[u8], f64>, map!(u64!(Endianness::Big), f64::from_bits)}
named!{complex<&[u8], (f64, f64)>, pair!(real, real)}

named!{
    type_tag<&[u8], TypeTag>,
    switch!(
        i32!(Endianness::Big),
        0x0000_0001 => value!(TypeTag::Integer)
      | 0x0000_0002 => value!(TypeTag::Real)
      | 0x0000_0003 => value!(TypeTag::Complex)
    )
}
```
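The `pair!` and `switch!` parsers above can likewise be sketched by hand. In this standalone version (no nom), the `take` helper and the `ParseError::UnknownTag` variant are hypothetical; an unrecognized tag is treated as an error, since no `switch!` arm would match it.

```rust
#[derive(Debug, Copy, Clone, PartialEq, Eq, Hash)]
enum TypeTag { Integer, Real, Complex }

#[derive(Debug, PartialEq)]
enum ParseError {
    Incomplete,
    UnknownTag(i32),
}

type ParseResult<'a, O> = Result<(&'a [u8], O), ParseError>;

/// Split the first `N` bytes off the input as a fixed-size array.
fn take<const N: usize>(input: &[u8]) -> ParseResult<'_, [u8; N]> {
    if input.len() < N {
        return Err(ParseError::Incomplete);
    }
    let (head, rest) = input.split_at(N);
    let mut bytes = [0u8; N];
    bytes.copy_from_slice(head);
    Ok((rest, bytes))
}

fn real(input: &[u8]) -> ParseResult<'_, f64> {
    let (rest, bytes) = take::<8>(input)?;
    Ok((rest, f64::from_bits(u64::from_be_bytes(bytes))))
}

/// Like `pair!(real, real)`: run `real` twice, threading the remainder.
fn complex(input: &[u8]) -> ParseResult<'_, (f64, f64)> {
    let (rest, re) = real(input)?;
    let (rest, im) = real(rest)?;
    Ok((rest, (re, im)))
}

/// Like the `switch!` in `type_tag`: read a big-endian `i32`, then map it.
fn type_tag(input: &[u8]) -> ParseResult<'_, TypeTag> {
    let (rest, bytes) = take::<4>(input)?;
    let tag = match i32::from_be_bytes(bytes) {
        1 => TypeTag::Integer,
        2 => TypeTag::Real,
        3 => TypeTag::Complex,
        other => return Err(ParseError::UnknownTag(other)),
    };
    Ok((rest, tag))
}
```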
Main parser:
`many0!(parser)` repeatedly applies a parser and returns a `Vec` of the results. So here, we parse a `TypeTag`, and:

- if it's `Integer`, we use `many0!(integer)` to parse a `Vec<i32>`, then put it in `Data::Integer`;
- if it's `Real`, we use `many0!(real)` to parse a `Vec<f64>`, then put it in `Data::Real`;
- etc.

Finally, `terminated!(..., eof!())` makes the parse fail if any bytes are left over.
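Stripped of macros, `many0!` is essentially a loop. This is a simplified sketch, not nom's implementation: it stops at the first failure for any reason, ignoring the distinction nom's `IResult` draws between an error and incomplete input. The `integer` helper with a `()` error is hypothetical.

```rust
/// Apply `parser` until it fails (or the input runs out), collecting
/// every value. Returns the leftover input alongside the results.
fn many0<'a, O, E>(
    mut input: &'a [u8],
    parser: impl Fn(&'a [u8]) -> Result<(&'a [u8], O), E>,
) -> (&'a [u8], Vec<O>) {
    let mut out = Vec::new();
    while !input.is_empty() {
        match parser(input) {
            Ok((rest, value)) => {
                // Stop if the parser succeeded without consuming anything,
                // otherwise this loop would never end.
                if rest.len() == input.len() {
                    break;
                }
                out.push(value);
                input = rest;
            }
            Err(_) => break,
        }
    }
    (input, out)
}

/// Minimal big-endian `i32` parser to drive the loop.
fn integer(input: &[u8]) -> Result<(&[u8], i32), ()> {
    if input.len() < 4 {
        return Err(());
    }
    let (head, rest) = input.split_at(4);
    Ok((rest, i32::from_be_bytes([head[0], head[1], head[2], head[3]])))
}
```

Note that a trailing partial value is simply left in the remainder rather than reported, which is why the main parser needs `eof!()`.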
```rust
named!{
    file<&[u8], Data>,
    terminated!(
        switch!(
            type_tag,
            TypeTag::Integer => map!(many0!(integer), Data::Integer)
          | TypeTag::Real => map!(many0!(real), Data::Real)
          | TypeTag::Complex => map!(many0!(complex), Data::Complex)
        ),
        eof!()
    )
}
```
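For comparison, here is the whole parser written by hand without nom, as a sketch under simplifying assumptions: it returns `Option` instead of a real error type and matches on the raw tag number rather than going through `TypeTag`. All helper names are hypothetical. The structure is the same: read the tag once, then parse a homogeneous run of one element type, then demand that nothing is left.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Data {
    Integer(Vec<i32>),
    Real(Vec<f64>),
    Complex(Vec<(f64, f64)>),
}

/// Split `N` bytes off the front of `input`, if there are enough.
fn take<const N: usize>(input: &[u8]) -> Option<(&[u8], [u8; N])> {
    if input.len() < N {
        return None;
    }
    let (head, rest) = input.split_at(N);
    let mut bytes = [0u8; N];
    bytes.copy_from_slice(head);
    Some((rest, bytes))
}

fn integer(input: &[u8]) -> Option<(&[u8], i32)> {
    take::<4>(input).map(|(rest, b)| (rest, i32::from_be_bytes(b)))
}

fn real(input: &[u8]) -> Option<(&[u8], f64)> {
    take::<8>(input).map(|(rest, b)| (rest, f64::from_bits(u64::from_be_bytes(b))))
}

fn complex(input: &[u8]) -> Option<(&[u8], (f64, f64))> {
    let (rest, re) = real(input)?;
    let (rest, im) = real(rest)?;
    Some((rest, (re, im)))
}

/// Apply `parser` repeatedly, like `many0!`, returning the leftover input.
fn many0<'a, O>(
    mut input: &'a [u8],
    parser: impl Fn(&'a [u8]) -> Option<(&'a [u8], O)>,
) -> (&'a [u8], Vec<O>) {
    let mut out = Vec::new();
    // Every parser used here consumes at least 4 bytes on success,
    // so this loop always makes progress.
    while let Some((rest, value)) = parser(input) {
        out.push(value);
        input = rest;
    }
    (input, out)
}

/// One tag, then a run of values of that type, then end of input.
fn file(input: &[u8]) -> Option<Data> {
    let (rest, tag) = integer(input)?;
    let (rest, data) = match tag {
        1 => {
            let (r, v) = many0(rest, integer);
            (r, Data::Integer(v))
        }
        2 => {
            let (r, v) = many0(rest, real);
            (r, Data::Real(v))
        }
        3 => {
            let (r, v) = many0(rest, complex);
            (r, Data::Complex(v))
        }
        _ => return None, // unknown tag
    };
    // The role of `eof!()`: reject trailing bytes.
    if rest.is_empty() { Some(data) } else { None }
}
```

The dispatch on the type happens exactly once per file, not once per element.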
Test:
```rust
#[test]
fn test() {
    // Tag 1 (Integer), then the values 4 and 16, all big-endian i32s.
    const INT_TEST: &'static [u8] = b"\
        \x00\x00\x00\x01\
        \x00\x00\x00\x04\
        \x00\x00\x00\x10\
    ";
    assert_eq!(
        file(INT_TEST).unwrap().1,
        Data::Integer(vec![4, 16]),
    );
}
```
Now, there are a few exceptions to what I said at the beginning of this post. For example, I might want to get the `len` of the inner `Vec` without knowing its element type.
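Done directly, a hypothetical `len` method needs one match arm per variant, even though every arm does the same thing:

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Data {
    Integer(Vec<i32>),
    Real(Vec<f64>),
    Complex(Vec<(f64, f64)>),
}

impl Data {
    /// Number of elements, whatever the element type is.
    /// Each variant needs its own (identical) arm.
    pub fn len(&self) -> usize {
        match self {
            Data::Integer(v) => v.len(),
            Data::Real(v) => v.len(),
            Data::Complex(v) => v.len(),
        }
    }

    pub fn is_empty(&self) -> bool {
        self.len() == 0
    }
}
```

With three variants that's tolerable; it's the repetition across many such methods and many variants that becomes a burden.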
I generally try to avoid these situations, but if a large number of them pop up and I have no alternative, I do have a technique for dealing with the boilerplate. Basically, the goal is to implement four functions, `as_ref`, `as_mut`, `map` and `fold`, that together serve 99% of use cases.
I'll write something up about this technique next, even if only to give myself something to link to from other threads. I'll warn you though: it can be pretty costly in terms of syntax, so it's only useful if you have an extremely large number of variants.
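The technique itself isn't shown here, so the following is only a guess at one way those four functions could look, not necessarily the approach meant above. The idea: make the enum generic over its payloads (hypothetical `DataOf<I, R, C>`), so the per-variant `match` is written once in each of `as_ref`, `as_mut`, `map` and `fold`, and everything else is built on top of them.

```rust
/// One type parameter per variant; `Data` is the concrete instantiation.
#[derive(Debug, Clone, PartialEq)]
pub enum DataOf<I, R, C> {
    Integer(I),
    Real(R),
    Complex(C),
}

pub type Data = DataOf<Vec<i32>, Vec<f64>, Vec<(f64, f64)>>;

impl<I, R, C> DataOf<I, R, C> {
    /// Borrow each payload, keeping the variant.
    pub fn as_ref(&self) -> DataOf<&I, &R, &C> {
        match self {
            DataOf::Integer(x) => DataOf::Integer(x),
            DataOf::Real(x) => DataOf::Real(x),
            DataOf::Complex(x) => DataOf::Complex(x),
        }
    }

    /// Mutably borrow each payload, keeping the variant.
    pub fn as_mut(&mut self) -> DataOf<&mut I, &mut R, &mut C> {
        match self {
            DataOf::Integer(x) => DataOf::Integer(x),
            DataOf::Real(x) => DataOf::Real(x),
            DataOf::Complex(x) => DataOf::Complex(x),
        }
    }

    /// Transform the payload with the function for its variant.
    pub fn map<I2, R2, C2>(
        self,
        fi: impl FnOnce(I) -> I2,
        fr: impl FnOnce(R) -> R2,
        fc: impl FnOnce(C) -> C2,
    ) -> DataOf<I2, R2, C2> {
        match self {
            DataOf::Integer(x) => DataOf::Integer(fi(x)),
            DataOf::Real(x) => DataOf::Real(fr(x)),
            DataOf::Complex(x) => DataOf::Complex(fc(x)),
        }
    }

    /// Collapse any variant into one common type `T`.
    pub fn fold<T>(
        self,
        fi: impl FnOnce(I) -> T,
        fr: impl FnOnce(R) -> T,
        fc: impl FnOnce(C) -> T,
    ) -> T {
        match self {
            DataOf::Integer(x) => fi(x),
            DataOf::Real(x) => fr(x),
            DataOf::Complex(x) => fc(x),
        }
    }
}

impl Data {
    /// `len` without a hand-written match.
    pub fn len(&self) -> usize {
        self.as_ref().fold(|v| v.len(), |v| v.len(), |v| v.len())
    }
}
```

Each call still takes one closure per variant, which hints at the syntactic cost mentioned above.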