Selecting a type with type arguments from numerous combinations at runtime?

A project I'm currently working on has a need to be able to search through text using several search modes, any (fixed) encoding width, and any endianness. To achieve this, I've settled on using a family of searcher structs that all implement the same trait, and take the encoding width and endianness as type parameters:

trait Searcher { /* … */ }

// `PrimInt` is from the `num-traits` crate; `ByteOrder` from the `byteorder` crate.
pub struct CodepointSearcher<N: PrimInt, E: ByteOrder> { /* … */}
impl<N: PrimInt, E: ByteOrder> Searcher for CodepointSearcher<N, E> { /* … */ }

pub struct RelativeSearcher<N: PrimInt, E: ByteOrder> { /* … */}
impl<N: PrimInt, E: ByteOrder> Searcher for RelativeSearcher<N, E> { /* … */ }

pub struct FormationSearcher<N: PrimInt, E: ByteOrder> { /* … */}
impl<N: PrimInt, E: ByteOrder> Searcher for FormationSearcher<N, E> { /* … */ }

I have to select between these at runtime, depending on the options the user picks – which I figure is most reasonable to handle using dynamic dispatch via a trait object. Were it a choice between a couple of types, you'd simply do this:

let polymorphic: Box<dyn Trait> = match option {
    Options::One => Box::new(TypeOne::new()) as Box<dyn Trait>,
    Options::Two => Box::new(TypeTwo::new()) as Box<dyn Trait>,
    Options::Three => Box::new(TypeThree::new()) as Box<dyn Trait>
};

Now, being a close-to-the-metal system programming language, Rust doesn't have types as first-class citizens – but imagining that it did, here's what I'd like to do:

let SearcherType = match mode {
    Mode::Codepoint => CodepointSearcher,
    Mode::Relative => RelativeSearcher,
    Mode::Formation => FormationSearcher
};

let IntType = match width {
    EncodingWidth::Bits8 => u8,
    EncodingWidth::Bits16 => u16,
    EncodingWidth::Bits32 => u32
};

let EndiannessType = match endianness {
    Endianness::Big => BigEndian,
    Endianness::Little => LittleEndian,
    Endianness::Native => NativeEndian
};

let searcher: Box<dyn Searcher> =
    SearcherType::<IntType, EndiannessType>::new(/* … */);

But that's not possible, of course, so you'd have to exhaustively specify every combination as a match arm (or use nested match statements, but the effect is the same):

let searcher = match (mode, width, endianness) {
    (Mode::Codepoint, EncodingWidth::Bits8, Endianness::Big) => Box::new(
        CodepointSearcher::<u8, BigEndian>::new(/* … */)
    ) as Box<dyn Searcher>,

    (Mode::Codepoint, EncodingWidth::Bits8, Endianness::Little) => Box::new(
        CodepointSearcher::<u8, LittleEndian>::new(/* … */)
    ) as Box<dyn Searcher>,

    // …
};

Here's the problem, however: Since I have three types and two sets of three possible type arguments, that works out to 3 × 3 × 3 = 27 different combinations, and… though I could type those out and sweep 'em under the rug into their own module, that's still decidedly anti-DRY.

Which brings me to my question:

  • Could one write a macro that generates this match statement? If so, how? It appears to potentially be possible to write a macro that generates match arms, but highly convoluted…
  • Or, perhaps preferably, is there a better way to select between a large number of type-type argument combinations?

(Here's a skeleton in Playground with these particular components: Link!)

I didn't look into the full details, but I think it is possible for a macro. the idea is relative simple (yet the macro might be tedious to write): generate the cartesian product of a predefined configuration space. as for the match arms, you can generate the match block as a whole in one expansion, but you cannot expand to some intermediate syntax where the match arms themselves are macro invocations.

here's an example to generate cartesian product of 3 dimensions (call them x, y, and z, respectively):

macro_rules! cartesian_x {
	( ($($x:pat)+); $y:tt; $z:tt;) => {
		cartesian_y!{
			($($x $y)+)
			$z
		}
	};
}
macro_rules! cartesian_y {
	( ($($x:pat ($($y:pat)+))+) $z:tt) => {
		cartesian_z! {
			($($x ($ ($y $z)+) )+)
		}
	};
}
macro_rules! cartesian_z {
	( ($($x:pat ($($y:pat ($($z:pat)+))+))+)) => {
		cartesian_result!{$($($(($x $y $z));+);+);+}
	};
}

an example expansion looks like this:

cartesian_x! {
	(0 1 2 );
	("hello" "world");
	(E::A E::B E::C);
}

// expands to this:
cartesian_result! {
	 (0 "hello" E::A);
	 (0 "hello" E::B);
	 (0 "hello" E::C);
	 (0 "world" E::A);
	 (0 "world" E::B);
	 (0 "world" E::C);
	 (1 "hello" E::A);
	 (1 "hello" E::B);
	 (1 "hello" E::C);
	 (1 "world" E::A);
	 (1 "world" E::B);
	 (1 "world" E::C);
	 (2 "hello" E::A);
	 (2 "hello" E::B);
	 (2 "hello" E::C);
	 (2 "world" E::A);
	 (2 "world" E::B);
	 (2 "world" E::C)
}

this is just an demonstration of the technique. as I already said, it might be really tedious (and boring) to write if you have many dimensions to work with.

it's just trade-offs.

if you want to only have one level of dynamic dispatch, then all the combinations of generic arguments must be monomorphized, but you have to deal with the huge combinatorial configuration space:

// this will explode for large number of configurations
match (cfg1, cfg2, ... cfgN) => {
  (C1::A, C2::A, ... CN::A) => {...},
  (C1::B, C2::A, ... CN::A) => {...},
  ...
}

if you want to avoid the combinatorial explosion, then you must do multple levels of type erasure (assuming the trait bounds are object-safe, at least most of them), which means there could be multiple levels of dynamic dispatch.

let t1 = match cfg1 {
  C1::A => {...},
  C1::B => {...},
  ...
}
let t2 = match cfg2 {
  C2::A => {...},
  C2::B => {...},
  ...
}
let searcher: Box<dyn Searcher> = SearcherType::<dyn IntType, dyn EndiannessType,...>::new(t1, t2);
1 Like

There is a way to do it piecewise without macros by providing a boxed interface to a struct with generics that implements a type-setting builder pattern. [playground]

pub trait BuildSearcher {
    fn with_mode(self: Box<Self>, mode: SearcherType) -> Box<dyn BuildSearcher>;
    fn with_int(self: Box<Self>, int: IntType) -> Box<dyn BuildSearcher>;
    fn with_end(self: Box<Self>, end: EndiannessType) -> Box<dyn BuildSearcher>;
}

impl<M: Mode, I: Int, E: End> BuildSearcher for SearcherGeneric<M, I, E> {
    fn with_mode(self: Box<Self>, mode: SearcherType) -> Box<dyn BuildSearcher> {
        match mode {
            // I didn't mock the inner types, but they would take the place of Unset
            SearcherType::Codepoint => Box::new(self.typed_with_mode(Unset)),
            SearcherType::Relative => Box::new(self.typed_with_mode(Unset)),
            SearcherType::Formation => Box::new(self.typed_with_mode(Unset)),
        }
    }
    fn with_int(self: Box<Self>, int: IntType) -> Box<dyn BuildSearcher> { todo!() }
    fn with_end(self: Box<Self>, end: EndiannessType) -> Box<dyn BuildSearcher> { todo!() }
}

The end-use remains quite clean:

let searcher = SearcherGeneric::new_boxed()
    .with_mode(mode_enum)
    .with_int(int_enum)
    .with_end(end_enum)

There's other bits you can do with it depending on your use. You can add a .finish() method that returns a result with the final interface version (if the generics are configured correctly). You can also just put the type-setting methods on the main interface if you can and want to be able to change the internals at runtime.

2 Likes

I don't quite follow as to how this approach would work…! When I ultimately have a SearcherGeneric with all three types filled in, how would I use it? As I understand it (and I may well have misunderstood it), SearcherGeneric would contain three separate objects of types that each implement SearchByteStream, PrimInt, and ByteOrder – but the code for the search mode (i.e. the code called by the methods defined in the trait SearchByteStream) need access to the PrimInt and ByteOrder types, which this doesn't seem to allow.

the builder type has the exact same generic types as the final type, it's just the builder implementation can be N1 + N2 + N3 different cases while a direct match has N1 * N2 * N3 cases, where Nx is the cardinality of each configuration axis, this is thanks to the generic implementation of the builder's type chaning methods:

for example, suppose we have two axises: Mode and Endian

struct Searcher<Mode, Endian> {...}
struct SearchBuilder<Mode, Endian> {..}

// direct construction need to match the product of `Mode` and `Endian`
// which have N1 * N2 different cases
match (mode, endian) {
    (Mode::M1, Endian::Big) => todo!(),
    (Mode::M1, Endian::Little) => todo!(),
    (Mode::M2, Endian::Big) => todo!(),
    (Mode::M2, Endian::Little) => todo!(),
    //...
    (Mode::Mx, Endian::Big) => todo!(),
    (Mode::Mx, Endian::Little) => todo!(),
}

// for the builder, when you change the `Mode`, you keep the `Endian` generic
impl<M, E> SearchBuilder<M, E> {
    fn with_mode(self, mode: M) -> Box<dyn...> {
        // this function need to match only `Mode`, but it is implemented on a generic builder type
        // so after monomorphization, the compiler generate an instance for each combination of the configuration
        match mode {
            Mode::M1 => Box::new(todo!()),
            Mode::M2 => Box::new(todo!()),
            //...
            Mode::Mx => Box::new(todo!()),
        }
    }
    // safe for the `Endian` case
}

and even better to this technique, if the final type can be default constructed (i.e. with zero argument), the builder type can be zero sized, and the Box<dyn BuilderTrait> doesn't allocate at all, all it does is to hold a vtable pointer.

fun fact: when a builder (or a factory, as some would prefer) is stateless, I call it "phantom builder" (or "phantom factory"), but I don't know if other people have given it a common name.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.