How to dynamically choose the correct variant of a statically polymorphic implementation?

Given the following Rust code:

trait StrategyA {}
trait StrategyB {}
trait StrategyC {}

struct A1;
struct A2;
struct B1;
struct B2;
struct C1;
struct C2;

impl StrategyA for A1 {}
impl StrategyA for A2 {}
impl StrategyB for B1 {}
impl StrategyB for B2 {}
impl StrategyC for C1 {}
impl StrategyC for C2 {}

fn algorithm<A: StrategyA, B: StrategyB, C: StrategyC>(input: i32) -> i32 {
    // ...
    input
}

fn main() {
    // Let the user decide which trait implementations to use
    let args: Vec<String> = std::env::args().collect();
    let a: i32 = args.get(1).map(String::as_str).unwrap_or("1").parse().unwrap();
    let b: i32 = args.get(2).map(String::as_str).unwrap_or("1").parse().unwrap();
    let c: i32 = args.get(3).map(String::as_str).unwrap_or("1").parse().unwrap();
    let input = args.get(4).map(String::as_str).unwrap_or("0").parse().unwrap();

    // How to do this better?
    let result = match (a, b, c) {
        (1, 1, 1) => algorithm::<A1, B1, C1>(input),
        (1, 1, 2) => algorithm::<A1, B1, C2>(input),
        (1, 2, 1) => algorithm::<A1, B2, C1>(input),
        (1, 2, 2) => algorithm::<A1, B2, C2>(input),
        (2, 1, 1) => algorithm::<A2, B1, C1>(input),
        (2, 1, 2) => algorithm::<A2, B1, C2>(input),
        (2, 2, 1) => algorithm::<A2, B2, C1>(input),
        (2, 2, 2) => algorithm::<A2, B2, C2>(input),
        _ => panic!("Wrong inputs {a} {b} {c}"),
    };

    println!("{result}");
}

(playground)

Under the constraint that the function algorithm cannot be altered (i.e. no dynamic dispatch can be used within), what is the best way to select the right combination of type parameters?

The way I have used here needs one line of code for each combination of strategies, which quickly explodes when there are more strategies with more implementations involved.

My feeling is that there is no way to get around the big match statement, other than generating it with a macro. Is that true?

And, if it is true, is there a crate that contains such a macro? I tried googling but was not successful.

(I am aware that this will create potentially very large binaries that take a long time to compile, and this is fine for me)

But can it be used outside the function? If your strategies are object-safe, then dyn Strategy: Strategy holds. So you could instantiate a value of each strategy for each parameter separately, and put them behind a dyn Strategy, then call algorithm with those. This would then only result in a single monomorphization of algorithm::<dyn StrategyA, dyn StrategyB, dyn StrategyC>(), and separate monomorphizations of the strategies, without requiring combinations of them.

If monomorphization of all combinations is required, then the combinatorial explosion will happen anyway. It's not your or the macro's fault. If you really need all combinations of the function to manifest, then the compiler has to codegen them all, regardless of how they are represented syntactically.

What is this algorithm, and where does it come from, by the way?


That is interesting, I did not know you could instantiate a generic function with a dyn type. But in that case I did not express myself well enough: I actually want to use static polymorphism for performance reasons. The strategies' functions may compile down to just one (or very few) machine instructions and are called in a tight loop, so adding an indirection via a vtable would likely increase runtime.

So the goal is to instantiate algorithm with static types, such that within algorithm no dynamic types are used. Outside of algorithm, of course, dynamic types may be used, as long as the end result is a call to algorithm with static types.

Makes sense. I will likely offer only a few "good" combinations of types in the end, but for figuring those out I need to have all the different variants anyways.

It is Dijkstra's shortest-path algorithm, optimised for a certain bioinformatics use case: a many-to-many shortest-path query in a very large graph, where either only the closest target is reported for each source (greedy variant), or the length of the resulting paths is limited by a small integer (optimal variant). The algorithm is run in parallel by running a one-to-many search for each source and distributing the sources to the different threads. Because the graph is very large, memory usage is high when running on many cores, since the distance array of Dijkstra's algorithm needs to exist separately for each core. We currently have a project where we want to decrease that memory consumption and therefore want to swap out the distance array for a different data structure. And since we are already at it, we also plan to swap out the priority queue.

The code of the project is here: GitHub - algbio/matchtigs: Minimum plain text representation of kmer sets
The Dijkstra is in a very messy repo right now, but I put the code here as well: abstract-datastructures-rs/mod.rs at main · sebschmi/abstract-datastructures-rs · GitHub

The publication of the base algorithm is here: https://www.biorxiv.org/content/10.1101/2021.12.15.472871.full

This project is not the first time I came across this "dynamic instantiation problem" though, so I am very interested in a better solution.


Advice for the future: the question should usually be "why not?" when it comes to types. Generally, Rust's type system is very uniform. You can treat all types as equal e.g. when it comes to instantiating generics with them. There is no reason why an fn foo<T: Display> shouldn't work with a primitive, a struct, an enum, a library type, a built-in array, a dyn Display, or anything that conforms to the constraints declared in the function interface.

In this case, there really is no way around having every combination instantiated. If you want to choose an arbitrary combination at runtime (based on a CLI flag), then every concrete instantiation must already have been compiled into the binary beforehand.


I can think of a significantly more complex, but potentially more elegant and easier-to-scale approach. Could you build a JIT in your program? The way I imagine this would work is:

  • Create the strategies independently, without worrying about combinations. Make them #[no_mangle] extern "C" functions, so that the JIT-compiled code can locate and call them by symbol.
  • Try to come up with a minimal, performance-critical core of the algorithm which needs to be generic over strategies. Make the rest of the code (as much as possible) strategy-agnostic and performance-insensitive. This strategy-agnostic part should call out to an extern "C" function which will be the JITed realization of the concretely selected combination of strategies.
  • Based on the selected strategies at runtime, assemble the minimal core "by hand", likely using a high-level IR of whatever JIT system you end up using. This should not be very complicated, since it will only entail calling out to one of the possible strategy functions already compiled.
  • Hand the now-specialized core to the JIT, obtain a function pointer, and possibly cache the result (if you need to run it several times within the lifetime of the process). Rely on the JIT to inline and optimize the particular combination of strategies.
  • Invoke the full algorithm, passing the core as a function pointer. This will involve an indirect call, although by construction of the core and the rest of the algorithm, this should not be detrimental to performance at this point.

I'm not sure whether it's worth exploring this possibility, since it's probably a lot of work, but once done, it can scale up trivially without exponentially exploding the binary size.


Well, there's the implicit Sized bound to deal with; I'm pretty sure you'd need fn foo<T: Display + ?Sized> instead.

Well, yes, sure – what I was trying to say is that it's not fundamentally impossible to treat many kinds of types, dyn Trait included, uniformly in a certain sense.


Here's one option

fn algorithm_a<A: StrategyA>(b: i32, c: i32, input: i32) -> i32 {
    match b {
        1 => algorithm_b::<A, B1>(c, input),
        2 => algorithm_b::<A, B2>(c, input),
        _ => panic!(),
    }
}

fn algorithm_b<A: StrategyA, B: StrategyB>(c: i32, input: i32) -> i32 {
    match c {
        1 => algorithm::<A, B, C1>(input),
        2 => algorithm::<A, B, C2>(input),
        _ => panic!(),
    }
}

fn algorithm<A: StrategyA, B: StrategyB, C: StrategyC>(input: i32) -> i32 {
    // ...
    input
}

fn main() {
    // ...
    let result = match a {
        1 => algorithm_a::<A1>(b, c, input),
        2 => algorithm_a::<A2>(b, c, input),
        _ => panic!(),
    };
}

Thanks, that sounds like a pretty cool approach. I think it would be nice as a crate, such that one can generate this whole functionality from a function with an attribute macro, e.g.:

#[jit_instantiation(A: A1, A2)]
#[jit_instantiation(B: B1, B2)]
#[jit_instantiation(C: C1, C2)]
fn algorithm<A: StrategyA, B: StrategyB, C: StrategyC>(input: i32) -> i32 { /* ... */ input }

would result in a function:

fn algorithm_jit(jit_cache: Option<&dyn JitCache>, a: &str, b: &str, c: &str, input: i32) -> i32 { /* ... */ }

that can be used as:

fn main() { algorithm_jit(None, "A1", "B2", "C1", 5); }

This function would do everything you described in the background, and the programmer could just use it. Hm, but I think even making a version that works for just one specific problem sounds hard, and is unfortunately out of reach for me at the moment, time-wise.

That looks exactly like what I could use! Thank you very much 🙂


That's pretty nice, partial function application/currying to the rescue! It does scale better with respect to the number of strategies (and adding an additional one): with k strategy slots of n variants each, the source needs on the order of k·n match arms instead of n^k. It still ends up instantiating every one of the n^k combinations in the binary, of course.

