Varying Generic Parameters with features

All good! I love that we can truly discuss this in detail! :heart: I will probably answer again tomorrow. Thanks for your input. I really do appreciate it.

This doesn't sound like a problem that feature flags are really intended to solve. Without an example of how the API currently looks for that, I'm sort of taking a shot in the dark, but it is possible to model something like this without feature flags:

#![allow(unused)]

// Your library code
trait IntracellularReaction {
    fn reaction_value(&self) -> Option<f64>;
}

struct Disabled;

impl IntracellularReaction for Disabled {
    fn reaction_value(&self) -> Option<f64> {
        None
    }
}

impl IntracellularReaction for f64 {
    fn reaction_value(&self) -> Option<f64> {
        Some(*self)
    }
}

trait SimulationConfig {
    type IntracellularReaction: IntracellularReaction;

    fn reaction_threshold(&self) -> Self::IntracellularReaction;
}

fn start<S: SimulationConfig>(config: S) {
    // If impls of `IntracellularReaction` always return either Some or None for a given type, the compiler will probably be able to optimize this check out since we're using generics and not trait objects here.
    if let Some(threshold) = config.reaction_threshold().reaction_value() {
        // intracellular reactions enabled
        println!("{threshold}")
    } else {
        // intracellular reactions disabled
        println!("Disabled!")
    }
}

/// Macro to make it less of a hassle to disable a feature you don't use. Not at all necessary.
macro_rules! disable_intracellular {
    () => {
        type IntracellularReaction = $crate::Disabled;

        fn reaction_threshold(&self) -> Self::IntracellularReaction {
            $crate::Disabled
        }
    };
}

// Client code

struct ReactionSimulation {
    reaction_threshold: f64,
}

impl SimulationConfig for ReactionSimulation {
    type IntracellularReaction = f64;

    fn reaction_threshold(&self) -> Self::IntracellularReaction {
        self.reaction_threshold
    }
}

struct NoReactionSimulation;

impl SimulationConfig for NoReactionSimulation {
    disable_intracellular!();
}

fn main() {
    start(ReactionSimulation {
        reaction_threshold: 12.0,
    });
    start(NoReactionSimulation);
}

By using an associated type, you can ensure that clients are consistent with the values they give you. For example, if another API needs an intracellular reaction value, you can use the associated type to ensure that, for a configuration with reactions disabled, the value passed is Disabled and not an f64.
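A minimal sketch of that consistency check, reusing the trait and type names from the example above. The function record_reaction is a hypothetical "other API" that takes a value of the configuration's associated type, so passing an f64 to a configuration whose reactions are Disabled is rejected at compile time.

```rust
trait IntracellularReaction {
    fn reaction_value(&self) -> Option<f64>;
}

struct Disabled;

impl IntracellularReaction for Disabled {
    fn reaction_value(&self) -> Option<f64> {
        None
    }
}

impl IntracellularReaction for f64 {
    fn reaction_value(&self) -> Option<f64> {
        Some(*self)
    }
}

trait SimulationConfig {
    type IntracellularReaction: IntracellularReaction;
}

struct ReactionSimulation;
impl SimulationConfig for ReactionSimulation {
    type IntracellularReaction = f64;
}

struct NoReactionSimulation;
impl SimulationConfig for NoReactionSimulation {
    type IntracellularReaction = Disabled;
}

// Hypothetical second API: `value` must have the config's associated type.
fn record_reaction<S: SimulationConfig>(_config: &S, value: S::IntracellularReaction) -> Option<f64> {
    value.reaction_value()
}

fn main() {
    assert_eq!(record_reaction(&ReactionSimulation, 2.5), Some(2.5));
    assert_eq!(record_reaction(&NoReactionSimulation, Disabled), None);
    // record_reaction(&NoReactionSimulation, 2.5); // compile error: expected `Disabled`
}
```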

The tradeoff here is that the APIs still exist and in some cases users may have to pass Disabled in places where ideally there wouldn't be a value at all for the capabilities of your library that user cares about. That's unfortunate, but I think less of a burden than the feature flag version would put on you. Ideally you could provide "disabled" defaults for everything, and users that wanted to use one of the disabled capabilities would override the default, but Rust doesn't have great ways to do that kind of thing right now, except in extremely simple cases.
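One of the "extremely simple cases" that does work on stable Rust is default method bodies: a method can default to "disabled" and users override only what they use. Defaults for associated types (type X = Disabled; inside the trait) are still unstable, which is part of why this does not generalize well. A sketch, where diffusion_constant is a hypothetical second capability and enabled_capabilities a hypothetical backend check:

```rust
trait SimulationConfig {
    /// Intracellular reaction threshold; `None` means the capability is disabled.
    fn reaction_threshold(&self) -> Option<f64> {
        None
    }
    /// Hypothetical second capability, also disabled by default.
    fn diffusion_constant(&self) -> Option<f64> {
        None
    }
}

struct ReactionsOnly {
    threshold: f64,
}

impl SimulationConfig for ReactionsOnly {
    // Only the capability this user cares about is overridden.
    fn reaction_threshold(&self) -> Option<f64> {
        Some(self.threshold)
    }
}

// Uses every default, i.e. everything is disabled.
struct Plain;
impl SimulationConfig for Plain {}

fn enabled_capabilities<S: SimulationConfig>(config: &S) -> usize {
    [config.reaction_threshold(), config.diffusion_constant()]
        .iter()
        .filter(|c| c.is_some())
        .count()
}

fn main() {
    assert_eq!(enabled_capabilities(&ReactionsOnly { threshold: 12.0 }), 1);
    assert_eq!(enabled_capabilities(&Plain), 0);
}
```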

I believe this is also closer to what @kpreid was suggesting, where the client code defines a struct implementing a trait that your library defines.

I can show you one of my Traits which I am using at the moment to model intracellular reactions. Everything you are seeing is possible to change in the future, but so far it has been working well for me.

pub trait CellularReactions<ConcVecIntracellular, ConcVecExtracellular = ConcVecIntracellular> {
    fn get_intracellular(&self) -> ConcVecIntracellular;
    fn set_intracellular(&mut self, concentration_vector: ConcVecIntracellular);
    fn calculate_intra_and_extracellular_reaction_increment(
        &self,
        internal_concentration_vector: &ConcVecIntracellular,
        #[cfg(feature = "fluid_mechanics")] external_concentration_vector: &ConcVecExtracellular,
    ) -> Result<(ConcVecIntracellular, ConcVecExtracellular), CalcError>;
}

The second generic parameter is used when Extracellular Reactions also take place and their type differs from Intracellular Reactions. As you can see, the trait itself does not contain any bounds on its generic types. These are later introduced as needed by the backend. However, there is a summary of required implementations present such that users can make sure beforehand to implement them.
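To illustrate "no bounds on the trait, bounds added by the backend", here is a sketch with a simplified version of the trait (extracellular parameter and CalcError omitted). The solver function euler_step and the toy Decay model are hypothetical; only euler_step states the arithmetic it needs (Add, and Mul by an f64 time step).

```rust
use std::ops::{Add, Mul};

// Declared without any bounds on `ConcVec`, like the trait above.
trait CellularReactions<ConcVec> {
    fn get_intracellular(&self) -> ConcVec;
    fn set_intracellular(&mut self, concentration_vector: ConcVec);
    fn reaction_increment(&self, concentration_vector: &ConcVec) -> ConcVec;
}

// Hypothetical backend step: the bounds appear only where they are needed.
fn euler_step<R, V>(cell: &mut R, dt: f64)
where
    R: CellularReactions<V>,
    V: Add<Output = V> + Mul<f64, Output = V>,
{
    let current = cell.get_intracellular();
    let increment = cell.reaction_increment(&current);
    cell.set_intracellular(current + increment * dt);
}

// Toy model using a plain `f64` as the concentration type.
struct Decay {
    concentration: f64,
    rate: f64,
}

impl CellularReactions<f64> for Decay {
    fn get_intracellular(&self) -> f64 {
        self.concentration
    }
    fn set_intracellular(&mut self, c: f64) {
        self.concentration = c;
    }
    fn reaction_increment(&self, c: &f64) -> f64 {
        -self.rate * c
    }
}

fn main() {
    let mut cell = Decay { concentration: 1.0, rate: 0.5 };
    euler_step::<Decay, f64>(&mut cell, 0.1);
    // 1.0 + (-0.5 * 1.0) * 0.1 = 0.95
    assert!((cell.get_intracellular() - 0.95).abs() < 1e-12);
}
```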

A simple possible implementation can look like the following. In this example the feature = "fluid_mechanics" is active and thus we are coupling intracellular reactions to extracellular ones.

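// The serde derives below require nalgebra's "serde-serialize" feature.
use nalgebra::SVector;
use serde::{Deserialize, Serialize};

pub const NUMBER_OF_REACTION_COMPONENTS: usize = 3;
pub type ReactionVector = SVector<f64, NUMBER_OF_REACTION_COMPONENTS>;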

#[derive(Serialize, Deserialize, Clone, Debug)]
pub struct OwnReactions {
    pub intracellular_concentrations: ReactionVector,
    pub turnover_rate: ReactionVector,
    pub production_term: ReactionVector,
    pub degradation_rate: ReactionVector,
    pub secretion_rate: ReactionVector,
    pub uptake_rate: ReactionVector,
}

impl CellularReactions<ReactionVector> for OwnReactions {
    fn calculate_intra_and_extracellular_reaction_increment(
        &self,
        internal_concentration_vector: &ReactionVector,
        external_concentration_vector: &ReactionVector,
    ) -> Result<(ReactionVector, ReactionVector), CalcError> {
        let mut increment_extracellular = ReactionVector::zeros();
        let mut increment_intracellular = ReactionVector::zeros();

        for i in 0..NUMBER_OF_REACTION_COMPONENTS {
            let uptake = self.uptake_rate[i] * external_concentration_vector[i];
            let secretion = self.secretion_rate[i] * internal_concentration_vector[i];
            increment_extracellular[i] = secretion - uptake;
            increment_intracellular[i] = self.production_term[i]
                - increment_extracellular[i]
                - self.turnover_rate[i] * internal_concentration_vector[i];
        }

        Ok((increment_intracellular, increment_extracellular))
    }

    fn get_intracellular(&self) -> ReactionVector {
        self.intracellular_concentrations
    }

    fn set_intracellular(&mut self, concentration_vector: ReactionVector) {
        self.intracellular_concentrations = concentration_vector;
    }
}

If you are interested in my project, you can see the source code on GitHub (see below). But I am partly embarrassed by the current state of my code quality. Although I have a working system which can run even large simulations, I am planning on reworking considerable parts of my codebase. That is also a byproduct of the discussion here. :pray:

My thought process was the following:

  • I want to provide a library which requires different amounts of detail (implemented traits) depending on how detailed my overall model is
  • If particular details are not needed, the user should not be able to interface with them in any way
  • This is why I thought that conditional compilation (in Rust via feature flags) should be able to solve my problem

In principle this is what I am doing too at the moment. However, as I pointed out in one of my previous posts (see quote below), it does make a difference (computationally and from a model-design perspective) whether I allow users to use a particular trait at all or simply leave it out altogether.

Your approach of defining a "default" behaviour which does nothing is only valid if I have multiple cell types and some are not participating in intracellular reactions while others are. Then I would be required to activate the feature but provide an implementation which simply does nothing, and this is already implemented in my current setup.

I wasn't subscribed to the thread, so I didn't see the last two responses until just now. @jpleyer were you able to address your concerns in a different way than what was described here, or did you have anything else to follow up with?

Seeing actual code, it suggests that the calculate_intra_and_extracellular_reaction_increment method is being overloaded. In the case that there are no intracellular reactions, even the method's name is subtly incorrect. (At least, it appears to be incorrect to include intra in the method name even when that property is explicitly defined to not exist. I could be wrong. Cellular automata is not my domain of expertise! Or biology, for that matter.)

I suspect there is a reason that the generic type is split (instead of letting the caller use a tuple for a single type, for instance). But maybe that's worth exploring?

For the sake of argument, let's suppose that f64 and f32 need to be mixed. (For reasons.) And we can show that writing the implementation is not the hard part:

// lib.rs
pub trait CellularReactions<CellType> {
    fn calculate_reaction_increment(
        &self,
        concentration_vector: &CellType,
    ) -> Result<CellType, CalcError>;
}

// main.rs
pub type ReactionVectorF64 = nalgebra::SVector<f64, 3>;
pub type ReactionVectorF32 = nalgebra::SVector<f32, 3>;

// This is the important bit! The `CellType` generic is a tuple
// (or struct, or whatever) that is used as both input and output for the trait.
pub type F64andF32CellType = (ReactionVectorF64, ReactionVectorF32);

pub struct F64andF32Reactions {
    pub intracellular_concentrations: ReactionVectorF64,
    pub extracellular_concentrations: ReactionVectorF32,
}

impl CellularReactions<F64andF32CellType> for F64andF32Reactions {
    fn calculate_reaction_increment(
        &self,
        concentration_vector: &F64andF32CellType,
    ) -> Result<F64andF32CellType, CalcError> {
        let mut increment_intracellular = ReactionVectorF64::zeros();
        let mut increment_extracellular = ReactionVectorF32::zeros();

        // Do important work here, filling both increments ...

        Ok((increment_intracellular, increment_extracellular))
    }
}

As I understand it, this is what the following comment was describing:

The caller needs to be aware of the concrete type (F64andF32CellType) to even call the method, but that just means the type parameter extends into other interfaces. That's pretty normal. Eventually you get to the root and it's just:

fn main() {
    let sim = Simulation::<F64andF32CellType>::default();
    sim.run_with_iterations(100_000);

    println!("Simulation results: {sim:#?}");
}
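A dependency-free sketch of how the single generic can flow up to the root. Arrays stand in for the nalgebra vectors, and Simulation, run_with_iterations, ApplyIncrement and the fields of F64andF32Reactions are hypothetical; the point is that Simulation only knows its one parameter C, while the reactions type decides what C looks like.

```rust
// Placeholder for the error type used in the thread.
#[derive(Debug)]
struct CalcError;

trait CellularReactions<CellType> {
    fn calculate_reaction_increment(&self, concentration: &CellType) -> Result<CellType, CalcError>;
}

type F64andF32CellType = ([f64; 3], [f32; 3]);

struct F64andF32Reactions {
    production: [f64; 3],
    secretion: [f32; 3],
}

impl CellularReactions<F64andF32CellType> for F64andF32Reactions {
    fn calculate_reaction_increment(&self, _c: &F64andF32CellType) -> Result<F64andF32CellType, CalcError> {
        Ok((self.production, self.secretion))
    }
}

// Hypothetical helper: how an increment is added to a given cell type.
trait ApplyIncrement {
    fn apply(&mut self, increment: &Self);
}

impl ApplyIncrement for F64andF32CellType {
    fn apply(&mut self, inc: &Self) {
        for i in 0..3 {
            self.0[i] += inc.0[i];
            self.1[i] += inc.1[i];
        }
    }
}

// Hypothetical root type with a single generic parameter.
#[derive(Debug, Default)]
struct Simulation<C> {
    state: C,
    iterations: u64,
}

impl<C: ApplyIncrement> Simulation<C> {
    fn run_with_iterations<R: CellularReactions<C>>(&mut self, n: u64, reactions: &R) -> Result<(), CalcError> {
        for _ in 0..n {
            let inc = reactions.calculate_reaction_increment(&self.state)?;
            self.state.apply(&inc);
            self.iterations += 1;
        }
        Ok(())
    }
}

fn main() {
    let reactions = F64andF32Reactions { production: [1.0, 0.0, 2.0], secretion: [0.5; 3] };
    let mut sim = Simulation::<F64andF32CellType>::default();
    sim.run_with_iterations(4, &reactions).unwrap();
    println!("Simulation results: {sim:#?}");
}
```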

To me, the problem sounds like something which could potentially be solved by the Null object pattern (see Wikipedia). You could say that 0 is a special case of this for numbers.
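A small sketch of the Null object pattern applied here, with a simplified, hypothetical CellularReactions trait: NoCellularReactions is a perfectly valid implementation whose increment is always zero, which is exactly the "0 for numbers" analogy.

```rust
trait CellularReactions {
    fn increment(&self, concentrations: &[f64]) -> Vec<f64>;
}

// A real (toy) model: linear decay.
struct LinearDecay {
    rate: f64,
}

impl CellularReactions for LinearDecay {
    fn increment(&self, concentrations: &[f64]) -> Vec<f64> {
        concentrations.iter().map(|c| -self.rate * c).collect()
    }
}

// The null object: satisfies the same trait but changes nothing.
struct NoCellularReactions;

impl CellularReactions for NoCellularReactions {
    fn increment(&self, concentrations: &[f64]) -> Vec<f64> {
        vec![0.0; concentrations.len()]
    }
}

// The solver does not need to know whether reactions are "enabled".
fn step<R: CellularReactions>(reactions: &R, concentrations: &mut [f64], dt: f64) {
    let inc = reactions.increment(concentrations);
    for (c, d) in concentrations.iter_mut().zip(inc) {
        *c += d * dt;
    }
}

fn main() {
    let mut c = [2.0, 4.0];
    step(&NoCellularReactions, &mut c, 0.5);
    assert_eq!(c, [2.0, 4.0]);
    step(&LinearDecay { rate: 1.0 }, &mut c, 0.5);
    assert_eq!(c, [1.0, 2.0]);
}
```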

Sorry for taking so long to answer. My current plan is to address the problems by exposing a generic interface Simulation<Intracellular, Extracellular, ...> and implementing special behavior for default types such as Intracellular = (), so that nothing is done for them, and hoping that the optimizer will throw out this code. While this is not completely what I was aiming for, it is enough at this point. I am making this choice under the impression that all other options are simply not viable in terms of complexity or maintenance. I am also planning to not use features at all and simply expose everything.
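A sketch of that plan under stated assumptions: Component, Decay, Simulation and step are illustrative names, not the actual crate's API. The unit type () implements the component trait as a no-op, and default type parameters let users write just Simulation or Simulation<Decay>.

```rust
trait Component {
    /// Returns the number of values updated, so the no-op is observable.
    fn update(&mut self) -> usize;
}

// The "disabled" component: `()` does nothing.
impl Component for () {
    fn update(&mut self) -> usize {
        0
    }
}

// Toy enabled component: halves every concentration.
struct Decay(Vec<f64>);

impl Component for Decay {
    fn update(&mut self) -> usize {
        for c in self.0.iter_mut() {
            *c *= 0.5;
        }
        self.0.len()
    }
}

// Default type parameters: unused components can simply be omitted.
struct Simulation<I: Component = (), E: Component = ()> {
    intracellular: I,
    extracellular: E,
}

impl<I: Component, E: Component> Simulation<I, E> {
    fn step(&mut self) -> usize {
        // For `()` this call returns a constant 0; removing it is up to the
        // optimizer and not guaranteed by the language.
        self.intracellular.update() + self.extracellular.update()
    }
}

fn main() {
    let mut with_reactions: Simulation<Decay> = Simulation { intracellular: Decay(vec![1.0, 2.0]), extracellular: () };
    let mut without: Simulation = Simulation { intracellular: (), extracellular: () };
    assert_eq!(with_reactions.step(), 2);
    assert_eq!(without.step(), 0);
}
```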


Do you mean overloading via a macro? Stable Rust does not have specialization of generics yet. If we had this, I would be able to solve all my problems very easily :cry:.

That being said, I should explain that there are multiple challenges here:

  1. Design a working Interface for a fixed amount of generics where any combination of them can be "deactivated"
  2. Implement a backend (solver) over these generics that truly does nothing if a given generic parameter is not specified (no computational overhead, best would be even not showing up in binary).
  3. Have this setup be maintainable and growable

I believe that I can address the first point with my current approach.
The second point is hopefully :crossed_fingers: optimized away by the compiler but relying on an optimizer to do its magic does not seem like a 100% solution especially since I am expecting to grow my codebase in the future.
The third point will be told by time. However through the ongoing discussion, I have come to the conclusion that feature flags are probably not the correct way to solve this problem since they make it very very hard to optimally debug problems.
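Regarding the second point, one stable technique (a suggestion, not something from the thread) is to give each component an associated const flag. Inside each monomorphized copy of the solver, C::ENABLED is a compile-time constant, so the disabled branch is about the easiest dead code an optimizer can remove, although this is still an optimization rather than a language guarantee. Component, Growth and solver_step are hypothetical names.

```rust
trait Component {
    // Enabled unless an implementation says otherwise.
    const ENABLED: bool = true;
    fn update(&mut self, dt: f64);
}

struct NoCellularReactions;

impl Component for NoCellularReactions {
    const ENABLED: bool = false;
    fn update(&mut self, _dt: f64) {}
}

struct Growth {
    volume: f64,
}

impl Component for Growth {
    fn update(&mut self, dt: f64) {
        self.volume += dt;
    }
}

// Returns whether the component was actually updated.
fn solver_step<C: Component>(component: &mut C, dt: f64) -> bool {
    // `C::ENABLED` is fixed per instantiation of this function.
    if C::ENABLED {
        component.update(dt);
        true
    } else {
        false
    }
}

fn main() {
    let mut g = Growth { volume: 1.0 };
    assert!(solver_step(&mut g, 0.5));
    assert_eq!(g.volume, 1.5);
    assert!(!solver_step(&mut NoCellularReactions, 0.5));
}
```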

You got it! This is exactly what I am doing right now. But as I have explained above, there are still many problems that remain with this approach.

Yes, this is exactly what I am doing at the moment. I am simply defining a struct NoCellularReactions {} and then implementing cellular reactions by doing nothing :slight_smile: The only difference is that I name each "null object" after what it is "nulling" (such as NoCellularReactions).

Short explanation why generics are desirable in this case:
Of course performance and flexibility are the main aspects but let me get into more depth.

Example 1 - GPU

Consider a cell which has a fixed amount of reactants inside of it. This can easily be described by a fixed-size array [f64; N] (or f32, but that is less common in science). If I wanted to run this part on a GPU, I could do it by replacing the fixed-size array with a type that will be sent to the GPU, such as NVarray<f64, N>. Some crates such as arrayfire actually allow users to do this.

Example 2 - Spatially distributed reactions

Consider a case where it actually matters how intracellular reactants are distributed inside the cell. In this case it is easy to update our model by simply moving from a list of reactants to a multi-dimensional array. This way we discretize the space inside the cell and every point can be assigned a value of reactant. Let's say again that we have N reactants. Then we could represent our total intracellular reactants by a type ndarray::Array4<f64> and model transport of our reactants by coupling adjacent spatial boxes to each other but still using the same solving algorithm.
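Both examples boil down to swapping the storage type while keeping the solver. A sketch under the assumption that the solver only needs a few whole-vector operations; the trait ConcentrationStorage, the Grid type (a flattened stand-in for a spatial array like ndarray's Array4) and decay_step are hypothetical.

```rust
trait ConcentrationStorage {
    fn scale(&mut self, factor: f64);
    fn total(&self) -> f64;
}

// Example 1: a plain fixed-size array of N reactants.
impl<const N: usize> ConcentrationStorage for [f64; N] {
    fn scale(&mut self, factor: f64) {
        self.iter_mut().for_each(|c| *c *= factor);
    }
    fn total(&self) -> f64 {
        self.iter().sum()
    }
}

// Example 2 (hypothetical): N reactants per spatial box, boxes flattened.
struct Grid<const N: usize> {
    values: Vec<[f64; N]>,
}

impl<const N: usize> ConcentrationStorage for Grid<N> {
    fn scale(&mut self, factor: f64) {
        self.values.iter_mut().for_each(|b| b.scale(factor));
    }
    fn total(&self) -> f64 {
        self.values.iter().map(|b| b.total()).sum()
    }
}

// The solver is written once, against the trait.
fn decay_step<S: ConcentrationStorage>(storage: &mut S, rate: f64, dt: f64) {
    storage.scale(1.0 - rate * dt);
}

fn main() {
    let mut cell = [2.0, 4.0];
    decay_step(&mut cell, 0.5, 1.0);
    assert_eq!(cell, [1.0, 2.0]);

    let mut grid = Grid::<2> { values: vec![[2.0, 4.0], [6.0, 8.0]] };
    decay_step(&mut grid, 0.5, 1.0);
    assert_eq!(grid.total(), 10.0);
}
```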

No, I mean method overloading in the common sense; I have several methods, all with the same name, but they each take different arguments. You are (or were) using conditional compilation attributes to emulate that behavior.

But more than that, I still haven't seen any strong argument for using multiple generics over a single generic with a concrete tuple or struct.

This should "just work" if the caller decides the single generic is a single primitive, or a tuple of primitives, or a struct. Leaning on the type system to work it out, instead of hiding defaults and hoping the optimizer knows when it is dead code.

How many types are you anticipating the backend to deal with? If you know all of the types ahead of time, it's just a matter of exhaustively implementing them (perhaps with a macro to reduce repetition). Or making the caller implement it for their own newtype. Either is viable, IIUC.
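A sketch of "exhaustively implementing them with a macro", assuming a hypothetical Total trait that the backend needs. The macro stamps out one impl per scalar type, and a single generic impl covers a split cell type such as (intracellular, extracellular).

```rust
trait Total {
    fn total(&self) -> f64;
}

// One impl per known scalar type, for arrays of any length.
macro_rules! impl_total {
    ($($t:ty),* $(,)?) => {
        $(
            impl<const N: usize> Total for [$t; N] {
                fn total(&self) -> f64 {
                    self.iter().map(|&x| f64::from(x)).sum()
                }
            }
        )*
    };
}

impl_total!(f64, f32, u8);

// A combined cell type is just one more impl.
impl<A: Total, B: Total> Total for (A, B) {
    fn total(&self) -> f64 {
        self.0.total() + self.1.total()
    }
}

fn main() {
    assert_eq!([1.0f64, 2.0].total(), 3.0);
    assert_eq!([1u8, 2, 3].total(), 6.0);
    assert_eq!(([1.0f64; 3], [0.5f32; 2]).total(), 4.0);
}
```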

I don't believe either of these are relevant to the interface. If they are, then your interface needs to make that loud and clear. (One way to do that is with a trait bound on the generic) Again, I don't think this is what you are looking for. But I am pointing out how you could make these details relevant to the interface, if that was a goal.
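For completeness, this is roughly what "making it loud with a trait bound" could look like. DeviceResident is a hypothetical marker trait and DeviceArray a stand-in type; nothing here talks to an actual GPU.

```rust
// Hypothetical marker: "this storage may live on a device".
trait DeviceResident {}

struct HostArray([f64; 3]);
struct DeviceArray([f64; 3]); // stand-in for a real device buffer type

impl DeviceResident for DeviceArray {}

// The requirement is visible in the signature.
fn run_on_device<C: DeviceResident>(_concentrations: &C) -> &'static str {
    "scheduled on device"
}

fn main() {
    assert_eq!(run_on_device(&DeviceArray([0.0; 3])), "scheduled on device");
    // run_on_device(&HostArray([0.0; 3])); // compile error: HostArray is not DeviceResident
}
```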

IMO these are the kinds of details that should be encapsulated rather than leaking into the interface.

I know about overloading in C++ but was not aware that similar concepts are also present in Rust. A silly attempt like so does not work at least:

Digging a bit deeper, I found this blog-post which achieves something very similar.
http://casualhacks.net/blog/2018-03-10/exploring-function-overloading/
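For reference, the general idea of overloading through traits (one method name, several argument types, resolved at compile time) can be sketched like this. It is an illustration of the technique, not the linked post's code; ReactionIncrement and Reactions are hypothetical.

```rust
trait ReactionIncrement<Args> {
    type Output;
    fn increment(&self, args: Args) -> Self::Output;
}

struct Reactions {
    production: f64,
    uptake: f64,
}

// "Overload" 1: intracellular concentration only.
impl ReactionIncrement<f64> for Reactions {
    type Output = f64;
    fn increment(&self, intra: f64) -> f64 {
        self.production - 0.1 * intra
    }
}

// "Overload" 2: intracellular and extracellular concentrations.
impl ReactionIncrement<(f64, f64)> for Reactions {
    type Output = (f64, f64);
    fn increment(&self, (intra, extra): (f64, f64)) -> (f64, f64) {
        let uptake = self.uptake * extra;
        (self.production + uptake - 0.1 * intra, -uptake)
    }
}

fn main() {
    let r = Reactions { production: 1.0, uptake: 0.5 };
    assert_eq!(r.increment(0.0_f64), 1.0);
    assert_eq!(r.increment((0.0_f64, 2.0_f64)), (2.0, -1.0));
}
```

Unlike the cfg-based version, both "overloads" always exist and the compiler picks one from the argument type.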

However, in one thread on this forum (where I took the above link from), some users discuss the solution and more or less agree that it should probably be avoided.

This line here is probably challenging me the most at this time. I very much appreciate that you explicitly stated it in this way! :+1: :thinking: I will think about a possible implementation, as specialization of implementations could be simpler. However, in my case I might come across a situation where two combinations of generic parameters have the same length (number of generics used) but different meanings, i.e. T=(Reactions, Interaction, Cycle) and T=(Reactions, Cycle, Contact). I will have to think carefully about these questions since I expect to have more generics for additional functionality in the future.
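One observation on the same-length concern: if every component is its own named type, two tuples of equal length are still different types, so each combination gets its own impl. A sketch with hypothetical zero-sized marker types and a hypothetical Describe trait:

```rust
struct Reactions;
struct Interaction;
struct Cycle;
struct Contact;

trait Describe {
    fn describe() -> &'static str;
}

impl Describe for (Reactions, Interaction, Cycle) {
    fn describe() -> &'static str {
        "reactions + interaction + cycle"
    }
}

impl Describe for (Reactions, Cycle, Contact) {
    fn describe() -> &'static str {
        "reactions + cycle + contact"
    }
}

fn main() {
    assert_eq!(<(Reactions, Interaction, Cycle) as Describe>::describe(), "reactions + interaction + cycle");
    assert_eq!(<(Reactions, Cycle, Contact) as Describe>::describe(), "reactions + cycle + contact");
}
```

The catch is that position still matters: (Cycle, Reactions, Contact) would be yet another type with no impl. A struct or trait with named slots (one associated type per component) avoids that ordering problem.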

Maybe I should have also taken more care about the comments previously made by @vague and @kpreid. In hindsight it is probably the same suggestion. As you can tell, my opinions have shifted.

Not knowing the types is part of the fun! :smiley: No seriously ... I do not know them and I do not wish to make any assumptions about them. Being generic will allow me to use this crate even in the future when the demand for certain other aspects of my models will arise. Of course there is no future-proofing but this is the reason why I was writing this crate in the first place.

They are not at all leaking into the interface :slight_smile: Maybe I was not expressing it clearly enough. What I wrote in the above post is purely motivation; the two examples I mentioned are nowhere present in my current interface and are not even used or implemented at this point in time. However, the interface is currently designed in such a way that it will allow me to implement this functionality in the future. You can think of these examples as possible use cases that show where I am coming from and why I chose this generic approach rather than explicitly defining types.

Summary (so far)

I want to thank every contributor to this thread for spending the time to help me. :pray: I have gathered lots of information that will help me in my next steps.

Implementation Variants

I have summarized the different implementations which have been proposed so far. Please feel free to correct me if some of the assumptions/contents of the table are wrong. I highly appreciate the input.

| Variant | Description | Advantages | Challenges |
|---|---|---|---|
| 1. Single-Generic | Interface agnostic to the number of generics via generic tuples T=(Reactions, Interaction, ...). Write implementations for different combinations of generic parameters. | Can probably be scaled better to a higher number of generic parameters and additional combinations without affecting existing code. | More verbose and probably more code to write, although this depends on how well existing code could be reused. Still unclear to me how to effectively map an unordered tuple of generics to the correct component of the backend. |
| 2. Multi-Generic | Interface which exposes ALL generic parameters and simply assumes some default types T=() that effectively disable a component. | Probably requires less code to write? | Probably harder to scale with an increasing number of generics. Unclear what will be correctly optimized away by the compiler. |
| 3. Features | Use #[cfg(feature = ...)] blocks and expose the correct number of generics required. | Other components of the framework are not visible unless the feature is activated. Once a feature is activated, the simulation requires the associated information (clear what is being used and when). Manual control over what the optimizer throws out. | Almost certainly very hard to maintain and extremely hard to scale with multiple generic parameters. A pain in the a** to debug. |

Roadmap

I am actively using the tool which I am developing at the moment. This means a large rewrite will only take place over the coming weeks or months. Nevertheless, I am perfectly willing to spend considerable amounts of time and figure out how to solve my problem most efficiently. Consider the following steps to take place over a longer period of time.

  1. Test Variant 1 with simple toy-systems (Variant 2 was already proven to be a possible solution).
  2. Test at least Variants 1 and 2 with some (>=2) of the actual traits currently used.
  3. Try to add more generics and components to existing code.
  4. Evaluate which one to use
  5. Start the rewrite! :raised_hands: :partying_face:

Thanks to everyone for your effort. I will keep you updated.


This is definitely a hard problem, which is probably best solved by avoiding it: instead of accepting generic tuples, define a trait that the user is expected to implement on a custom type (perhaps with the aid of a #[derive(...)] procedural macro).

Were I writing this, I'd probably do something along these lines which combines a little bit of all three of your variants:

  • Define the data interface of every component unconditionally (public-facing types and traits).

  • Restrict the impl blocks of these interfaces behind #[cfg(feature = ...)] so that the code for completely unneeded features is guaranteed to be omitted from the resulting binary.

  • Define an uninhabited type Unused that has a no-op implementation of every component, e.g. pub enum Unused {}. Because it has no variants, it can never actually be constructed.

  • Define a trait that represents the simulation settings as a whole, which represents the single generic parameter that will be passed to Simulation:

    trait SimulationParams {
        type C1: Component1Trait;
        type C2: Component2Trait;
        // ... all other components ...
    
        #[cfg(feature = "component1")]
        fn c1_params(&self) -> Option<&Self::C1> { None }
    
        #[cfg(feature = "component2")]
        fn c2_params(&self) -> Option<&Self::C2> { None }
    
        // and so on ...
    }
    
  • Define a derive macro for SimulationParams so that users can easily define their own simulation parameter types, such that a definition like this:

    #[derive(SimulationParams)]
    struct MyParams(#[simulation_params(C1)] F64CellType); // derive helper attributes must be plain identifiers
    

    defines an implementation like this:

    impl SimulationParams for MyParams {
        type C1 = F64CellType;
        type C2 = Unused;
        // ... and so on for all possible components, even the disabled ones ...
    
        fn c1_params(&self)->Option<&Self::C1> { Some(&self.0) }
        // No method definitions for the unused components,
        // as the defaults are correct, if present
    }
    

This scheme has a few nice features:

  • Any component that the user code actually relies on will require the corresponding feature to be enabled, because otherwise the trait method overridden by the derive macro won't exist.
  • Features that are turned on but unused in a particular Simulation are disabled via a method returning None.
  • Moreover, if the derive macro is used to implement SimulationParams, the methods for unused components keep their default body and return Option<&Unused>. Because Unused has no values, such a method can only ever return None, which makes it easy for the optimizer to eliminate conditionals during code generation.
  • Because Unused can never actually be instantiated, all of its trait methods can have a body of unimplemented! without fear of panics.
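A hand-written sketch of what the derive macro might generate, so the scheme can be tried without writing the proc macro first. The #[cfg(feature = ...)] gates are left out so it compiles standalone; the trait methods c1_rate/c2_rate and the helper total_rate are assumptions for illustration. Instead of unimplemented!, the Unused impls use an empty match, which the compiler accepts precisely because Unused has no values.

```rust
trait Component1Trait {
    fn c1_rate(&self) -> f64;
}
trait Component2Trait {
    fn c2_rate(&self) -> f64;
}

// Uninhabited: no value of this type can ever exist.
pub enum Unused {}

impl Component1Trait for Unused {
    fn c1_rate(&self) -> f64 {
        match *self {} // no arms needed: this code is unreachable
    }
}
impl Component2Trait for Unused {
    fn c2_rate(&self) -> f64 {
        match *self {}
    }
}

struct F64CellType(f64);
impl Component1Trait for F64CellType {
    fn c1_rate(&self) -> f64 {
        self.0
    }
}

trait SimulationParams {
    type C1: Component1Trait;
    type C2: Component2Trait;
    fn c1_params(&self) -> Option<&Self::C1> { None }
    fn c2_params(&self) -> Option<&Self::C2> { None }
}

// What `#[derive(SimulationParams)]` might expand to.
struct MyParams(F64CellType);
impl SimulationParams for MyParams {
    type C1 = F64CellType;
    type C2 = Unused;
    fn c1_params(&self) -> Option<&Self::C1> { Some(&self.0) }
}

// Hypothetical backend code: disabled components contribute nothing.
fn total_rate<P: SimulationParams>(p: &P) -> f64 {
    p.c1_params().map_or(0.0, |c| c.c1_rate()) + p.c2_params().map_or(0.0, |c| c.c2_rate())
}

fn main() {
    assert_eq!(total_rate(&MyParams(F64CellType(2.5))), 2.5);
    // By value, an Option of an uninhabited type is zero-sized.
    assert_eq!(std::mem::size_of::<Option<Unused>>(), 0);
}
```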