Trait Objects Taking Boxed Self as Argument for Machine Learning Models

Hi, I am currently implementing some machine learning models in Rust.

First, I am implementing a trait, Layer, to represent a neural network layer.
The trait is needed so that layers can be selected and composed dynamically.
It should support three operations: forward (applying the layer function), backward (computing the derivative information of the layer), and a parameter update (adding the parameter delta to self).
The second and third operations hit a limitation of trait objects: "The trait cannot have any method taking Self as arguments" (Trait Objects - The Rust Programming Language).
In other words, it is impossible, for example, to add two trait objects.

The following is an example containing two problematic parts:

use std::any::Any;

pub trait Layer<T> {
    /// Returns the kind of the layer (like `Linear`, `Conv2D`, ...).
    fn kind(&self) -> &str;
    /// Returns the dimension of the input vector.
    fn dim_input(&self) -> usize;
    /// Returns the dimension of the predict vector.
    fn dim_predict(&self) -> usize;
    /// Adds the layer gradient scaled by `beta` to self. (Problematic Part 1)
    /// `dlayer` is the value filled in by `backward`.
    fn add_scaled_dlayer(&mut self, dlayer: &Box<dyn Any>, beta: T);

    /// Calculates the forward step.
    fn forward(&self, input: &[T], predict: &mut [T]);
    /// Calculates the backward step. (Problematic Part 2)
    /// `dlayer` stores the derivative information (its type should be the same as the layer itself).
    fn backward(&self, input: &[T], dpredict: &[T], dlayer: &mut Box<Self>, dinput: Option<&mut [T]>);
}
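
One option that keeps a Self-typed method in the trait is to bound that method with `where Self: Sized`, which excludes it from dynamic dispatch so the trait stays usable as a trait object. The sketch below is a cut-down illustration, not the trait above: `T` is fixed to `f32`, and the `Scale` layer is a hypothetical example type.

```rust
/// Hypothetical, cut-down version of `Layer` with `T` fixed to `f32`.
pub trait Layer {
    fn kind(&self) -> &str;
    fn forward(&self, input: &[f32], predict: &mut [f32]);

    /// Takes `Self`, so it is opted out of dynamic dispatch.
    /// Without the `where Self: Sized` bound, `dyn Layer` would be rejected.
    fn add_scaled_dlayer(&mut self, dlayer: &Self, beta: f32)
    where
        Self: Sized;
}

/// Illustrative layer: multiplies every input element by one scalar.
pub struct Scale {
    pub factor: f32,
}

impl Layer for Scale {
    fn kind(&self) -> &str {
        "Scale"
    }

    fn forward(&self, input: &[f32], predict: &mut [f32]) {
        for (p, x) in predict.iter_mut().zip(input) {
            *p = self.factor * x;
        }
    }

    fn add_scaled_dlayer(&mut self, dlayer: &Self, beta: f32) {
        self.factor += beta * dlayer.factor;
    }
}

fn main() {
    let mut layer = Scale { factor: 2.0 };
    let dlayer = Scale { factor: 0.5 };
    // Fine: the concrete type is known at this call site.
    layer.add_scaled_dlayer(&dlayer, -1.0);

    // Also fine: the trait can still be used as a trait object.
    let boxed: Box<dyn Layer> = Box::new(layer);
    let mut out = [0.0f32; 2];
    boxed.forward(&[1.0, 2.0], &mut out);
    println!("{} -> {:?}", boxed.kind(), out);
    // Calling `boxed.add_scaled_dlayer(..)` would be a compile error.
}
```

The trade-off is that the update can only be called where the concrete type is known, so this alone does not cover a model that holds its layers as trait objects.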

So far, I have considered a workaround that uses Box<dyn Any> instead of Box<Self>, but it made it impossible to implement a Model trait containing multiple Layers (which is also needed for dynamic use).
This is because Model should check that the input and output dimensions of adjacent layers are consistent, and its add_scaled_dmodel method (the analogue of Layer::add_scaled_dlayer) must call the add_scaled_dlayer method of each layer.
If downcasting Box<dyn Any> into Box<Self> were possible, this workaround would resolve the problem, but that also seems to be impossible...
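
For reference, the standard library does provide `Any::downcast_ref`, and inside an `impl` block the concrete type is known, so a runtime check can stand in for the compile-time `Self` requirement. The following is a minimal sketch under assumptions: `T` is fixed to `f32`, the `as_any` helper method and the `Result<(), String>` return type are not part of the original trait, and `Linear` and `Model` are simplified stand-ins.

```rust
use std::any::Any;

/// Hypothetical, cut-down `Layer` with `T` fixed to `f32`.
pub trait Layer {
    /// Exposes the concrete value as `&dyn Any` so it can be downcast.
    fn as_any(&self) -> &dyn Any;
    /// Adds `beta * dlayer` to `self`; fails if `dlayer` has another type.
    fn add_scaled_dlayer(&mut self, dlayer: &dyn Layer, beta: f32) -> Result<(), String>;
}

pub struct Linear {
    pub weights: Vec<f32>,
}

impl Layer for Linear {
    fn as_any(&self) -> &dyn Any {
        self
    }

    fn add_scaled_dlayer(&mut self, dlayer: &dyn Layer, beta: f32) -> Result<(), String> {
        // Runtime replacement for the compile-time `Self` check.
        let d = dlayer
            .as_any()
            .downcast_ref::<Linear>()
            .ok_or_else(|| "dlayer is not a Linear".to_string())?;
        if d.weights.len() != self.weights.len() {
            return Err("shape mismatch".to_string());
        }
        for (w, dw) in self.weights.iter_mut().zip(&d.weights) {
            *w += beta * dw;
        }
        Ok(())
    }
}

/// A model is itself a layer that holds type-erased layers.
pub struct Model {
    pub layers: Vec<Box<dyn Layer>>,
}

impl Layer for Model {
    fn as_any(&self) -> &dyn Any {
        self
    }

    fn add_scaled_dlayer(&mut self, dlayer: &dyn Layer, beta: f32) -> Result<(), String> {
        let d = dlayer
            .as_any()
            .downcast_ref::<Model>()
            .ok_or_else(|| "dlayer is not a Model".to_string())?;
        if d.layers.len() != self.layers.len() {
            return Err("layer count mismatch".to_string());
        }
        // Delegate to each layer, which performs its own downcast.
        for (layer, dl) in self.layers.iter_mut().zip(&d.layers) {
            layer.add_scaled_dlayer(dl.as_ref(), beta)?;
        }
        Ok(())
    }
}

fn main() {
    let mut model = Model {
        layers: vec![Box::new(Linear { weights: vec![1.0, 2.0] }) as Box<dyn Layer>],
    };
    let dmodel = Model {
        layers: vec![Box::new(Linear { weights: vec![10.0, 20.0] }) as Box<dyn Layer>],
    };
    model.add_scaled_dlayer(&dmodel, 0.5).expect("types and shapes match");
}
```

The cost of this design is that a type mismatch is reported at runtime rather than at compile time.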

Furthermore, using Box<dyn Layer<T>> instead of Box<Self> makes it impossible to recover the concrete Self type from the Box<dyn Layer<T>>...
That is needed, for example, to add the scaled dlayer to self in the add_scaled_dlayer method.

Some machine learning libraries use a vector type like Vec everywhere, possibly to circumvent this problem.
For example, see leaf::layer::Layer.
However, I want to implement more complex layers such as recurrent units, and serializing everything into a vector makes the code hard to write.
In that direction, it would be better to use a macro to serialize the parameters into the vector and extract them again.
I tried this approach, but it seemed impossible to write a macro that serializes multiple member objects into the vector sequentially (especially managing the offsets into the vector).
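
As a point of comparison, the packing can also be written by hand without macros, using `split_at` to carve one flat buffer into per-member views. This is only a sketch under assumptions: `PackedLinear`, its row-major weight layout, and the `views`/`views_mut` names are hypothetical and not taken from any library.

```rust
/// Hypothetical linear layer whose parameters live in one flat buffer:
/// `params = [weights (dim_out * dim_in, row-major) | bias (dim_out)]`.
pub struct PackedLinear {
    dim_in: usize,
    dim_out: usize,
    params: Vec<f32>,
}

impl PackedLinear {
    /// Creates a layer with all parameters set to zero.
    pub fn new_zero(dim_in: usize, dim_out: usize) -> Self {
        PackedLinear {
            dim_in,
            dim_out,
            params: vec![0.0; dim_out * dim_in + dim_out],
        }
    }

    /// Splits the flat buffer into (weights, bias) views without copying.
    pub fn views(&self) -> (&[f32], &[f32]) {
        self.params.split_at(self.dim_out * self.dim_in)
    }

    /// Mutable counterpart of `views`.
    pub fn views_mut(&mut self) -> (&mut [f32], &mut [f32]) {
        let n = self.dim_out * self.dim_in;
        self.params.split_at_mut(n)
    }

    /// Computes `predict = weights * input + bias`.
    pub fn forward(&self, input: &[f32], predict: &mut [f32]) {
        let (weights, bias) = self.views();
        for (i, p) in predict.iter_mut().enumerate() {
            let row = &weights[i * self.dim_in..(i + 1) * self.dim_in];
            *p = bias[i] + row.iter().zip(input).map(|(w, x)| w * x).sum::<f32>();
        }
    }
}

fn main() {
    let mut layer = PackedLinear::new_zero(2, 1);
    {
        let (weights, bias) = layer.views_mut();
        weights.copy_from_slice(&[1.0, 2.0]);
        bias[0] = 0.5;
    }
    let mut out = [0.0f32; 1];
    layer.forward(&[3.0, 4.0], &mut out);
    println!("{:?}", out);
}
```

A layer with more members, such as a recurrent unit, would need one offset per member, which is the bookkeeping the macro was meant to generate.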

If you have any ideas, please let me know.

So you’re looking to use Layer via trait objects in most use cases? Because layer types will be selected at runtime? It’s not very clear why you’re steering in that direction.

Why does adding a scaled layer require the exact same type? Is it for semantic type safety? What if layers exposed the information needed to do the state transitions while keeping the exact type erased?
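
One way to read that suggestion is sketched below: each layer exposes its parameters as a slice, and the update is a free function that works purely on trait objects. The `params`/`params_mut` methods, the `add_scaled` function, and the `Bias` layer are hypothetical names chosen for illustration, with `T` fixed to `f32`.

```rust
/// Hypothetical, cut-down `Layer` that exposes its parameters as slices.
pub trait Layer {
    fn kind(&self) -> &str;
    fn params(&self) -> &[f32];
    fn params_mut(&mut self) -> &mut [f32];
}

/// Performs `theta <- theta + beta * delta` without knowing concrete types.
/// The kind and length checks replace the compile-time `Self` requirement.
pub fn add_scaled(layer: &mut dyn Layer, dlayer: &dyn Layer, beta: f32) -> Result<(), String> {
    if layer.kind() != dlayer.kind() {
        return Err(format!("kind mismatch: {} vs {}", layer.kind(), dlayer.kind()));
    }
    if layer.params().len() != dlayer.params().len() {
        return Err("parameter count mismatch".to_string());
    }
    for (p, d) in layer.params_mut().iter_mut().zip(dlayer.params()) {
        *p += beta * d;
    }
    Ok(())
}

/// Illustrative layer that only holds a bias vector.
pub struct Bias {
    pub values: Vec<f32>,
}

impl Layer for Bias {
    fn kind(&self) -> &str {
        "Bias"
    }

    fn params(&self) -> &[f32] {
        &self.values
    }

    fn params_mut(&mut self) -> &mut [f32] {
        &mut self.values
    }
}

fn main() {
    let mut layer: Box<dyn Layer> = Box::new(Bias { values: vec![1.0, 2.0] });
    let dlayer: Box<dyn Layer> = Box::new(Bias { values: vec![4.0, 8.0] });
    add_scaled(&mut *layer, &*dlayer, 0.5).expect("same kind and size");
    println!("{:?}", layer.params());
}
```

Compared with requiring `Self`, this trades static type safety for checks on `kind` and parameter length at runtime.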

Thank you for your reply.

The reason why add_scaled_dlayer requires the exact same type is for semantic type safety.
For example, we should update the parameter vector theta as theta + delta.

If the trait could be written as in my question, then the optimizer could be written as follows:

// Model<T>, which consists of multiple layers, should itself implement Layer<T>, because it behaves as a layer.
let model = Model::<f32>::new(Linear::<f32>::new_random(768, 512), Linear::<f32>::new_random(512, 10));
// `forward` writes into the slice, so it must already have `dim_predict()` elements.
let mut predict = vec![0.0f32; model.dim_predict()];
model.forward(input, &mut predict);
// Apply the loss function, and store its derivative in `dpredict`.
// Create a zero-initialized Model with the same shape as `model`.
let mut dmodel = Box::new(Model::<f32>::new_zero(&model));
model.backward(input, &dpredict, &mut dmodel, None);

Recently, however, I have come to think that it is better to use a serialized vector for the parameters (i.e. packing the vector or matrix parameter structs into a single vector).
Unfortunately, Rust's macros seem to be a little buggy (concat_idents! doesn't work, and there is no way to convert the identifier of a const into one for a method), so this approach is somewhat hard to implement.

So I would still like to discuss the boxed Self approach, for the benefit of anyone who wants to try it.