Registering a generic argument before main

I'm not sure if this is possible, but I figured I'd ask anyway!

What I want is to be able to have a manager issue commands to workers running the same binary. The manager would spin up N worker processes, maybe read a bunch of data, then send the workers commands (through IPC or an RPC or something). The workers should ideally be "dumb" (as soon as they start, they only listen for commands and execute them).

The trouble is registering those commands on the worker processes. Since the workers shouldn't do the "read a bunch of data" step, they also never reach the part where the manager sends out the commands (so there is no way to record the list of possible commands on the workers). Why not just add a big list of all the possible commands at the start of main? This sort of works, but A) doesn't allow crates to add their own commands, and B) requires explicitly listing out every version of a generic command (it is really weird to have to register a command from another crate just because you use it with a different generic argument).

I'd like to be able to use something like the ctor or linkme crates. However, every way I've tried this I run into "can't use generic parameters from outer function". My idea was something like this:

fn send_command<T: Command>(t: T) {
  #[ctor::ctor]
  fn register() {
    some_static_map_of_dyn_command[T::name()] = Box::new(T);
  }
  // ... Send the data of the command to the worker.
}

This of course fails because we can't define a nested function that uses the outer generic parameter.

Is there any way to do this? It would be really unfortunate if the workers have to get smarter just because there's no way to list out the types used in send_command.

If you have already found these crates then you, of course, saw the warning:

Rust's philosophy is that nothing happens before or after main and this library explicitly subverts that. The code that runs in the ctor and dtor functions should be careful to limit itself to libc functions and code that does not rely on Rust's stdlib services.

For example, using stdout in a dtor function is a guaranteed panic.

In most cases, sys_common::at_exit is a better choice than #[dtor]. Caveat emptor!

Rust explicitly doesn't support the thing that you want, but as you saw there are dirty platform-dependent tricks that may sometimes, kinda-sorta, make it work (by employing hacks outside of Rust's control).

It would be really unfortunate if the workers have to get smarter just because there's no way to list out the types used in send_command.

That's one POV. Another POV is that it would be really unfortunate for anyone who is familiar with Rust to find out that you have added some kind of “life before main” using some kind of strange backdoor.

I don't know who you work with, though, and depending on the background of your team, using dirty system-dependent hacks may be either a good thing or a bad thing.

This of course fails because we can't define a nested function that uses the outer generic parameter.

Let me put it this way: Rust developers are trying very hard to ensure that what you are trying to achieve is not possible. But since in C++ and C (GNU/clang version) that's a perfectly normal thing, it's not clear whether they have actually succeeded.

If you find a way to circumvent that (most likely something related to const calculations), it would be nice to know what you have found. So far the only thing I have found is “life after monomorphization” (which is troubling and exciting, too), but that's not enough, since all the effects have to happen during compilation, before your program is run and, more importantly, before it is linked.

P.S. What happens if you create a trait with a default implementation of some function and attach #[ctor::ctor] to it? It may just work, but I'm too busy to experiment right now...

Instead of trying to generate your register() function using monomorphisation, you can make some sort of macro which expands to it.

To make things more ergonomic, you could tie things to some marker trait and make a super simple custom derive.

#[derive(Command)]
struct MyCommand { ... }

// expands to

impl Command for MyCommand {}

const _: () = {
  #[ctor::ctor]
  fn register() {
    some_static_map_of_dyn_command.insert(std::any::type_name::<MyCommand>(), Box::new(MyCommand::default()));
  }
};

Then whatever is doing command dispatch just needs to add a requirement that this Command marker trait is implemented.

There are ways to tweak this (e.g. using an attribute macro instead of a custom derive, or requiring them to implement another trait which contains the logic for executing the command, etc.) but that's the general idea.
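The macro idea can also be tried out with only the standard library. The following is a minimal sketch, not the custom derive described above: it uses a declarative macro_rules! macro and builds the registry on an explicit call rather than before main, so it sidesteps ctor entirely. The names register_commands!, build_registry, Handler, handle, Ping and Upper, and the name/run methods on Command, are all hypothetical and chosen only for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical command trait; `name` and `run` are assumptions for this sketch.
pub trait Command: Default {
    fn name() -> &'static str;
    fn run(&self, input: &str) -> String;
}

/// Type-erased entry point stored in the registry.
pub type Handler = fn(&str) -> String;

/// Monomorphised once per concrete command type that the macro lists.
fn handle<T: Command>(input: &str) -> String {
    T::default().run(input)
}

/// Expands a list of concrete command types into one registration function.
macro_rules! register_commands {
    ($($ty:ty),+ $(,)?) => {
        pub fn build_registry() -> HashMap<&'static str, Handler> {
            let mut map: HashMap<&'static str, Handler> = HashMap::new();
            $(
                map.insert(<$ty as Command>::name(), handle::<$ty> as Handler);
            )+
            map
        }
    };
}

#[derive(Default)]
pub struct Ping;

impl Command for Ping {
    fn name() -> &'static str {
        "ping"
    }
    fn run(&self, input: &str) -> String {
        format!("pong: {}", input)
    }
}

#[derive(Default)]
pub struct Upper;

impl Command for Upper {
    fn name() -> &'static str {
        "upper"
    }
    fn run(&self, input: &str) -> String {
        input.to_uppercase()
    }
}

// Every concrete type still has to be listed here, generic instantiations included.
register_commands!(Ping, Upper);
```

A worker would call build_registry() once at startup and look up each incoming command by name. The list passed to register_commands! is still explicit, so this only tidies up the "big list at the start of main"; it does not remove it.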

You still have the problem that commands can't be generic because you need to know all possible ways they'll be used ahead of time. In my opinion, that's something you can only fix through architecture - either commands need to drop their generics, or you need to drop the global registration system and go with something that dispatches commands manually so you get monomorphisation.
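To make the "dispatch manually" option concrete, here is a rough sketch under some assumptions: commands arrive as a name plus a text payload, and the worker has a hand-written match that names every instantiation it supports. The functions run_sum and dispatch, and the "sum<i64>"-style command names, are hypothetical. Each match arm is what forces the compiler to monomorphise that particular instantiation.

```rust
use std::fmt::Display;
use std::iter::Sum;
use std::str::FromStr;

/// A generic command body: parse whitespace-separated tokens as `T` and add them up.
fn run_sum<T>(payload: &str) -> Result<String, String>
where
    T: FromStr + Sum<T> + Display,
{
    let mut values = Vec::new();
    for token in payload.split_whitespace() {
        let value = token
            .parse::<T>()
            .map_err(|_| format!("cannot parse {:?}", token))?;
        values.push(value);
    }
    Ok(values.into_iter().sum::<T>().to_string())
}

/// Manual dispatch on the worker side. Every generic instantiation that the
/// manager may send has to appear here as its own arm.
pub fn dispatch(name: &str, payload: &str) -> Result<String, String> {
    match name {
        "sum<i64>" => run_sum::<i64>(payload),
        "sum<f64>" => run_sum::<f64>(payload),
        other => Err(format!("unknown command: {}", other)),
    }
}
```

The trade-off is the one described above: nothing runs before main and the set of commands is visible in one place, but a downstream crate cannot add an instantiation without editing this match (or supplying its own dispatch function for the worker to call).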

This for sure looks like a very convoluted way of achieving what you want. I'd take a step back and question the assumptions first.

Why do the workers need to be "dumb" so that they aren't even allowed to read some data? Why aren't your workers simply functions that take a list of commands to execute? What is it that warrants such a non-conventional architecture? We need way more context and background to be able to help devise a meaningful solution.

You see, "life before main" and especially "generic life before main" basically never comes up if you are writing non-weird code. You are likely approaching this in the wrong way.