Hi - I originally posted this question on Stack Overflow and was directed here.
I'm writing a family of Markov Chain Monte Carlo (MCMC) algorithms to use as inference methods for a program induction system by translating the MCMC algorithms implemented here in C++ into rust.
The rough dynamic of MCMC is that we have some process which randomly moves around a space of objects and generates a stream of the objects which it discovers. The randomness is essential. Also, this stream never terminates (though you can stop taking objects from the stream). More complex forms of MCMC can be defined as compositions of this basic process. For example, we might create a pool of threads that samples from each thread in a round-robin fashion.
My question: how do I translate these algorithms idiomatically? More specifically, how do I to think about this problem from a rustacean perspective? What are the right tools/techniques to bring to the job?
Here's what I've considered:
-
A direct translation seemed awkward, because the C++ code makes heavy use of callbacks, and that doesn't seem particularly idiomatic compared to making a stream of samples lazily available to the user to use as they see fit.
-
I then thought that iterators might make sense, but:
- The objects I'm sampling are complex data structures representing mini DSLs. It'd be nice to make a reference available that can be cloned if the user decides it's a useful sample. Naively, the chain only keeps around a single sample at a time, and these samples are independent as opposed to being parts of some larger structure, so I'm not entirely sure how to do this. Is there a way?
- These algorithms all need a source of randomness (i.e.
&mut R
whereR: Rng
), and access to a simple control structure for keeping track of various statistics. Baking these directly into whatever struct implementsIterator
then means that I can't compose the algorithms in a way that would share the control structure or source of randomness, right? If not, how would I do this?
-
Generators/Coroutines seem to solve these problems, but seem like a relatively fringe/new area of rust. I began to wonder if I was making things harder than they need to be.
-
I currently provide the struct for each algorithm with a function similar to the following, where
H
is a hypothesis andC
is the control structure:impl<C, H> MCMCChain<C, H> { // ... pub fn next_sample<R: Rng>(&mut self, control: &mut C, rng: &mut R) -> Option<Self> // ... }
Ideally,
next_sample
would returnOption<&H>
, but this is what I have working right now.
How would you implement this?