Hello.
I'm experimenting with certain aspects of Rust to choose the best API for a bioinformatics library. I've settled on the following interface for working with sequences:
// the arguments are alphabets and transcription tables, see below
let (DNA, RNA) = make_nuc_pair(...);
let dna: Sequence<'_> = DNA("ATGC");
let rna: Sequence<'_> = dna.transcribe();
Here, DNA and RNA are factories that create instances of Sequence. In addition to the sequence string, each Sequence must be associated with an alphabet and with the information on how to transcribe it into its counterpart, i.e., DNA to RNA and vice versa. To hide these details, I'm using factories that take a &str as input and return a Sequence object with all fields set according to the recipe established when make_nuc_pair was called.
So what does Sequence look like?
struct Sequence<'a>
{
    sequence: String,
    alphabet: &'a Alphabet,
    tsc_data: &'a TranscriptionData<'a>,
}
struct Alphabet { /* details are not important */ }
The transcription data field tsc_data refers to the following structure:
struct TranscriptionData<'a>
{
    factory : Option<Box<dyn Fn(String) -> Sequence<'a>>>,
    table : &'a TranscriptionTable,
}
struct TranscriptionTable { /* details are not important */ }
Namely, when the .transcribe() method is called, the sequence is processed using the table and then passed to factory to create a new Sequence object that is the result of transcription:
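impl<'a> Sequence<'a>
{
    fn new(sequence: &str, alphabet: &'a Alphabet, tsc_data: &'a TranscriptionData<'a>) -> Self
    {
        Sequence { sequence : sequence.to_string(), alphabet, tsc_data }
    }
    fn transcribe(&self) -> Sequence<'a>
    {
        // assumes table.transcribe(&str) returns the transcribed String
        let sequence = self.tsc_data.table.transcribe(&self.sequence);
        // borrow the boxed factory instead of moving it out of a shared reference
        let factory = self.tsc_data.factory.as_ref().expect("transcription factory is not set");
        factory(sequence)
    }
}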
There is a problem with this approach, though: a sequence with a DNA alphabet must refer to the corresponding RNA factory and vice versa, so the two factories end up referencing each other. This is a classic headache in Rust.
Here is a (non-working) sketch of logic that initializes factories:
pub fn make_nuc_pair<'a>(dna_alphabet : &'a Alphabet,
                         rna_alphabet : &'a Alphabet,
                         dna_to_rna : &'a TranscriptionTable,
                         rna_to_dna : &'a TranscriptionTable) -> (impl Fn(String) -> Sequence<'a>, impl Fn(String) -> Sequence<'a>)
{
    let mut tsc_data_dna = TranscriptionData { factory : None, table : dna_to_rna };
    let mut tsc_data_rna = TranscriptionData { factory : None, table : rna_to_dna };
    let DNA = Box::new(move |seq: String| Sequence::new(&seq, dna_alphabet, &tsc_data_dna));
    let RNA = Box::new(move |seq: String| Sequence::new(&seq, rna_alphabet, &tsc_data_rna));
    tsc_data_dna.factory = Some(RNA); // this is the place where
    tsc_data_rna.factory = Some(DNA); // everything breaks
    (DNA, RNA) // same order as in `let (DNA, RNA) = make_nuc_pair(...)`
}
So the idea is to first create the TranscriptionData objects with factory set to None, then create the corresponding factories (they are closures), and finally cross-reference them. This works in Python but, as we know, not in safe Rust.
What I would like to have is some way (most likely with an unsafe block) to cross-reference these factories such that the resulting structure (1) is thread-safe (Sequence objects are expected to be sent between threads) and (2) is not a weird Option<Arc<RefCell<&'a ... type of thing. I'm OK with working with raw pointers.
- What is the range of possible solutions and their pros and cons?
- Is there a way to change closures to functions?
- Is there a way to get rid of Option and initialize with a null pointer?
P.S. All of this is needed because I want to work with dynamic (i.e., created at run-time) alphabets and translation tables.
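For reference, one possible arrangement that avoids the cycle altogether is sketched below. It assumes that both alphabets and both tables can live in a single context struct (called NucPair here, a hypothetical name) and that a Sequence only needs a shared reference to that context plus a tag saying which side it is on. The Kind enum, the fields of Alphabet and TranscriptionTable, and the character-mapping transcribe are placeholders, not the real library types.

```rust
// All field layouts below are assumptions made for illustration.
#[allow(dead_code)]
struct Alphabet { symbols: Vec<char> }

struct TranscriptionTable { pairs: Vec<(char, char)> }

impl TranscriptionTable {
    // Placeholder logic: map each character through the table, keep unknown ones.
    fn transcribe(&self, s: &str) -> String {
        s.chars()
            .map(|c| self.pairs.iter().find(|(from, _)| *from == c).map_or(c, |(_, to)| *to))
            .collect()
    }
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Kind { Dna, Rna }

// Run-time context that owns both directions; built once, then only borrowed.
struct NucPair {
    dna_alphabet: Alphabet,
    rna_alphabet: Alphabet,
    dna_to_rna: TranscriptionTable,
    rna_to_dna: TranscriptionTable,
}

struct Sequence<'a> {
    sequence: String,
    kind: Kind,
    pair: &'a NucPair,
}

impl NucPair {
    // The "factories" are plain methods, so nothing has to be cross-referenced.
    fn dna(&self, s: &str) -> Sequence<'_> {
        Sequence { sequence: s.to_string(), kind: Kind::Dna, pair: self }
    }
    fn rna(&self, s: &str) -> Sequence<'_> {
        Sequence { sequence: s.to_string(), kind: Kind::Rna, pair: self }
    }
}

impl<'a> Sequence<'a> {
    #[allow(dead_code)]
    fn alphabet(&self) -> &'a Alphabet {
        let pair: &'a NucPair = self.pair;
        match self.kind {
            Kind::Dna => &pair.dna_alphabet,
            Kind::Rna => &pair.rna_alphabet,
        }
    }

    // Pick the table for the current side and flip the tag for the result.
    fn transcribe(&self) -> Sequence<'a> {
        let (table, kind) = match self.kind {
            Kind::Dna => (&self.pair.dna_to_rna, Kind::Rna),
            Kind::Rna => (&self.pair.rna_to_dna, Kind::Dna),
        };
        Sequence { sequence: table.transcribe(&self.sequence), kind, pair: self.pair }
    }
}
```

With this layout there is no Option or closure to initialize, and Sequence<'_> is Send as long as NucPair is Sync, which holds for plain data like the above. The cost is that the factories become methods on the context (pair.dna("ATGC")) rather than free-standing callables.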