Mutually dependent factories

Not to mention that OP is definitely misunderstanding something, because the combination of Arc and RefCell is useless. RefCell is not threadsafe, so Arc<RefCell<T>> isn't, either – which makes the Arc no more than an unnecessary synchronization overhead.
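
To make that concrete, here is a minimal sketch (the helper name `bump_from_threads` is made up for illustration). An `Arc<RefCell<usize>>` could not be moved into `thread::spawn` at all, because `RefCell` is not `Sync`. Swapping in a thread-safe wrapper such as `RwLock` (or `Mutex`) is what gives the `Arc` a purpose:

```rust
use std::sync::{Arc, RwLock};
use std::thread;

// Hypothetical helper: increments one shared counter from `n` threads.
// With Arc<RefCell<usize>> the `thread::spawn` call below would not compile,
// because RefCell is not Sync and so Arc<RefCell<_>> is not Send.
pub fn bump_from_threads(n: usize) -> usize {
    let shared = Arc::new(RwLock::new(0usize));

    let handles: Vec<_> = (0..n)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                // Take the write lock only for the duration of the update.
                *shared.write().unwrap() += 1;
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // Copy the value out so the read guard is released before returning.
    let total = *shared.read().unwrap();
    total
}
```

For data that is never mutated after construction, a plain `Arc<T>` with no wrapper at all is enough.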

Unrelated, but never once in my life have I had a problem to solve and thought, "I know! I'll use a Factory!"

It's embarrassing to admit this, but I once did. Same with the other stuff in the Gang of Four book.

is not a weird Option<Arc<RefCell<&'a ... type of thing

That's why I mentioned it as "type of thing" (and not the actual solution I had in mind with unsafe refs/pointers) by putting just a random sequence of wrappers to demonstrate the type of construction I don't want.

why is the transcription table from dna to rna and from rna to dna separate if you'll always need them together?

For generality. Once again, it's a RESEARCH project. I'm EXPLORING my options, not LIMITING them.

why is the dna and rna alphabets provided separately from the transcription table?

Sometimes, TranscriptionData will be missing (it'll be Optioned later).

why should transcription data store a closure? Can't you just make factory a method on it? A closure is just going to hide its references/dependencies.

Can you show an example? What I want is something called DNA/RNA that looks like a function, i.e., used like DNA("ATGC"), that generates a Sequence object with a defined Alphabet and TranscriptionData,

I would argue that this kind of "cross-reference" is bad practice in every language, since it makes it difficult to reason about what references what when there are multiple cyclic paths between objects.

Good luck with graphs. Anyway, in this case, all of this is limited to a single cross-reference. Both alphabets and transcription tables are not mutable. And, in the majority of use cases, they are expected to be instantiated in the beginning of the program.

If you plan to use PyO3, then you'll have to avoid lifetime parameters in user-facing types, since PyO3 doesn't allow exposing them to Python.

For now, I need a working Rust version according to my goals. Later, we'll see. And, as it frequently happens with research, goals may change in the process.

P.S. Why is it so hard just to show the best unsafe implementation?

Because unsafe is never the best implementation.

Because using unsafe isn't a magical button to make the problems go away and your problem isn't necessarily one that unsafe could even solve.


Reading through this thread, it sounds like there may be a bit of a design mismatch, where the current approach doesn't line up with the way Rust pushes people to write their code. That can feel really frustrating because you want to write your solution one way and the compiler or other Rustaceans keep rejecting you.

You may have heard this referred to as "fighting the borrow checker".

Usually, when I design something in Rust I'll think about what my major concepts are and the way data changes as it flows through. Maybe even sketch out an example of how you'd like end users to use your code and see how different objects need to interact.

This is a massive over-simplification, but generally some guidelines are:

  • Make sure you have a clear ownership story where A owns B, which owns C, and so on
    • Avoid reference cycles or webs because they make the ownership story less clear
    • Avoid putting references in structs because it's easy to tie yourself in knots with lifetimes
    • Avoid interior mutability (RefCell) if you can because it's often a device used to muddy the ownership story
  • Try to write your code in a functional style and stick to immutability
  • Avoid GoF design patterns - they were created for a very different kind of programming language and a literal translation will cause you a lot of frustration
  • unsafe is not the answer - Rust Koans

Regarding Alphabet and other things that could potentially be quite large, Arc is the thing you want to reach for because it lets you create a single Alphabet instance and have a bunch of other things reuse it. For more advanced situations, the im crate might be helpful because it contains types which allow you to do structural sharing.

unsafe in Rust comes with a tough challenge: don't allow safe code to cause UB. C and C++ don't have that. In C, it's common for a library's docs to list a bunch of esoteric rules that clients must follow or kaboom. C++ started to abandon that approach (the Modern C++ movement), and came up with a bunch of practices which separate ergonomic libraries from non-ergonomic ones. It's tough to fit your intended design into the Modern C++ ergonomic model without adding a couple more layers of complexity to sort out ownership. Once you do that, it's likely possible to reproduce it in Rust without using unsafe.

Even Python has an issue: your model has circular references which cause leaks.

This might be an acceptable solution for you. Rc everything and accept that it will leak.

Python does collect cycles.

I'll use threads, Rc won't work I suspect.

I don't see any leaking in my approach. There will be a dozen alphabets (and tables) created either "within the library" (i.e., using lazy_static global initialization) or by the user at the beginning of their program. All of this will be immutable throughout the whole program, with only a few cross-links. How can it leak?

Yet it is present in many crates :slight_smile:

Oh, wow! It didn't use to.

Reading through this thread, it sounds like there may be a bit of a design mismatch...

There is no design. It's a research project, so the design is a flexible thing here. I start with my wishes and see where they lead me. It may well be the Python code, not the Rust code, that gets sacrificed, but first I have to develop several approaches in Rust, including unsafe ones.

or other Rustaceans keep rejecting you

It feels like being a "lost lamb" that doesn't follow "the Bible". However, in many cases, research is about breaking the rules a bit.

Avoid putting references in structs because it's easy to tie yourself in knots with lifetimes

  • I need some kind of a reference/pointer, because I don't want copies/clones of alphabets etc.
  • They will be accessed by multiple threads in a read-only manner.
  • I want cross reference. Without it, there will be no purpose in this whole thread.

For more advanced situations, the im crate might be helpful because it contains types which allow you to do structural sharing.

Haha, with the "prohibited" unsafe under the hood :)))

There's a circular reference between TranscriptionData and TranscriptionData::factory. If you directly Rc or Arc in both directions, the reference count won't fall to 0. You can prevent the leak by choosing which is the owner and making the other a weak ref.
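
A minimal sketch of that owner/weak split, using two made-up types `Dna` and `Rna` rather than the real TranscriptionData (all names here are assumptions for illustration). `Dna` owns `Rna` through an `Arc`, and `Rna` points back through a `Weak` that is filled in once after construction:

```rust
use std::sync::{Arc, OnceLock, Weak};

// Hypothetical types: `Dna` is the owner, `Rna` only holds a weak back link.
pub struct Dna {
    pub rna: Arc<Rna>, // owning direction, counted
}

pub struct Rna {
    dna: OnceLock<Weak<Dna>>, // non-owning direction, not counted
}

pub fn make_pair() -> Arc<Dna> {
    let rna = Arc::new(Rna { dna: OnceLock::new() });
    let dna = Arc::new(Dna { rna: Arc::clone(&rna) });
    // Fill in the back link exactly once; a Weak does not keep `dna` alive.
    assert!(rna.dna.set(Arc::downgrade(&dna)).is_ok());
    dna
}

impl Rna {
    // Returns the owner if it is still alive.
    pub fn owner(&self) -> Option<Arc<Dna>> {
        self.dna.get().and_then(|weak| weak.upgrade())
    }
}
```

Because only one direction is counted, dropping the last `Arc<Dna>` frees the `Dna`, and `owner()` then returns `None` instead of keeping the pair alive forever.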

At the end of the program, it'll all be destroyed. Alphabets are created at the beginning of the program in a limited number. Afterwards, they are just referenced.

In this case, and only in this case, &'static Alphabet is a very good choice (you can create one at compile time using static variables, or at run time using Box::leak()).

(Since the lifetime is specifically 'static, using this kind of reference won't "infect" the rest of your data types with a lifetime parameter.)
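
Roughly what the run-time variant could look like, as a sketch only: the `symbols` field, `leak_alphabet`, and `is_valid` are invented here for illustration and are not from the original code.

```rust
// Hypothetical alphabet: just the set of allowed symbols.
pub struct Alphabet {
    symbols: String,
}

// No lifetime parameter needed: the reference is always 'static.
pub struct Sequence {
    residues: String,
    alphabet: &'static Alphabet,
}

// Deliberately leaks one allocation so the alphabet lives until program exit.
// Meant to be called a handful of times at start-up, not in a loop.
pub fn leak_alphabet(symbols: &str) -> &'static Alphabet {
    Box::leak(Box::new(Alphabet {
        symbols: symbols.to_string(),
    }))
}

impl Sequence {
    pub fn new(residues: &str, alphabet: &'static Alphabet) -> Self {
        Sequence {
            residues: residues.to_string(),
            alphabet,
        }
    }

    // True if every residue is a symbol of the alphabet.
    pub fn is_valid(&self) -> bool {
        self.residues
            .chars()
            .all(|c| self.alphabet.symbols.contains(c))
    }
}
```

A `&'static Alphabet` is `Copy`, and because this `Alphabet` is `Sync` it can be handed to any number of threads for read-only access without reference counting.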

To me it feels like you're the one limiting your options, though, by forcing a specific API surface. As always, there are tradeoffs.

I don't see Options though.

Once you drop the requirement of a "factory" closure and use a method instead, the closures no longer need to store references to each other; you can just cheaply recreate the TranscriptionData inside the method.
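
One possible shape for that, as a sketch under my own assumptions (the `Kind` enum, the `NucPair` struct, and the toy T/U tables are invented for illustration, not taken from the original design). Both directions live in one shared, immutable `NucPair`, and `transcribe` is an ordinary method that picks the table from the sequence's kind, so nothing has to point back at anything:

```rust
use std::sync::Arc;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Kind {
    Dna,
    Rna,
}

// Hypothetical shared, immutable data for both directions.
pub struct NucPair {
    dna_to_rna: fn(char) -> char,
    rna_to_dna: fn(char) -> char,
}

fn t_to_u(c: char) -> char {
    if c == 'T' { 'U' } else { c }
}

fn u_to_t(c: char) -> char {
    if c == 'U' { 'T' } else { c }
}

impl NucPair {
    // Toy tables: only swap T and U, everything else passes through.
    pub fn toy() -> Arc<NucPair> {
        Arc::new(NucPair {
            dna_to_rna: t_to_u,
            rna_to_dna: u_to_t,
        })
    }
}

pub struct Sequence {
    residues: String,
    kind: Kind,
    pair: Arc<NucPair>,
}

impl Sequence {
    pub fn new(residues: &str, kind: Kind, pair: Arc<NucPair>) -> Self {
        Sequence { residues: residues.to_string(), kind, pair }
    }

    pub fn residues(&self) -> &str {
        &self.residues
    }

    pub fn kind(&self) -> Kind {
        self.kind
    }

    // A method instead of a stored closure: choose the table by kind.
    pub fn transcribe(&self) -> Sequence {
        let (table, kind) = match self.kind {
            Kind::Dna => (self.pair.dna_to_rna, Kind::Rna),
            Kind::Rna => (self.pair.rna_to_dna, Kind::Dna),
        };
        Sequence {
            residues: self.residues.chars().map(table).collect(),
            kind,
            pair: Arc::clone(&self.pair),
        }
    }
}
```

There is no cycle here, so nothing leaks, and the whole thing is `Send + Sync`. If the `DNA("ATGC")` call syntax matters, a small closure capturing the `Arc<NucPair>` can still be layered on top.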

Graphs are just a set of nodes and a set of edges. There's no need to represent edges with references, that's just an artificial limitation you may impose on yourself.
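
For example, a bare-bones index-based graph might look like this (a sketch only; the `Graph` type and its methods are made up for illustration). Edges are pairs of indices into the node vector, so even a cycle involves no references at all:

```rust
// Hypothetical index-based graph: nodes are labels, edges are index pairs.
pub struct Graph {
    nodes: Vec<String>,
    edges: Vec<(usize, usize)>,
}

impl Graph {
    pub fn new() -> Self {
        Graph { nodes: Vec::new(), edges: Vec::new() }
    }

    // Adds a node and returns its index, which acts as its handle.
    pub fn add_node(&mut self, label: &str) -> usize {
        self.nodes.push(label.to_string());
        self.nodes.len() - 1
    }

    // Adds a directed edge; panics if either index is out of range.
    pub fn add_edge(&mut self, from: usize, to: usize) {
        assert!(from < self.nodes.len() && to < self.nodes.len());
        self.edges.push((from, to));
    }

    // Labels of all nodes reachable from `node` by one edge.
    pub fn neighbors(&self, node: usize) -> Vec<&str> {
        self.edges
            .iter()
            .filter(|(from, _)| *from == node)
            .map(|(_, to)| self.nodes[*to].as_str())
            .collect()
    }
}
```

The graph owns every node and edge outright, so the ownership story stays linear even when the graph itself is cyclic.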

Because unsafe is not a magic wand that will fix your program. unsafe is for when the lifetimes of your program are logically correct, but the compiler doesn't accept them. The lifetimes of your program are not correct to start with, so using unsafe would be plain wrong.

This approach does violate your X<Y<Z>> rule in one spot, TranscriptionData::factory, but it hides it from the user via a method. It also leaks. On the plus side, it has no unsafe and everything should be Sync + Send + 'static so will support threading.

use std::sync::{Arc, OnceLock};

pub struct Sequence {
    sequence: String,
    alphabet: Arc<Alphabet>,
    tsc_data: Arc<TranscriptionData>,
}

impl Sequence {
    fn new(sequence: &str, alphabet: Arc<Alphabet>, tsc_data: Arc<TranscriptionData>) -> Self {
        Sequence {
            sequence: sequence.to_string(),
            alphabet,
            tsc_data,
        }
    }

    pub fn transcribe(&self) -> Sequence {
        let sequence = self.tsc_data.table.transcribe(&self.sequence);
        (self.tsc_data.factory.get().unwrap())(&sequence)
    }
}

pub struct Alphabet {/* details are not important */}

pub struct TranscriptionData {
    factory: OnceLock<Box<dyn Fn(&str) -> Sequence + 'static + Send + Sync>>,
    table: Arc<TranscriptionTable>,
}

impl TranscriptionData {
    // Simplify calling factory
    pub fn factory(&self, sequence: &str) -> Sequence {
        (self.factory.get().unwrap())(sequence)
    }
}

pub struct TranscriptionTable {/* details are not important */}

impl TranscriptionTable {
    fn transcribe(&self, _sequence: &str) -> String {
        todo!();
    }
}

pub fn make_nuc_pair(
    dna_alphabet: Arc<Alphabet>,
    rna_alphabet: Arc<Alphabet>,
    dna_to_rna: Arc<TranscriptionTable>,
    rna_to_dna: Arc<TranscriptionTable>,
) -> (
    impl Fn(&str) -> Sequence + 'static + Send + Sync,
    impl Fn(&str) -> Sequence + 'static + Send + Sync,
) {
    let tsc_data_dna = Arc::new(TranscriptionData {
        factory: Default::default(),
        table: dna_to_rna,
    });
    let tsc_data_rna = Arc::new(TranscriptionData {
        factory: Default::default(),
        table: rna_to_dna,
    });

    let DNA = {
        let dna_alphabet = dna_alphabet.clone();
        let tsc_data_dna = tsc_data_dna.clone();
        Box::new(move |seq: &str| Sequence::new(seq, dna_alphabet.clone(), tsc_data_dna.clone()))
    };
    let RNA = {
        let rna_alphabet = rna_alphabet.clone();
        let tsc_data_rna = tsc_data_rna.clone();
        Box::new(move |seq: &str| Sequence::new(seq, rna_alphabet.clone(), tsc_data_rna.clone()))
    };

    tsc_data_dna.factory.set(RNA.clone()).ok().unwrap();
    tsc_data_rna.factory.set(DNA.clone()).ok().unwrap();

    // Order must match the `let (DNA, RNA) = ...` destructuring in main.
    (DNA, RNA)
}

fn main() {
    let (DNA, RNA) = make_nuc_pair(
        Arc::new(Alphabet {}),
        Arc::new(Alphabet {}),
        Arc::new(TranscriptionTable {}),
        Arc::new(TranscriptionTable {}),
    );

    let dna: Sequence = DNA("ATGC");
    let rna: Sequence = dna.transcribe();

    // Leak happens here because of Arc cycles
}

There is some unnecessary String cloning, including a clone followed soon by a free. I'll leave those as an exercise for the reader.
