How to elegantly implement a cached generator?

I want to implement a cached generator, something like this:

struct CachedGen<T: Iterator<Item = u32>> {
    generator: T,
    cache: Vec<u32>
}

When you try to get the nth element, it first looks in the cache; if the element is there it is returned, otherwise the generator is called up to the nth element and all the produced elements are pushed into the cache so they can be reused later. The problem is that this cached generator will be shared a lot, but I cannot duplicate a &mut. One option I tried is Rc<RefCell<CachedGen>>, but it makes my code very ugly and, more importantly, it adds too much overhead, since this generator is very hot. My program is single-threaded and I can guarantee that the computation dependency is a DAG. Is it reasonable for me to use unsafe code here, or is there a better approach?
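
Roughly, the lookup I have in mind looks like this (the method name get is just a placeholder):

impl<T: Iterator<Item = u32>> CachedGen<T> {
    // Return the nth element, advancing the generator and filling the
    // cache as far as needed. Returns None if the generator runs dry first.
    fn get(&mut self, n: usize) -> Option<u32> {
        while self.cache.len() <= n {
            let next = self.generator.next()?;
            self.cache.push(next);
        }
        Some(self.cache[n])
    }
}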

Thank you!

The overhead of Rc and RefCell is very low (likely unmeasurable) when you're cloning the Rc or borrowing from the RefCell, and the allocation overhead of Rc is a one-time cost per generator. Are you sure this is too much overhead?

1 Like

I think you are right. What actually happened is that the generators are Coroutines, which don't allow borrowed values to be held across a yield. If I use Rc and RefCell I have to do a lot of useless reborrowing to make the compiler happy; maybe I should have used plain iterators.

If I understand correctly, you may have the same sorts of problems with Iterator. If you post the code, people may have suggestions.

1 Like

I wish I could, but it is very long and the logic is kind of confusing (we spent 5 pages discussing this algorithm in the paper), and the paper is still under submission. In any case, thank you for your advice.

No need for Rc. This works:

struct CachedGen<T: Iterator<Item = u32>>(RefCell<CachedGenImpl<T>>);

struct CachedGenImpl<T: Iterator<Item = u32>> {
    generator: T,
    cache: Vec<u32>
}

And now you can share references to CachedGen.
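
For example, the lookup can then take &self; the RefCell is only borrowed for the duration of the call and the u32 comes out by value (the get method here is just an illustration):

impl<T: Iterator<Item = u32>> CachedGen<T> {
    fn get(&self, n: usize) -> Option<u32> {
        // Borrow mutably only while we advance the generator and read the cache.
        let mut inner = self.0.borrow_mut();
        while inner.cache.len() <= n {
            let next = inner.generator.next()?;
            inner.cache.push(next);
        }
        Some(inner.cache[n])
    }
}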

2 Likes

If your cache contains actual u32 or other small Copy types, then use the Cell<u32> wrapper type. It supports single-threaded mutation through a shared reference, and it doesn't do any refcounting or locking. It's basically equivalent to what C does by default when mutating data.
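
A tiny illustration of what Cell gives you:

use std::cell::Cell;

// Mutation through a shared reference, with no runtime bookkeeping at all.
fn bump(counter: &Cell<u32>) {
    counter.set(counter.get() + 1);
}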

Thank you, the other problem is that these generators are used in Coroutines, which don't allow borrows to be held across a yield. So I will have to wrap an Rc around it to share ownership.

What data exactly is being borrowed? The u32 can be copied and returned by value.


Oh, do you mean the coroutine can't borrow the &CachedGen?

No, this is a common misconception. Shared mutability is always problematic, not only in the presence of threading.

1 Like

It's not always problematic, given that things like RefCell exist that do soundly provide such functionality.

Shared mutability is simpler to deal with in the presence of a single thread, as evidenced by the example of RefCell that only works within a single thread.

It's certainly possible to optimize away the runtime checks performed by RefCell by using unsafe code (UnsafeCell in particular), and to do it in a sound way.
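
For instance, something in this direction (only a sketch; the SAFETY argument leans entirely on your single-threaded, no-self-dependency guarantee, which is exactly the part that is easy to get wrong):

use std::cell::UnsafeCell;

struct CachedGen<T: Iterator<Item = u32>> {
    inner: UnsafeCell<CachedGenImpl<T>>,
}

struct CachedGenImpl<T: Iterator<Item = u32>> {
    generator: T,
    cache: Vec<u32>,
}

impl<T: Iterator<Item = u32>> CachedGen<T> {
    fn get(&self, n: usize) -> Option<u32> {
        // SAFETY: single-threaded, and the DAG guarantees the generator of
        // this node never re-enters this same cell, so this &mut is never
        // aliased while it is live.
        let inner = unsafe { &mut *self.inner.get() };
        while inner.cache.len() <= n {
            let next = inner.generator.next()?;
            inner.cache.push(next);
        }
        Some(inner.cache[n])
    }
}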

2 Likes

Exactly.

Here is roughly what I am doing. I am performing a DP on a DAG, calculating a list for each node, where the result of each node depends on the results of its predecessors.
One straightforward approach is to perform a topological sort and calculate the list for each node in that order. This is perfectly fine and I have implemented it already.
However, I am only interested in the result of the last node, and to get it I do not necessarily need to calculate the list for every node. So I want to perform lazy evaluation: specifically, associate each node with one of the CachedGens (we need the caching for the DP). Since the graph is a DAG, not a tree, a CachedGen will be shared by more than one node, so the most straightforward approach is Rc<RefCell>, which is very tedious to write, and the restrictions of coroutines make it even worse. That's why I am asking whether it is acceptable to use unsafe here, given that the graph is guaranteed to be a DAG.
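
The eager version is essentially this (compute_list stands in for the actual DP step):

// Nodes are topologically sorted by id, so every predecessor's list is
// already in `results` by the time a node needs it.
let mut results: Vec<Vec<u32>> = Vec::with_capacity(graph.nodes.len());
for node in &graph.nodes {
    // `compute_list` only reads entries of `results` with smaller ids.
    let list = compute_list(node, &results);
    results.push(list);
}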

1 Like

People probably would like to see exactly how you're using unsafe in order to answer. The question is whether it can be made sound, in a reasonable/verifiable way. Or are you asking for ideas?

Another thing to consider, if you're going to use unsafe somehow here, is whether unsafe could be used instead to work around the borrowing restriction (that the coroutine can't borrow &CachedGen).

In either case, it's all about the details I think, since one has to be so careful with unsafe.

1 Like

I would say I am asking for advice in this case, since I like neither my Rc/RefCell implementation nor my unsafe implementation.

I will add the details here; I needed a moment to figure out how to demonstrate it.

Let's say we have a graph with u32 node ID.

struct Graph<T> {
    nodes: Vec<Node<T>>
}
struct Node<T> {
    content: T,
    outgoing: Vec<u32>,
    incoming: Vec<u32>
}

It is an adjacency list for each node, and the graph is a DAG (I have a runtime check to make sure of that). Now I have a DP algorithm that calculates a list of u32 for each node in the graph. WLOG, let's assume the graph is topologically sorted with respect to the node ids, so calculating the list for node x may need the results from nodes with ids less than x.

// I want to get this
DP_result: HashMap<u32, Vec<u32>>

// However, since I want to calculate these things lazily, what I really want is
DP_result_lazy: HashMap<u32, CachedGen>

// But wait, the `CachedGen`s in this hash map also refer to other
// `CachedGen`s in the HashMap. I cannot have such self-reference because
// it would lock everything up so that I cannot add anything to the hashmap.
// Okay, fine, let's do this

DP_result_lazy: Rc<RefCell<HashMap<u32, CachedGen>>>

// Oh damn, if I want to get a mutable reference to a CachedGen, I need to
// borrow_mut from DP_result_lazy, but this will panic at runtime since I will
// have more than one coroutine borrowing DP_result_lazy at the same time.
// Okay, let's have more fine-grained sharing

DP_result_lazy: Rc<RefCell<HashMap<u32, RefCell<CachedGen>>>>

// I should be good, right? 
// error[E0626]: borrow may still be in use when coroutine yields 

:sob: :sob: :sob: :sob: :sob:
So what I ended up with is

DP_result_lazy: Rc<RefCell<HashMap<u32, Rc<RefCell<CachedGen>>>>>

A super simplified version of the coroutine:

// For each node_id
let DP_result_lazy_clone = DP_result_lazy.clone();
let coroutine = #[coroutine] move || {
    let incomings = graph.get_incomings(node_id);
    for pred in incomings {
        // Super stupid borrowing: re-borrow the map on every lookup.
        DP_result_lazy_clone.as_ref().borrow().get(&pred).unwrap().as_ref()......
        // Some complicated calculation
        yield result;
    }
};
DP_result_lazy.as_ref().borrow_mut().insert(node_id, Rc::new(RefCell::new(CachedGen::new(coroutine))));
1 Like

Either you are maliciously twisting my words, or you are deeply confused about the meanings of "problematic" and "unsound". These two are not the same.

RefCell is implemented using unsafe, by the very people who designed the language, using a primitive (UnsafeCell) that directly nudges the compiler toward not assuming things that shouldn't be assumed.

This is no task for the beginner who is still at the stage of confusing parallelism with concurrency.

My point is: people shouldn't reach for unsafe as their first reaction to encountering contention with Rust's memory model, because they will inevitably get it wrong if they don't understand the finer details.

I'm not saying that it's impossible – it very obviously is, but it's not what everyone should do when getting a borrowck error.

This is trivially false and irrelevant. It hasn't got anything to do with "simplicity". RefCell has a direct multi-threaded equivalent: RwLock. However, a single-threaded equivalent needs to exist so that there's no superfluous synchronization cost to be paid when it's statically unnecessary.

In fact, the very existence of, and need for, a single-threaded shared mutability abstraction means that shared mutability is hard to get right even in the absence of parallelism.

1 Like

Considering you need to reference the HashMap and the CachedGen reentrantly as you described, that type doesn't look so bad to me. If you wrote the same thing in Java it would just be

HashMap<u32, CachedGen>

since all objects in Java are effectively wrapped in Rc<RefCell<...>>. But with Rust you have to do all the mechanics of wrapping and dereferencing manually due to lack of a GC.

I know I'm not helping at all, I'm only noting that what you ended up with seems like an appropriate representation of the data structure in Rust. Whether this can be simplified with unsafe is the next question.

Thank you. Rc and RefCell alone aren't too bad, but if you use them in a coroutine, things get out of control. For instance, theoretically each coroutine only needs to borrow the HashMap once, since we will not add or remove anything, but you can't keep a Ref variable in the coroutine because it is a borrow and cannot live across a yield. So every time I want to look up the HashMap I need to borrow again, and RefCell performs a completely useless runtime check since no coroutine will ever borrow_mut. The same goes for all the CachedGens.
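
In other words, this is roughly what I would like to write, but it is rejected:

let coroutine = #[coroutine] move || {
    // Borrow the HashMap once for the whole coroutine...
    let map = DP_result_lazy_clone.as_ref().borrow(); // a `Ref`, i.e. a borrow
    for pred in incomings {
        let cached = map.get(&pred).unwrap();
        // Some complicated calculation
        yield result;
        // error[E0626]: borrow may still be in use when coroutine yields
    }
};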

The unsafe code I tried replaces all the Rc/RefCell with raw pointers, but apparently I am not following the Stacked Borrows rules and Miri yells at me. Nevertheless, it does improve my performance a lot and passes all the test cases. I guess that's all I can do.

Your HashMap values are Rc<RefCell<CachedGen>> but you're inserting a Rc<RefCell<some Coroutine>>, so it's not clear to me what's going on there.

But as an unrelated note: since your HashMap keys seem to be consecutive integers, you can use a Vec instead.

Oh, the CachedGen actually takes a coroutine as its generator, so I am inserting a CachedGen that wraps the coroutine. My bad, already fixed in the post.

The node indices in practice are not consecutive. And even if I use a Vec, I still have this reborrow issue, no?

Yes, but since you're trying to optimize performance, this seemed like an easier and bigger win compared to the runtime checks in RefCell.

If the indices aren't consecutive, you could also make them consecutive before the whole big calculation starts.
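
Something like this up front (a sketch, assuming you can collect the node ids first):

use std::collections::HashMap;

// Map each original node id to a dense index once, before the calculation.
let dense: HashMap<u32, usize> = node_ids.iter().enumerate().map(|(i, &id)| (id, i)).collect();

// Per-node results can then live in a plain Vec indexed by dense[&id].
let mut results: Vec<Option<Vec<u32>>> = vec![None; node_ids.len()];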

I would try to avoid using coroutines for this at all; then maybe you can simply have a Vec<Vec<u32>>, and potentially much better performance.
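
As a sketch of what that could look like without coroutines (the combining step below is just a stand-in for your real DP step, and ids are assumed to be dense indices): memoize on demand with plain recursion, so only the nodes the last node actually depends on ever get computed.

// `results[id]` is None until node `id` has been computed.
fn ensure<T>(graph: &Graph<T>, results: &mut Vec<Option<Vec<u32>>>, id: usize) {
    if results[id].is_some() {
        return;
    }
    let mut list = Vec::new();
    for &pred in &graph.nodes[id].incoming {
        ensure(graph, results, pred as usize);
        let pred_list = results[pred as usize].as_ref().unwrap();
        // Stand-in for the real DP step combining the predecessor's list.
        list.extend(pred_list);
    }
    results[id] = Some(list);
}

// Usage: only compute what the last node needs.
// ensure(&graph, &mut results, last_id);
// let answer = results[last_id].as_ref().unwrap();

The recursion depth is bounded by the longest dependency chain, so for very deep graphs you may need an explicit stack instead.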