Bring your own cache to my crate!

Am I thinking about this problem correctly?

I have a crate that does some complex computation, and in the middle of its work it does some very slow OCR. I want to offer you the ability to bring your own cache for this part, so that next time, if you ask it to do the work again and you bring your cache, it can skip that step.

I could break up the pipeline into tiny little pieces and expose them all, then everyone is free to borrow the various functions of the library and assemble their dream pipeline and add whatever caching they want. Problem solved!

The problem is, there is only one way to assemble the pipeline so it makes no sense to expose (leak) the various internal components as a composable thing.

So I was thinking: I make you give me two closures. The first closure passes you the three parameters that, when combined, make up the unique key for this cache; you use it to call your cache and check whether you have the value cached already. The second closure is for when you return None: I will pass you the data that you now need to store.

Does this sound reasonable?

I've not really seen this pattern used elsewhere, so I'm wondering if there's a reason for that!

It's cumbersome to share data between multiple closures.

You should design a trait for cache, and let users give you an instance of this trait. You then could have an example implementation of that trait that wraps closures if that's possible.

9 Likes

This is what I was thinking (pseudo code)

let (tx_get, rx_get) = tokio::sync::oneshot::channel();
let (tx_get_return, rx_get_return) = tokio::sync::oneshot::channel();
let (tx_set, rx_set) = tokio::sync::oneshot::channel();
tokio::task::spawn(async move {
    // a oneshot receiver is awaited directly; it has no `recv` method
    if let Ok(key) = rx_get.await {
        let cached = cache.retrieve(&key).await.ok(); // closure wants Option<Data>
        tx_get_return.send(cached).unwrap();
        if let Ok(data) = rx_set.await {
            cache.store(&key, data).await.ok(); // best effort, doesn't matter if it fails
        }
    }
});
let foo = tokio::task::spawn_blocking(move || {
    Foo::builder()
        .with_cache(
            |a, b, c| {
                let key = params_to_key(a, b, c);
                tx_get.send(key).unwrap(); // oneshot send is synchronous
                rx_get_return.blocking_recv().ok()
            },
            |data| {
                tx_set.send(data).ok();
            }
        )
        .build()
}).await??;

oneshot channel implies there's always one thing to retrieve from cache?

Your use of a particular channel implementation lets you use closures without shared context, so you're designing the API for the most flexible case: one that doesn't involve any loans, shared mutability, or non-Send objects.

But if you wanted to cache in e.g. a &mut HashMap, you couldn't with two closures. Closures also can't lend data they own, so an owned HashMap wouldn't work either without cloning the result.
If you had to use some kind of database connection handle, you'd need to clone it or wrap it in an Arc/Mutex.

oneshot channel implies there's always one thing to retrieve from cache?

Yeah, it would be one cache access per request to the FooBuilder.

Interesting. Does your proposed solution solve both cases, though? (I.e. my case is the only one I really care about, where I have two async fns calling into an external cache.)

Could you show me some code regarding this trait based approach?

For example:

pub trait Cache {
   fn get(&mut self, key: …) -> Option<…>;
   fn set(&mut self, key: …, value: …);
}

and have a C: Cache bound on your Foo. If you don't want a trait bound, then take Box<dyn Cache>. If you need cache only for a single function call, then &mut dyn Cache will work too.

And you could have something like:

impl<F1, F2> Cache for (F1, F2) where F1: FnMut(…), F2: FnMut(…) { … }

to allow passing a pair of closures for convenience.

4 Likes

But if the caller still has to provide 2 closures, then I can make Cache private, and then I'm wondering why I need it at all?

It sounds like what you are proposing is simply a question of who has to hold onto the key between get and set (and I can hold onto it without defining a trait)?

Because the caller may be unable to use closures if they have shared state, and then they will have to use a struct and impl Cache for TheirStruct.

3 Likes

I don't understand, why can they not have shared state across two closures?

This compiles:

fn main() {
    let shared = String::from("i am shared");
    let a = || {
        println!("{}", &shared);
    };
    let b = || {
        println!("{}", &shared);
    };
    a();
    b();
}

ps. is there a standard keystroke combination for I'd like to use this more! :smiley:

Try this with &mut. It's solvable with interior mutability, but that's unnecessary overhead.

4 Likes

Right of course. Wow, that's so advanced of you to think of this.

So if you had an in-memory cache, let's say literally a std::collections::HashMap, you still wouldn't be able to impl Cache for it because of the orphan rule. So you would newtype your HashMap, impl the trait for your newtype, construct your newtype, use it, then unwrap the newtype when you are done with it?

Yes, an implementor would need their own struct to implement Cache. You can impl it for HashMap yourself in your crate if you want to provide a default like that.

Other reasons to prefer a trait:

  • struct implementations have concrete type names, so someone can have Config { cache: MyCacheImpl }, but you can't name actual closure types. They can only be used via generics or dyn (that will eventually improve in Rust as it expands impl Trait syntax).

  • You can extend the trait with more methods. This can be done in a backwards-compatible way if you provide default impls.

1 Like

... over what? I'm trying to identify what the pattern is here where I can recognise that I should adopt a trait to solve the problem.

There are no hard rules, but I think if there's a gray area, you should lean towards a trait. You can make a blanket implementation of the trait for closures, but OTOH if the API forces closures, there's no way to avoid them.

  • If there's more than one closure, definitely a trait, because a trait can have &mut self, and a pair of closures can't.

  • If it would make sense to return borrowed data (-> &X) then trait, because closures are not allowed to return data they own, but a struct could own data and lend it.

  • If you need future extensibility, trait.

Closures are good for ad-hoc code. For example, map takes a closure, because it's mostly a simple function with convenient syntax and won't need extensibility.

OTOH, string-searching functions take the Pattern trait, because they support closures (|c: char| -> bool), but also &[char] slices or strings.

With caching I imagine people may want to have sophisticated implementations, more than some one-off one-liner at the call site. So foo.with_cache(MyFancyCache::new(caching_config)) can make sense.

3 Likes

To illustrate the topic @kornel mentions (at least the way I see it), let's consider a simpler API than caching, since I think there are two orthogonal questions at play here: "what should this caching API look like" and "an API based solely on closures struggles with shared state":

API (silly) ideä

If(::rand::random())
    .then(|| 42)
    .else_(|| 27);

Problem

let mut x = 0;
If(::rand::random())
    .then(|| x = 42) // Error: `x` borrowed here,
    .else_(|| x = 27); // but `x` was (also) `&mut`-borrowed here!
x

Solution

let mut x = 0;
If(::rand::random())
    .with_state(&mut x)
    .then(|x| *x = 42)
    .else_(|x| *x = 27);
x

which does work


So, back to the OP, you can make a multi-closure-based API work, but at the cost of this slightly cumbersome shared_state extra parameter.

The common suggestion here is then to let people impl some Trait instead.

trait Ite {
    type Output;
    fn then(self) -> Self::Output;
    fn else_(self) -> Self::Output;
}

fn ite<R>(condition: bool, bodies: impl Ite<Output = R>) -> R {
    if condition { bodies.then() } else { bodies.else_() }
}

Since you can always offer a closure-based API with some helper type/function à la:

fn from_fns<State, R>(
    state: State,
    then: impl FnOnce(State) -> R,
    else_: impl FnOnce(State) -> R,
) -> impl Ite<Output = R>
{
    struct AdHocIte<S, T, E>(S, T, E);
    impl<R, E, S, T> Ite for AdHocIte<S, T, E>
    where
        T : FnOnce(S) -> R,
        E : FnOnce(S) -> R,
    {
        type Output = R;
        fn then(self) -> R { self.1(self.0) }
        fn else_(self) -> R { self.2(self.0) }
    }
    AdHocIte(state, then, else_)
}

Now, I hope that the if-then-else silly API has nonetheless proved to be a decent enough example to illustrate my point:

  • you can use a multi-closure API, but then make sure to offer a way to funnel through some shared-state to handle the disjunction

  • and/or a trait-based API, with nonetheless a from-closure based constructor.


Back to a cache

In the case of designing a cache, this will often involve some kind of HashMap structure, maybe with some extra tweaks.

In that case, the advantage of the trait is that we already have a canonical type for which to implement the trait:

impl<…> Cache for MyHashMapBasedCache<…> {
    fn on_…() { … }
    fn on_…() { … }
}
5 Likes

Beautiful explanation. Thank you!!

I'm really struggling to impl this...

One of the problems with my proposed simple dual-closure design is that the closures actually want to be called (with a unique key) for every page in the PDF. So for every page, I need to start the flow again: generate a key, then use it to retrieve/store.