Laugh at a silly incorrect memoize function (then tell me how it even compiled)

I was up late last night writing nonsense (category theory) memoize functions apparently, as I have no idea what this is doing, or for that matter, how this even compiled.

So as an exercise, I'm trying to understand what drunk-on-category-theory brain wanted to do.

What the function was supposed to do: (from ch 2 exercise 1)

Define a higher-order function (or a function object) memoize in your favorite language. This function takes a pure function f as an argument and returns a function that behaves almost exactly like f , except that it only calls the original function once for every argument, stores the result internally, and subsequently returns this stored result every time it’s called with the same argument. You can tell the memoized function from the original by watching its performance. For instance, try to memoize a function that takes a long time to evaluate. You’ll have to wait for the result the first time you call it, but on subsequent calls, with the same argument, you should get the result immediately.

Questions I have about the code below:

  1. Is that even possible? That seems like I'm telling a function to keep a stateful hash-map alive that lives somewhere in outer-space, invisible to all but Turing Machines. Which I tried to do below, with mixed results.
  2. Return type: Why should the function return FnMut? Because it appears the closure I'm returning is accessing and mutating the mut cache, though that cache will cease to exist ( I think ) after the memoize function returns.
  3. Why move? I think that's because I need to capture the mut cache and mutate it, but I'm very not certain, and I kept it because it made the compiler happy.
pub mod ch2{
   // This compiles and like a dog in front of a keyboard, I have no idea why
    pub fn memoize<A, B, F>(f: F) -> impl FnMut(A) -> B
    where
        F: Fn(A) -> B,
        A: std::cmp::Eq + std::hash::Hash + Copy,
        B: Copy,
    {
        let mut cache: HashMap<A, B> = HashMap::new();
        // closure mutates state (seems true), must have trait FnMut 
        // move is needed so that closure explicitly captures cache (that true?)
        move |x: A| match cache.contains_key(&x) {
            false => cache.insert(x, f(x)).unwrap(), // need copy here for A
            true => *cache.get(&x).unwrap(),         // copy B
        }
    }
    // But it sure doesn't pass this test
    #[cfg(test)]
    pub mod tests_2 {
        use crate::ch2::memoize;
        #[test]
        fn test_memoize() {
            let g = |x| x + 1;
            let mut f = memoize(|x| x + 1);
            println!("{},{}", f(4), g(4));
        }
    }
}
1 Like

Your very close, just need to read the insert doc and adjust.

The hashmap lives inside the closure, which you assign to f variable.

Once you get it working I would suggest changing to B: Clone

You can avoid the second hashmap lookup on every call (contains_key + insert, or contains_key + get) and the panicking unwrap codepaths by using:

let mut cache = HashMap::new();
move |a| *cache.entry(a).or_insert_with(|| f(a))

This ends up hashing a only once rather than twice.

4 Likes

Funnily enough your three questions are so related that one single answer should be able to address these three concerns.

First of all, I'll let you read this great blog post explaining closure unsugaring:

Now, let's look at your memoize function signature:

pub
fn memoize<A, B, F> (f: F) -> impl FnMut(A) -> B
where
    F : Fn(A) -> B,
    // ...

For all A, B, F where F : Fn(A) -> B, memoize::<A, B, F> is a function that takes an element of type F, i.e., a (Fn) closure / callable (by &) thingy, and then returns something that implements the FnMut(A) -> B interface trait, i.e., it returns something that is callable by &mut.

Now, the key thing here is regarding something:

  • any struct / state can be a closure; being a closure just means that you implement the special callable API / interface / Fn__ traits, so that you can use the (...) sugar to "call it" / invoke that callable API;

    • For instance, when the struct / state that implements the callable API is empty, we say that the closure captures no state, and is thus equivalent to a good old function.
  • in your example, by using the impl <trait / interface> sugar, you are saying that there exists some type that implements the required API (here, FnMut(A) -> B), and that such (existential) type is the return type of your function. You don't know exactly what it is, but if you tried to replace that return type with, for instance, the type of a function (pointer): -> fn(A) -> B, you would see that the program does not compile.

So by this point if you haven't read the blog post and just all this verbiage of mine I am not sure things are clearer. That's why I find it useful to stop using the closure traits, which as the name of the blog posts indicates, are kind of "too magic" to understand.

So let's imagine a world without Fn__ traits. Can we implement closures? Well yes of course, where we have "objects" we have closures, and vice versa. Since a closure is "just" a callable API, let's define our own callable API:

trait Callable<Arg> {
    type Ret;

    fn call (&mut self, arg: Arg) -> Self::Ret;
}

and ... that's it!

Indeed, if we want to define a basic double(i32) -> i32 Callable, we can just do:

/// First, define our "dummy"/ empty state
struct Doubler {
    // empty!
}

/// Our "dummy" state can be fed a `i32` ...
impl Callable<i32> for Doubler {
    /// ... in which case it returns a `i32` ...
    type Ret = i32;

    /// ... by doing the following computation:
    fn call (&mut self, arg: i32) -> i32
    {
        2 * arg
    }
}

/// Usage example
fn main ()
{
    let mut doubler = Doubler { /* empty initial state */ };
    assert_eq!(doubler.call(4), 8);
}

Now for a more complex example: let's define a closure computing (and storing) cumulative sums! This will require some state!

/// First, define our state: the computed sum up until now
struct CumulativeSum {
    computed_sum: i32,
}

/// Our state can be fed a `i32` ...
impl Callable<i32> for CumulativeSum {
    /// ... in which case it returns a `i32` (a snapshot of its own internal state)
    type Ret = i32;

    /// ... by doing the following computation:
    fn call (&mut self, arg: i32) -> i32
    {
        self.computed_sum // access the state
            += arg // which we can read _and mutate_ thanks to the `&mut self` signature
        ;
        self.computed_sum // yield the "snapshot"
    }
}

/// Usage example
fn main ()
{
    let mut cumulative_sum = CumulativeSum {
        // Initial state: a sum of 0
        computed_sum: 0,
    };
    assert_eq!(cumulative_sum.call(4), 4);
    assert_eq!(cumulative_sum.call(38), 42);
}

We can even go further and make our callables avoid owning stuff, and instead just refer to it:

/// First, define our state: the computed sum up until now
struct CumulativeSumByMutRef<'borrow> {
    at_computed_sum: &'borrow mut i32,
}

/// Our state can be fed a `i32` ...
impl Callable<i32> for CumulativeSumByMutRef<'_> {
    /// ... in which case it returns a `i32` (a snapshot of its own internal state)
    type Ret = i32;

    /// ... by doing the following computation:
    fn call (&mut self, arg: i32) -> i32
    {
        *self.at_computed_sum // access the state (with a dereference)
            += arg // which we can read _and mutate_ thanks to the `&mut self` signature
        ;
        *self.at_computed_sum // yield the "snapshot"
    }
}

/// Usage example
fn main ()
{
    let mut computed_sum = 0; // some local state
    let mut cumulative_sum = CumulativeSumByMutRef {
        // Initial state: a sum of 0
        at_computed_sum: &mut computed_sum,
    };
    assert_eq!(cumulative_sum.call(4), 4);
    assert_eq!(cumulative_sum.call(38), 42);
    // drop(cumulative_sum); // (implicit) we are done with our (temporary) closure
    assert_eq!(computed_sum, 42); // we can still access the referred to state
    // return cumulative_sum; // Error, cannot return a closure referring to a local
}

As you can see, I have commented out some "error" that would happend if we attempted to return a closure that refers to some local state; I think this is the behavior you had in mind / in your mental model. It just turns out that, as shown in the previous examples, a callable is capable of owning its state, in which case there is no problem returning it.

If we take your example, we can now explicitely rewrite it using our non-magic Callable interface:

use ::std::{
    collections::HashMap,
    hash::Hash,
    ops::Not,
};

pub
trait Callable<Arg> {
    type Ret;

    fn call (&mut self, arg: Arg) -> Self::Ret;
}

pub
fn memoize<A, B, F> (f: F) -> impl Callable<A, Ret = B>
where
    F : Callable<A, Ret = B>,
    A : Eq + Hash + Copy,
    B : Copy,
{
    let cache: HashMap<A, B> = HashMap::new();
    return ReturnedState { cache, f }; // returned state takes ownership of both the cache and f
    /// where:
    struct ReturnedState<A, B, F> {
        cache: HashMap<A, B>, // owns it!
        f: F,
    }
    /// and where our ReturnedState is callable!
    impl<A, B, F> Callable<A> for ReturnedState<A, B, F>
    where
        F : Callable<A, Ret = B>,
        A : Eq + Hash + Copy,
        B : Copy,
    {
        type Ret = B;
        fn call (self: &mut Self, x: A) -> B
        {
            // sugar: strip `self` within this function body
            let &mut Self { ref mut cache, ref mut f } = self;

            if cache.contains_key(&x).not() {
                cache.insert(x, f.call(x));
            }
            *cache.get(&x).unwrap()         // copy B
        }
    }

}
// But it sure doesn't pass this test
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_memoize() {
        let mut g = {
            #[derive(Clone, Copy)]
            struct G;
            impl Callable<i32> for G {
                type Ret = i32;
                fn call (&mut self, x: i32) -> i32
                {
                    x + 1
                }
            }
            G
        };
        
        let mut f = memoize(g);
        println!("{},{}", f.call(4), g.call(4));
    }
}
  • Playground

  • (I have fixed a bug in your function in the meantime: .insert() does not yield the current value but the previous one, so that .unwrap() in your false branch of the code was doomed to fail)

As you can see, it works, but you realize how cumbersome (e.g., look at the definition of g in the test) using this trait explicitely is; hence the closure sugar:

|<args>| -> Ret {
    <body which may refer to local variables>
}

is sugar for the compiler defining an anonymous struct that implements the necessary Callable trait(s) (Fn : FnMut : FnOnce) and that captures the local variables ("usually" by reference, which you can override and tell it to force an owning capture by annotating the closure with the move keyword; this was, for instance, necessary in your example to make the closure own the cache (making it live within it) rather than just refer to it (in which case the cache would die when the function returned and the closure would "dangle", causing the compilation error)).

  • For instance, Callable<Arg, type Ret> in my example was FnMut(Arg) -> Ret in closure language; FnMut because of the &mut self (which "allowed mutation") in the fn call method: Fn uses a &self receiver whereas FnOnce uses self. c.f. the blog post for a more detailed presentation of these three traits.

So, taking the double and cumulative_sum examples, in closure language they would become:

let double = |x| 2 * x;
let cumulative_sum = {
    let mut computed_sum = 0; // state
    move |x| {
        computed_sum += x;
        computed_sum
    }
};

and

let mut computed_sum = 0;
let cumulative_sum = |x| {
    computed_sum += x;
    computed_sum
};
assert_eq!(cumulative_sum(4), 4);
assert_eq!(cumulative_sum(38), 42);
// drop(cumulative_sum);
assert_eq!(computed_sum, 42);
12 Likes

A response so thorough, I'm astonished you didn't try to sell me anything more than a blog post read. Thanks @Yandros!

4 Likes