Category Theory for Programmers hopeless in Rust?

I'm reading the book now and here's chapter 2:

fn memoize<A, B, F>(f: F) -> F
/* impl Fn(A) -> B */
where
    F: Fn(A) -> B,
{
    lazy_static! {
        static ref MY_MAP: HashMap<A, B> = {
            let mut m = HashMap::new();
        };
    }
    return f;
}

Just compiler complaint after complaint. I checked out the memoize crate and well that's not something I'm going to whip up for a quick end of chapter assignment.

Shall I bite the bullet and just use the crate itself?

It’s called a “challenge” for a reason.

A good starting point however would be to move from Rust’s function trait to some (appropriate) concrete function type. That could be Box<dyn Fn(A) -> B>. Or maybe in better analogy to the typical GCd (and shared) approach of the comparison to Haskell, use Rc<dyn Fn(A) -> B>. (If you want to have it particularly easy, then work with Box<dyn FnMut(A) -> B>, and you can even skip the requirement for interior mutability.) Also as another hint: You shouldn’t need any static variables, and in fact trying to use a static should only make it harder since generic statics don’t exist in Rust.

With that, given interior mutability and non-pureness is easily available in Rust, it might end up even easier to write memoize here, compared to Haskell… where you’d need to “unsafely” encapsulate the side-effects around the memoization process (and possibly worry about thread-safety of the result, too).

Just like in Haskell, you’ll also need to introduce some trait constraints to be able to do the memoization. For the input type you will need Eq and furthermore something like Ord or Hash in order to be able to use an appropriate map type and for the output since Rust cannot support duplicating values of arbitrary types, you’ll need e.g. Clone to be able to do that.

6 Likes

Yeah, you can't do that. Your code would require the compiler to create a different static variable for each combination of relevant type parameters memoize is monomorphized with, which is not something it can do. See this example of reproducing the fundamental problem here:

Maybe you could use the memoize crate or something. I don't know what you're doing this, what you're referring to by "the book" and how your code connects to that. I'd probably just create a struct

struct Memoizer<A, B> {
    m: HashMap<A, B>,
}

impl<A, B> Memoizer<A, B> {
    fn memoize(&mut self, ... ) -> .. .{ { . . }. }. ][. ][s;dad[plsad;pij

But there's several things about your code that make it kinda unclear what you're actually trying to do, so you'd have to elaborate more on what your goals are and then people here would be happy to help you design some idiomatic Rust construct to achieve it

1 Like

Just for fun I tried doing it myself. I ended up simplyfing the challenge quite a bit. No generics and only simple function pointers instead of Fn.

use std::collections::HashMap;
use std::sync::Mutex;
use std::sync::OnceLock;

fn main() {
    let doublem = memoize(double);
    let halvem = memoize(halve);
    dbg!(doublem(10));
    dbg!(halvem(10));
    dbg!(doublem(10));
    dbg!(halvem(10));
}

fn memoize(f: fn(u32) -> u32) -> Box<dyn Fn(u32) -> u32> {
    let g = move |i: u32| {
        static MAP: OnceLock<Mutex<HashMap<(fn(u32) -> u32, u32), u32>>> = OnceLock::new();
        let map = MAP.get_or_init(|| Mutex::new(HashMap::new()));

        *map.lock()
            .unwrap()
            .entry((f, i))
            .or_insert_with_key(|&(_, i)| f(i))
    };
    Box::new(g)
}

fn double(n: u32) -> u32 {
    dbg!("hi from double");
    2 * n
}

fn halve(n: u32) -> u32 {
    dbg!("hi from halve");
    n / 2
}

Playground

1 Like

For reference, this is a dead-simple, no interior mutability, no shared usability (i.e. FnMut, not Fn), no global state (i.e. no static) approach to memoization: (feel free to use the signature and try writing the implementation yourself before looking)

use std::collections::HashMap;
use std::hash::Hash;

fn memoize<A, B>(mut f: impl FnMut(A) -> B) -> impl FnMut(A) -> B
where
    A: Eq + Hash + Clone,
    B: Clone,
{
    let mut map = HashMap::new();
    move |a| map.entry(a).or_insert_with_key(|a| f(a.clone())).clone()
}

Rust Playground


And now I’ve also tried coming up with a thread-safety and Fn supporting version, though that’s significantly more complicated, naturally, as you’ll have to implement the right approach to synchronization (especially if you want to avoid any potential for dead-lock from the code in the Clone implementations or the function call reentrantly (e.g. recursively) accessing the memoized function, or from concurrent usage). Furthermore then, as you can see, assigning the result of that to a global variable was still a bit ugly (and also a bit more inefficient than necessary) since we cannot put unnamed closure types into static values yet. That’s probably the main motivation why a crate like memoize works by offering a macro instead (even though I haven’t checked their actual implementation, so that’s just a guess).

8 Likes

Ahhh, damn. I didn't consider that closures already have state🤦

Thanks for all these suggestions. Let me work through them.

1 Like

So if I add generics and just do a simple passthrough, it won't compile anymore complaining about lifetimes.

fn memoize<A, B, F>(f: F) -> Box<dyn Fn(A) -> B>
/* impl Fn(A) -> B */
where
    F: Fn(A) -> B,
{
    let g = move |i: A| f(i);

    Box::new(g)
}

Ah, as Rc it goes:

fn memoize<A, B>(f: Rc<dyn Fn(A) -> B>) -> Rc<dyn Fn(A) -> B> {
    let g = move |i: A| f(i);

    Rc::new(g)
}

Or actually it doesn't: the parameter type A may not live long enough

Ah lifetimes. Generally if you return a Box<dyn Trait> or Rc<dyn Trait> that has something to do (e.g. contains) some other types, you can introduce a lifetime <'a>, then bound all relevant types on 'a, like A: 'a, B: 'a, and F: 'a[1] and then you put that lifetime into the return type, Box<dyn Trait + 'a>[2] and it should work.


  1. using Rc<dyn Fn(…) -> … + 'a> parameter if the input is using dyn, too like in the code you gave ↩︎

  2. so in this case Box<dyn Fn(A) -> B + 'a> ↩︎

OK. It's bizarre, but finally something that compiles:

fn memoize<'a, A: 'a, B: 'a>(f: Rc<dyn Fn(A) -> B + 'a>) -> Rc<dyn Fn(A) -> B + 'a> {
    let g = move |i: A| f(i);

    Rc::new(g)
}

fn main() {
    let multiply = Rc::new(|x: u32| x * 2);

    let b = memoize(multiply);

    println!("{}", b(2));
}
1 Like

I don't think I could have figured this out without this even. I skimmed pages and pages of lifetimes docs and nowhere found a fully decked out example that would have showed the A: 'a syntax.

This is called a lifetime bound. In my personal experience with Rust literature, lifetime bounds are more often found in the explanations of generic parameters in Rust and not the lifetimes/borrow checker sections, which is probably why you couldn't find it.

2 Likes

Is there a reason you use Rc instead of any other smartpointer? Just curious, not a critique.

I think that might have been my mentioning it above. The reasoning is that Rc<dyn Fn(…) -> …> is somewhat analogous to a “function” type in something like Haskell (shared ownership [garbage collected], supporting closures & capturing, type erased), and the book OP is reading uses lots of Haskell examples, AFAICT. And simultaneously it skips the need to worry about threading (otherwise, you’d need Arc, but also then Send and Sync bounds in multiple places and as part of the type).

3 Likes

The compiler does give actionable suggestions :wink:

use std::rc::Rc;
fn memoize<A, B>(f: Rc<dyn Fn(A) -> B>) -> Rc<dyn Fn(A) -> B> {
    let g = move |i: A| f(i);

    Rc::new(g)
}
error[E0310]: the parameter type `A` may not live long enough
 --> src/lib.rs:5:5
  |
5 |     Rc::new(g)
  |     ^^^^^^^^^^
  |     |
  |     the parameter type `A` must be valid for the static lifetime...
  |     ...so that the type `A` will meet its required lifetime bounds
  |
help: consider adding an explicit lifetime bound
  |
2 | fn memoize<A: 'static, B>(f: Rc<dyn Fn(A) -> B>) -> Rc<dyn Fn(A) -> B> {
  |             +++++++++
[…]
help: consider adding an explicit lifetime bound
  |
2 | fn memoize<A, B: 'static>(f: Rc<dyn Fn(A) -> B>) -> Rc<dyn Fn(A) -> B> {
  |                +++++++++

which is another valid approach, resulting in

use std::rc::Rc;
fn memoize<A: 'static, B: 'static>(f: Rc<dyn Fn(A) -> B>) -> Rc<dyn Fn(A) -> B> {
    let g = move |i: A| f(i);

    Rc::new(g)
}

which works fine. It’s just a bit less general than the version that supports all lifetimes, the above only supports 'a == 'static.

Why is the + 'static not needen in the Rc like it is for 'a?

Rc<dyn Fn(A) -> B> is sugar for Rc<dyn Fn(A) -> B + 'static> which is why it requires A: 'static and B: 'static. If it’s generalized to Rc<dyn Fn(A) -> B + 'a> then it requires A: 'a and B: 'a.

4 Likes

I can't find this information apart from old versions of the rust book. But the chapter has been deleted or moved. Can you point me to some source?

The reference has a section on the default lifetimes of trait objects.

1 Like

I've written more about the elision rules here, which is part of a section about trait object lifetimes more generally.

1 Like