Fibonacci with memoization - how to store hashmap

Interesting observations. Maybe the naive recursive version (as opposed to a more optimized recursive version) and the iterative loop are both the "simplest" versions, but from different perspectives. The iterative one is simplest if you have to make a computer take over from you to calculate Fib(n). It's obvious to make the computer mimic your pencil and paper algorithm.
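As a sketch, that pencil-and-paper loop might look like this in Rust (the name `fib_iter` is mine):

```rust
/// Iterative Fibonacci: keep the last two values and step forward,
/// exactly like working the sequence out on paper.
fn fib_iter(n: u32) -> u128 {
    let (mut low, mut high) = (0u128, 1u128);
    for _ in 0..n {
        let next = low + high;
        low = high;
        high = next;
    }
    low
}

fn main() {
    assert_eq!(fib_iter(0), 0);
    assert_eq!(fib_iter(10), 55);
    println!("{}", fib_iter(50)); // prints 12586269025
}
```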

The naive recursive algorithm is simple in the sense of its similarity to the mathematical definition of Fibonacci. It tells the computer what to do, rather than how to do it, to a greater degree than the iterative loop does.
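For comparison, the naive version reads almost like a transliteration of the recurrence (again, `fib_naive` is just a name I'm choosing here):

```rust
/// Naive recursive Fibonacci, mirroring the mathematical definition:
/// F(0) = 0, F(1) = 1, F(n) = F(n-1) + F(n-2).
fn fib_naive(n: u32) -> u64 {
    match n {
        0 => 0,
        1 => 1,
        _ => fib_naive(n - 1) + fib_naive(n - 2),
    }
}

fn main() {
    assert_eq!(fib_naive(10), 55);
}
```

Of course, this version recomputes the same subproblems exponentially many times, which is what motivates memoization in the first place.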

Either way, I'm more used to imperative programming, so when I do exercises, I try to implement them in a functional way just to stretch my brain a bit. It gives me a feeling of accomplishment when I can figure out how to do something without loops and mutation that I could previously only imagine doing with those tools. And I get enough practice in those at my job anyway. Microcontrollers are stateful by nature. So I just play around after work when I get the opportunity :wink:

I think the idea that functional programming tells the computer what to do but not how to do it is an illusion, created by the simplified appearance of the code.

Without hidden optimizations such as tail recursion, the functional approach suffers from a lot of real-world problems, e.g. performance issues, or memory usage.

Edit: Don't get me wrong, I think functional approaches can be useful. But they certainly aren't the be-all end-all of problem solving. Like any tool, there are jobs they simply aren't good at.

I think you are right.

Kids in school don't want to be blinded with a recursive definition of fibo(n). What would it mean to them when they have only just learned what numbers and arithmetic are?

More likely they start an exploration, an experiment, adding 0 and 1, 1 and 1, 1 and 2 etc. And see where it goes.

Which is likely how the sequence was first ever conceived by anybody.

Anyway, how is the million digit fibo in Rust going :slight_smile:

Starting with the accumulator values swapped is not necessary, but it does have the nice property that you don't have to calculate Fn+1 to return Fn, so you avoid any possible overflow near T::MAX (doesn't make a difference for T = usize, but might for other widths / similar algorithms). The same technique would work for your iterative function too.
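If I'm reading the trick right, a standalone sketch would look like this (`fib_no_overflow` is a hypothetical name): seeding the pair with F(-1) = 1 and F(0) = 0 means the final addition produces F(n) itself, never F(n+1):

```rust
/// Returns F(n) without ever computing F(n+1): `low` starts at F(-1) = 1
/// and `high` at F(0) = 0, so after n steps `high` is exactly F(n).
fn fib_no_overflow(n: u32) -> u64 {
    let (mut low, mut high) = (1u64, 0u64); // F(-1), F(0)
    for _ in 0..n {
        let next = low + high;
        low = high;
        high = next;
    }
    high
}

fn main() {
    // F(93) is the largest Fibonacci number that fits in a u64;
    // computing it this way never touches the overflowing F(94).
    assert_eq!(fib_no_overflow(93), 12200160415121876738);
}
```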

I do think that the linear recursive algorithm has one real advantage over the iterative one: it does not have any state, which makes it easy to reason about systematically because the "iterations" cannot influence each other and there are no intermediate invalid states. In the loop version, the invariant you're trying to preserve is that low is Fn and high is Fn+1, but there's a point in each iteration of the loop where this invariant fails, and restoring it is highly sensitive to the ordering of assignments and the end of the loop. In fibonacci this is trivial, but in more complicated code it's not uncommon to accidentally observe a broken invariant in code where you weren't expecting it.
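The linear recursive version I have in mind is something like this sketch: each "iteration" is a fresh call with new immutable bindings, so the invariant (`low` is F(k), `high` is F(k+1)) holds at every call boundary and there is no point where it's temporarily broken:

```rust
/// Accumulator-passing linear recursion: no mutation, and the invariant
/// `low == F(k)`, `high == F(k+1)` holds on entry to every call.
fn fib_linear(n: u32, low: u64, high: u64) -> u64 {
    if n == 0 {
        low
    } else {
        fib_linear(n - 1, high, low + high)
    }
}

fn main() {
    // Seed with F(0) = 0 and F(1) = 1.
    assert_eq!(fib_linear(10, 0, 1), 55);
}
```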

That said, the loop-based version does have the advantage that it's guaranteed not to blow up the stack, and since Rust doesn't have TCE that's the version I would definitely use in most real-world situations.

That's interesting, I can't remember ever learning about the Fibonacci sequence in a context that wasn't in some way about recursion. I'm not even sure I know any real-world applications to justify it; to me it's only ever been "a mildly interesting sequence of numbers defined by a recurrence relation". Learning about the sequence and not the recurrence formula seems weird to me, like missing the point.

I don't think there is any danger of blowing up the stack, unless it's really small. Because Fibonacci numbers grow exponentially, the maximum useful recursion depth is only on the order of the bit width of the result type: with u128, everything past roughly fib(186) overflows, so the call depth stays under ~190. And the naive algorithm is so slow it's impossible to get anywhere near even that.

I got the same answer as your hex file with my implementation; mine is a bit slower, though (~10 minutes on an AMD Ryzen 3700X, single thread). I implemented Add<&BigUInt> instead of AddAssign, so I make a new copy and drop one of the previous two numbers on each iteration. Now I just need to implement the code to convert it to decimal...
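For the decimal conversion, one standard approach (sketched here against a hypothetical little-endian base-2^32 limb representation, not your actual BigUInt) is repeated long division by 10^9, collecting the remainders as nine-digit chunks and printing them most significant first:

```rust
/// Convert a little-endian base-2^32 bignum to its decimal string by
/// repeatedly dividing the limbs by 10^9 and collecting the remainders.
fn to_decimal(mut limbs: Vec<u32>) -> String {
    const CHUNK: u64 = 1_000_000_000; // 10^9: remainders fit in a u32
    let mut chunks: Vec<u32> = Vec::new();
    while limbs.iter().any(|&l| l != 0) {
        // One long-division pass, most significant limb first.
        let mut rem: u64 = 0;
        for limb in limbs.iter_mut().rev() {
            let cur = (rem << 32) | *limb as u64;
            *limb = (cur / CHUNK) as u32;
            rem = cur % CHUNK;
        }
        chunks.push(rem as u32); // least significant 9 digits first
    }
    if chunks.is_empty() {
        return "0".to_string();
    }
    // Most significant chunk unpadded, the rest zero-padded to 9 digits.
    let mut s = chunks.pop().unwrap().to_string();
    for chunk in chunks.iter().rev() {
        s.push_str(&format!("{:09}", chunk));
    }
    s
}

fn main() {
    // 2^64 = 18446744073709551616, stored as limbs [0, 0, 1].
    assert_eq!(to_decimal(vec![0, 0, 1]), "18446744073709551616");
    assert_eq!(to_decimal(vec![0]), "0");
}
```

This is quadratic in the number of limbs, which is fine for a one-off print of a million-digit number but would be the next thing to optimize (e.g. divide-and-conquer conversion) if it ever dominated.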

Coming back to the original question after all the sidetracks:

Seems like there are a few options:

  1. RefCell<T> or Lazy<Mutex<T>>
  2. Pass the cache as a function parameter.

There were arguments for and against both approaches. However, I just realized something which makes me want to go with the first option: memoization is an implementation detail, and as such it shouldn't change the function signature. Besides, I'd like to enforce that only one cache is built up in each run of the program. If I had the user pass in a vector, they might pass in a different one each time, at least somewhat negating the benefit.
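A minimal sketch of option 1, using only the standard library (std's OnceLock standing in for once_cell's Lazy; the names are mine):

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Process-wide cache: exactly one instance per program run,
/// initialized lazily on first use, pre-seeded with the base cases.
fn cache() -> &'static Mutex<HashMap<u64, u64>> {
    static CACHE: OnceLock<Mutex<HashMap<u64, u64>>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(HashMap::from([(0, 0), (1, 1)])))
}

/// Memoized Fibonacci; the caching is invisible in the signature.
fn fib(n: u64) -> u64 {
    if let Some(&v) = cache().lock().unwrap().get(&n) {
        return v;
    }
    // Lock is released here, so the recursive calls can take it again.
    let v = fib(n - 1) + fib(n - 2);
    cache().lock().unwrap().insert(n, v);
    v
}

fn main() {
    assert_eq!(fib(40), 102334155);
}
```

Note the guard is dropped before recursing; holding the lock across the recursive calls would deadlock on a non-reentrant Mutex.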

This isn't a library yet, but if it becomes a separate crate/module or even a published library, I'd say number 2 is a no-go. Am I correct in thinking so? If not, why not?

And suppose the goal is to create a library with a clean public API: one function taking an integer n and returning fib(n). Memoization would only be added as an optimization if, after measurement, it proved beneficial and worth the memory trade-off, and it wouldn't be part of the public API, so that users can use the library without worrying about implementation details. In that case, where would be the best place to store the cache?

Lastly, maybe I'm overthinking it. Would you say that this is true: "the best solution for extensible, reusable library code isn't necessarily the best solution for a specific application". Maybe it is better to store the cache as a private static RefCell or Lazy in the module if it were a library, but for a program you write directly, the cache-as-an-argument approach (CAAA™) is the optimal solution.

What do you guys think?

I think this is true a lot of the time.

You should probably provide both versions, in accordance with what I have already mentioned in my previous posts:

  • The basis of the implementation should be the version which accepts an additional parameter for the cache. This should be provided so that whoever wants to use it with a non-global cache can do so, for whatever unforeseen reason they might need to do that. This function can have a longer name, because it's the one that users will call less frequently (e.g. fib_cached()).
  • You can then provide a variant without an explicit cache argument for convenience, which contains the hidden Lazy inside its body, but is otherwise implemented in terms of the more general version. You can give this function a shorter, nicer name, and document it as the primary entry point to your library, mentioning that the other, explicitly-cached one is available as well.
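Concretely, that layering might look something like this sketch (`fib` and `fib_cached` are the names suggested above; std's OnceLock standing in for the hidden Lazy):

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// The general version: the caller owns the cache.
pub fn fib_cached(n: u64, cache: &mut HashMap<u64, u64>) -> u64 {
    if n < 2 {
        return n;
    }
    if let Some(&v) = cache.get(&n) {
        return v;
    }
    let v = fib_cached(n - 1, cache) + fib_cached(n - 2, cache);
    cache.insert(n, v);
    v
}

/// The convenient version: same algorithm, implemented in terms of
/// `fib_cached` with a hidden process-wide cache.
pub fn fib(n: u64) -> u64 {
    static CACHE: OnceLock<Mutex<HashMap<u64, u64>>> = OnceLock::new();
    let mut cache = CACHE.get_or_init(|| Mutex::new(HashMap::new())).lock().unwrap();
    fib_cached(n, &mut cache)
}

fn main() {
    // Explicit cache, fully under the caller's control...
    let mut cache = HashMap::new();
    assert_eq!(fib_cached(30, &mut cache), 832040);
    // ...or the convenience wrapper with the hidden global one.
    assert_eq!(fib(30), 832040);
}
```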

Please, do not force your users to rely on hidden global state unless it is absolutely required for reasons of correctness. It is really painful and disappointing to find a library that does just what one needs, only to find out that there's that one quirky aspect of it that ultimately makes it just not quite adequate for the use case.
