Optimizing Fast Fibonacci Computation

No that's provably false. Take a look at the helper function (also mhw's post). Dijk's algorithm is essentially an optimized form of the matrix exponentiation method (which is also O(logN)). That's where the magic happens.

Oh I missed the part. Definitely interesting approach.

It seems the iterative approach still is possible by calculating only fib(2^n)s, skipping the caches entirely. Haven't tried myself yet but would be cool experiment.

That is provably false.

My C++ Fibonacci produces the million digits of fibo(4784969) in 0.39 seconds on one core of a Mac M1.

That is slow compared to those that know what they are doing.

That's nice. But my question is really "is there something mechanical I'm doing wrong here that's making this code slower than the Julia implementation?". If the suggestion is to just implement it using BINET formula or call GMP then no offense but that's kinda missing the point.

The stdlib's default hasher is secure but somewhat slow especially for the small input like usize. Try replace the hashmap with the fnv or fxhash crate.

Hmm this is interesting. I did notice that hash performance does differ quite a bit between some languages (Python is surprisingly fast there for example). I'll try it out.

Edit: neither seemed to have made a tangible difference in this case. :frowning:

No offence taken. No, that is not the suggestion.

The code I linked to does not use GMP or any such big number maths library. It's all just C++ code written by myself.

In outline:
Big integers are stored as vectors of 64 bit unsigned integers.
Each element of those vectors is a "digit" of the number in base 1000000000.
The Karasuba algorithm is used to perform fast multiplication of those numbers: Karatsuba algorithm - Wikipedia
There is a "small number optimisation" in that numbers that can be represented in 16 or so of such "digits" are kept in an array on the stack rather than in the vector on the heap.
The actual Fibonacci is calculated using the "Fast Doubling" algorithm as described here: Fast Fibonacci algorithms

This is probably not something you will want to waste time tackling. But my repo does need a fibo(4784969) solution in Rust. As you might have noticed there are solutions in many other languages there. Written by many people in response to a challenge to calculate the million digit fibo in ones favourite language, without using any libraries other than what came out of the box as standard with the language.

A kind of test of coding chops and language performance.

1 Like

My apologies. I thought you wanted me to call C++ from Rust to beat Julia's time haha.

Unfortunately this particular piece of code won't qualify since it makes use of external crates for BigInt. However if someone with a pure homebrew Rust implementation of BigInt wants to take this and marry them together for a submission please feel free to do so.

No worries.

Actually I owe you an apology. Now that I look at you code in the first post properly I see that you are using some version of the "Fast Doubling" algorithm.

This is interesting. The gold standard form for fast fibo is the Fibonacci function provided by GMP. It is (was) the fastest I have found so far. For fibo(4784969) in C on my MacBook Pro M1 I get:

$ time ./fibo
...
...9211500699706378405156269
./fibo  0.10s user 0.01s system 81% cpu 0.127 total

For a fast doubling algorithm in C, similar to yours but with no caching, I get:

$ time ./fibogmp
...9211500699706378405156269
./fibogmp  0.40s user 0.02s system 75% cpu 0.560 total

My C++ effort, no GMP but homebrew bigint maths gets:

$ time ./fibo_karatsuba
...9211500699706378405156269
./fibo_karatsuba  0.39s user 0.01s system 95% cpu 0.408 total

For your Rust code I get:

$ time ./target/release/fibo_fengkehh
9211500699706378405156269
./target/release/fibo_fengkehh  0.10s user 0.00s system 81% cpu 0.131 total

Which makes your Rust implementation as fast as the GMP gold standard fibo function.

Now what was it you wanted to optimise again?

:slight_smile:

Hmm this is interesting. I mean I knew it was decently fast but didn't think it would be this fast. However it should be taken with a giant grain of salt as it's using Integer from rug which I believe links to GMP, MPFR etc for its large number implementation. If BigInt operations are often the speed bottleneck this is certainly cheating.

If it's not too much trouble, can you bench the Julia code from mhw in the first post as well? My main gripe with my code is really that it's slower than Julia by 3 to 4 seconds when n = 10^9, with the latter being significantly easier to code to boot. Julia also uses GMP for its BigInt so at least I think comparison between the two is fair.

We have to be careful what we are timing here.

Unfortunately I'm using a MacBook Pro M1 and there is no Julia available for it's Arm processor yet. Unless I build it from source. So I have to run it under "Rossetta" which does a translation from x86 to Arm on the fly. This is likely to be slower than a true Arm build.

A ran that Julia code for fibo(4784969) with printing of the million digit result.
I got timings like this:

julia fibo.jl  0.46s user 1.17s system 370% cpu 0.440 total
./fibo_fengkehh  0.10s user 0.01s system 80% cpu 0.135 total

Which indicates your Rust is quicker. Modulo the rosetta translation issue.

Without printing the results I get timings like:

julia fibo.jl  0.25s user 0.53s system 262% cpu 0.299 total
./fibo_fengkehh  0.04s user 0.00s system 95% cpu 0.046 total

Which makes your Rust even faster !

Or you could just avoid hashing altogether and store the numbers in a Vec instead. I don't know whether this version actually has the same recurrence structure as the naïve definition, but my gut instinct is that a sporadic Vec::resize() and a comparison to a sentinel value, e.g. 0, would be way faster than any sort of hashing scheme.

2 Likes

If I'm not mistaken, the rug crate acts as a higher level frontend for GMP, so the fact that its performance is similar is not surprising.

I have actually sent you a pull request for a Rust version of fibo_4784969, heavily inspired by your C++ version. It can by now use a number of different backends (ibig, gmp, num_bigint). If I'm not mistaken, using Eric Olson's doubling formulae, you will not benefit from a cache because at every step in the recursion you only need to recurse with half the previous index.

My implementation can be found on GitHub. I have not yet added a rug implementation your algorithm, and still need to do a better analysis, but comparing your algorithm using the gmp crate with @fengkehh 's caching algorithm above shows that the total running time using your algorithm is slightly lower than using the above Dijkstra algorithm (0.14 vs 0.15 s). Given that I expect that by far most of the running time is in printing out the result, I think your algorithm is quite a bit faster in the computation. Though again, I still need to check this.

EDIT: ok, the difference is a bit smaller than I expected, but measurable:

~> ./target/release/fibo_4784969 -b gmp > fibo.out
computing F(4784969): 0.049s
printing F(4784969): 0.093s

~> ./target/release/fibo_4784969 -b rug2 > fibo.out
computing F(4784969): 0.064s
printing F(4784969): 0.094s

@H2CO3 I suppose it's possible to define our own "hashing" from n to an index on a Vector to avoid having to use HashMap. I suspect if you hash straight from n to index you'll end up with a kind of sparse Vector that's half missing at all times though. Anyway this is getting a bit side tracked since I was really just interested in figuring out why it's slower than Julia despite being mechanically almost identical (both using HashMap etc).

@gvissers I think you are doing Dijk's algorithm without memoization/caching. That's going to be an avoidable source of slow down since the recursion can actually hit the cache quite often (if you do cache). Just doing n = 100 would hit the cache 16 times, for example. I suspect the difference is on the order of O(logN^2) but that's just a hunch. It's true that at a smaller index (ie: n = 4784969) the caching might end up generating so much overhead it makes the code actually slower, however at higher n (ie: a billion) it's going to make a very big difference.

I;m afraid we are drifting further away a bit from your original question than intended, apologies for that. But:

Actually, not completely. It is possible to write recurrence relations for the pairs of consecutive Fibonacci numbers (Fk, Fk+1) that only use the values in the pair for k/2:

  • (F2k, F2k+1) = (Fk[2Fk+1-Fk], Fk+1(2Fk+1-Fk] + (-1)k+1), and
  • (F2k+1, F2k+2) = (Fk[2Fk+Fk+1] + (-1)k, Fk+1[2Fk+Fk+1])

So, to compute a (even k, odd k) pair use the first relation, for an (odd k, even k) pair use the second. To compute a single Fibonacci number Fn, calculate the pair (Fn/2, Fn/2+1) using the above relations, and combine them.

Note that at every step in the recursion, you only need a single recursive call to compute the pair for k/2. So every call to the function computing such a pair has a distinct (and strictly descending) value for k, and as such caching is not helpful.

I did not come up with these relations, I simply stole them from @ZiCog 's repository. I believe that it was Eric Olson who first introduced them in a FreeBASIC program to compute Fibonacci numbers, I might be mistaken though.

I did a small bit of benchmarking. Your program as written above, using rug took approximately 12s to compute F(109). Using the above relations, it took ~9s. I will admit though that I haven't looked if your program could be improved further.

To at least pretend to get back on track: it might be worth checking @H2CO3 's suggestion to use a Vec instead of a HashMap for caching. If that makes a significant difference, it is possible that the HashMap implementation in Rust is slower than in Julia (at least for u32 keys), and that the difference can be explained by that (though perhaps, you should then also do the same in Julia for a fair comparison).

1 Like

Ooh this is very interesting (and should be even faster than my implementation). Thanks!

Also, if the Julia version with a plain dynamic array is still faster than the Rust version that uses Vec, then the difference might be due to JIT optimization. I have once written a microbenchmark in which prime counting in JavaScript won over the same algorithm implemented in C because of agressive JITing.

1 Like

I have added an implementation of Fibonacci numbers calculation to my bigint benchmarks. In two versions: decimal and hexadecimal.

The hexadecimal one is particularly simple because it only really involves multiplications, assuming the number representation is already in binary.

I still like my "digits of e" benchmark a bit more given that it has more of a variety of operations: there are some small numbers involved, a lot of medium-sized ones, some big ones, multiplication and division and exponentiation.

I'm open to other benchmarking ideas.

A huge thank you for that. It is merged now.

I just posted the results I to a similar fibonacci thread here: Code Review + Why is this Fibonacci code so slow? - #10 by ZiCog

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.