Sometimes using Rust is really annoying

So in a recent conversation here I was reminded of an old Rust project in my GitHub, and I thought I would check it out again just for fun: fibo_4784969 in Rust

So I cloned it to my new Mac M4 and tried to build it. Sadly out of the box it fails with errors unfathomable to me.

   Compiling num-bigint v0.4.0
error[E0308]: mismatched types
    --> /Users/Michaael/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/num-bigint-0.4.0/src/biguint/convert.rs:70:19
     |
  70 |         .div_ceil(&big_digit::BITS.into())
     |          -------- ^^^^^^^^^^^^^^^^^^^^^^^ expected `u64`, found `&_`
     |          |
     |          arguments to this method are incorrect
     |
     = note:   expected type `u64`
             found reference `&_`
note: method defined here
    --> /Users/Michaael/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/num/mod.rs:1151:5
     |
1151 | /     uint_impl! {
1152 | |         Self = u64,
1153 | |         ActualT = u64,
1154 | |         SignedT = i64,
...    |
1168 | |         bound_condition = "",
1169 | |     }
     | |_____^
     = note: this error originates in the macro `uint_impl` (in Nightly builds, run with -Z macro-backtrace for more info)
help: consider removing the borrow
     |
  70 -         .div_ceil(&big_digit::BITS.into())
  70 +         .div_ceil(big_digit::BITS.into())
     |

error[E0308]: mismatched types
    --> /Users/Michaael/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/num-bigint-0.4.0/src/biguint/convert.rs:585:19
     |
 585 |         .div_ceil(&u64::from(bits))
     |          -------- ^^^^^^^^^^^^^^^^ expected `u64`, found `&u64`
     |          |
     |          arguments to this method are incorrect
     |
note: method defined here
    --> /Users/Michaael/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/num/mod.rs:1151:5
     |
1151 | /     uint_impl! {
1152 | |         Self = u64,
1153 | |         ActualT = u64,
1154 | |         SignedT = i64,
...    |
1168 | |         bound_condition = "",
1169 | |     }
     | |_____^
     = note: this error originates in the macro `uint_impl` (in Nightly builds, run with -Z macro-backtrace for more info)
help: consider removing the borrow
     |
 585 -         .div_ceil(&u64::from(bits))
 585 +         .div_ceil(u64::from(bits))
     |

error[E0308]: mismatched types
    --> /Users/Michaael/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/num-bigint-0.4.0/src/biguint/convert.rs:613:19
     |
 613 |         .div_ceil(&u64::from(bits))
     |          -------- ^^^^^^^^^^^^^^^^ expected `u64`, found `&u64`
     |          |
     |          arguments to this method are incorrect
     |
note: method defined here
    --> /Users/Michaael/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/num/mod.rs:1151:5
     |
1151 | /     uint_impl! {
1152 | |         Self = u64,
1153 | |         ActualT = u64,
1154 | |         SignedT = i64,
...    |
1168 | |         bound_condition = "",
1169 | |     }
     | |_____^
     = note: this error originates in the macro `uint_impl` (in Nightly builds, run with -Z macro-backtrace for more info)
help: consider removing the borrow
     |
 613 -         .div_ceil(&u64::from(bits))
 613 +         .div_ceil(u64::from(bits))
     |

error[E0308]: mismatched types
    --> /Users/Michaael/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/num-bigint-0.4.0/src/biguint.rs:398:54
     |
 398 |                 let root_scale = extra_bits.div_ceil(&n64);
     |                                             -------- ^^^^ expected `u64`, found `&u64`
     |                                             |
     |                                             arguments to this method are incorrect
     |
note: method defined here
    --> /Users/Michaael/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/num/mod.rs:1151:5
     |
1151 | /     uint_impl! {
1152 | |         Self = u64,
1153 | |         ActualT = u64,
1154 | |         SignedT = i64,
...    |
1168 | |         bound_condition = "",
1169 | |     }
     | |_____^
     = note: this error originates in the macro `uint_impl` (in Nightly builds, run with -Z macro-backtrace for more info)
help: consider removing the borrow
     |
 398 -                 let root_scale = extra_bits.div_ceil(&n64);
 398 +                 let root_scale = extra_bits.div_ceil(n64);
     |

For more information about this error, try `rustc --explain E0308`.
error: could not compile `num-bigint` (lib) due to 4 previous errors
warning: build failed, waiting for other jobs to finish...

Trying to load the thing into my Zed editor results in rust-analyzer hanging forever with the message "Building compile-time-deps".

What to do?

Upgrade your num-bigint dependency to 0.4.6.

I was just starting on that road of updating dependencies.

Updating only num-bigint to 0.4.6 gets rid of the Rust errors I showed earlier.

But it now fails with:

% cargo build --release
    Updating crates.io index
     Locking 3 packages to latest compatible versions
    Updating num-bigint v0.4.0 -> v0.4.6
    Updating num-integer v0.1.44 -> v0.1.46
    Updating num-traits v0.2.14 -> v0.2.19
   Compiling gmp-mpfr-sys v1.4.4
   Compiling ibig v0.3.1
error: failed to run custom build command for `gmp-mpfr-sys v1.4.4`

Caused by:
  process didn't exit successfully: `/Users/Michaael/fibo_4784969/rust/target/release/build/gmp-mpfr-sys-e2c4e0ded6565aa8/build-script-build` (exit status: 101)
  --- stdout
  cargo:rerun-if-env-changed=GMP_MPFR_SYS_CACHE
  $ mkdir -p "/Users/Michaael/fibo_4784969/rust/target/release/build/gmp-mpfr-sys-dd4f5ef878d46ec1/out/try_extended_key_value_attributes"

And a thousand or so more lines of gibberish after that, containing some scary messages about segfaults toward the end:

libtool: warning: assuming '-no-fast-install' instead
  ../../../gmp-src/test-driver: line 107: 15876 Segmentation fault: 11  "$@" > $log_file 2>&1
  ../../../gmp-src/test-driver: line 107: 15884 Abort trap: 6           "$@" > $log_file 2>&1
  ../../../gmp-src/test-driver: line 107: 15892 Abort trap: 6           "$@" > $log_file 2>&1
  ../../../gmp-src/test-driver: line 107: 15900 Segmentation fault: 11  "$@" > $log_file 2>&1
  ../../../gmp-src/test-driver: line 107: 15908 Segmentation fault: 11  "$@" > $log_file 2>&1
  ../../../gmp-src/test-driver: line 107: 15916 Segmentation fault: 11  "$@" > $log_file 2>&1
...
...

Grrr....

These are coming from gmp-mpfr-sys which compiles gmp and mpfr. I'd just run cargo update and cargo clean.

2 Likes

As if by magic cargo update and cargo clean fixed it. Thanks!

Now the question is: Why is the Rust version 25% slower than my C++ solution?

 % time ./c++/fibo_karatsuba > /dev/null              
./c++/fibo_karatsuba > /dev/null  0.40s user 0.00s system 99% cpu 0.401 total
% time ./rust/target/release/fibo_4784969 > /dev/null
computing F(4784969): 0.523s
printing F(4784969): 0.002s
./rust/target/release/fibo_4784969 > /dev/null  0.53s user 0.00s system 99% cpu 0.530 total

Note that they are both using handcrafted big integer maths, as was the spirit of the original challenge.

On my Jetson AGX Orin, where I can use OMP for parallel processing, it's even worse:

$ time ./c++/fibo_karatsuba_omp > /dev/null
real 0m0.097s
user 0m0.638s
sys 0m0.009s
$ time ./rust/target/release/fibo_4784969 > /dev/null
computing F(4784969): 1.005s
printing F(4784969): 0.004s
real 0m1.015s
user 0m0.999s
sys 0m0.013s

The Million Digit Fibonacci Challenge is still open for anyone bored enough to tackle it :slight_smile:

Now that's C deps failing you.

Try RUSTFLAGS="-C target-cpu=native" (or some specific CPU preset), otherwise your executable is compiled for 20-year-old CPUs.
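If the flag helps, one way to make it persistent is a `.cargo/config.toml` in the project; the `[build]` `rustflags` key is standard Cargo configuration (sketch below):

```toml
# .cargo/config.toml: applied to every cargo invocation in this project
[build]
rustflags = ["-C", "target-cpu=native"]
```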

Beyond that, you'd need to profile it and/or inspect compiled code.

5 Likes

I have a question about native: I read somewhere that it can add multiple different implementations of code blocks to the binary and use runtime detection of CPU features to select which implementation to run (allowing a more portable binary, at the cost of some extra runtime checks and a slightly larger binary). Is this true?

Back in the day, IIRC, the C compiler I used had a similar feature, but I believe the compiler identified the CPU type and optimized accordingly (meaning a much less portable binary).

1 Like

target-cpu=native is the option that means the compiler uses the CPU type of the host. It would not make sense for it to add run-time detection, because it is by definition exactly as non-portable as is needed to provide any benefit.

3 Likes

Yeah, that's why it seemed odd to me. Not to mention how complicated it would be to splice in all the alternative implementations.

My understanding is that if you want to do multiple variants then there isn't a complete story there, but it’s going to look something like building on top of #[target_feature] — explicitly asking for variants of a function to be compiled, and then explicitly dispatching to them after feature detection. Implicit dispatching would run the risk of putting the branching in the wrong place (inside rather than outside a hot loop).

1 Like

There's no automation for this at the compiler level in general (not in C either). You can get some library routines that select appropriate optimized implementations either via runtime detection or linker magic (e.g. memcpy on macOS uses this), but that's implemented manually by the libraries.

Automatically you may get multiple versions of a loop, with or without SIMD, depending on iteration count or alignment, but a CPU instruction set must already be known. Runtime CPU detection is too tricky and potentially slow to just sprinkle it all over the place.

While I prefer target-cpu=znver5 for online compilers (for any language) because it's short, I believe the correct default target should be x86-64-v3. It covers Intel processors since Haswell from 2013, and virtually every AMD processor people care about (is there any Bulldozer lover?). Ideally it would be x86-64-v4, but Intel being a troll with its little cores makes this target unusable for most of their customer lineup.

1 Like

RUSTFLAGS="-C target-cpu=native" makes no difference.

I have no Intel processors here. I have a Mac M4 and an Nvidia Jetson AGX Orin, both arm64.

The Rust version needs some parallelisation with Rayon or whatever to catch up with C++ and OMP.

Use codegen-units=1 and enable LTO. Try with both thin and fat LTO.

So I added

[profile.release]
codegen-units = 1
lto = "thin"

To my Cargo.toml. No observable change in performance:

% time ./target/release/fibo_4784969 > /dev/null
./target/release/fibo_4784969 > /dev/null 0.43s user 0.00s system 99% cpu 0.439 total
% time ../c++/fibo_karatsuba > /dev/null        
../c++/fibo_karatsuba > /dev/null  0.32s user 0.00s system 99% cpu 0.327 total 

"fat" or "thin" makes no difference.

1 Like

Sorry, I have not understood your actual performance issue. You had C++ code which used OpenMP, and compared that to plain Rust code? And are your C++ and Rust builds both using the LLVM backend, or are you compiling the C++ with a different compiler?

It's unrelated to what we are discussing here, but just to clarify what probably happened in the broken-phone story: the Intel compiler always had this “split functions with auto-selection” trick, and it was half-implemented by both GCC and LLVM, but, importantly, not by Clang (it's called multiversioning and requires manual annotations in GCC and Clang). Intel nowadays uses LLVM as the backend for its own compiler and it can do these “autosplit” tricks, but AFAIK no one has added them to rustc or clang.

That's probably where the confusion comes from.

3 Likes

Have you shared your C++ code somewhere in this thread and I've missed it?

How does either implementation compare to just calling the gmp function that computes Fibonacci numbers?

Sorry, I have not been very clear. This all started back in 2020 with the "Million Digit Fibonacci Challenge" on the Raspberry Pi user forums. Which was just a bunch of us challenging each other to show solutions in our favourite languages that ran on a Raspberry Pi 4 computer. A rule of the challenge was that no non-standard libraries could be used.

This resulted in a lot of solutions in all kind of languages from ALGOL 60 to Haskell. You can find them all here: GitHub - ZiCog/fibo_4784969: The million digit Fibonacci number challenge. Calculate fibo(4784969) in various languages.
My solution was in C++ and still stands as the best performer of them all, even in single-threaded mode. I added OpenMP parallelization later, just as a learning exercise for myself.

Later a Rust solution was offered by Pi forum user gvissers.

Nowadays I have no Raspberry Pi around, but Mac M4 laptops and Nvidia Jetsons.

On the Macs compiling C++ with g++ is actually using Clang and LLVM (Apple clang version 17.0.0). On Jetsons g++ will be using GCC.

So, the performance issue is that the Rust solution is 25% slower than the C++. That may be down to things like the fact that in the C++ I implemented my own arrays/vectors for storing the big integers, which use the stack rather than allocating for numbers below a certain size. I don't know.

1 Like

See Github link in my last post here.

There is no competition with GMP; that is massively faster. As it happens, the Rust solution has a command line parameter to select from various big number libs, including GMP. Using GMP gets the run time down to 74ms from the handcrafted big integer's 438ms!

But using GMP is against the rules of the challenge.