FxHash blog by Nicholas Nethercote "A brutally effective hash function in Rust"

A nice article on FxHasher here:

It inspired me to change to FxHasher. I basically only had to change one line:

// use std::collections::{HashMap,HashSet};
use rustc_hash::{FxHashMap as HashMap, FxHashSet as HashSet };

Although I also had to change some new() calls to default() [ which works with either ].

1 Like

It's clever that you can adapt the hashing algorithm according to domain-specific characteristics to obtain a performance benefit — in this case, it's OK to use a deeply flawed but fast hashing algorithm in the compiler, because you don't really have to worry about malicious algorithmic complexity attacks and it works great for hashing data structure definitions.

The approach reminds me of how compression algorithms are tailored to be suitable for various kinds of input: run-length encoding for inputs with lots of repetition (e.g. many logos), delta encoding for slowly moving signals (e.g. audio), and so on.

1 Like

I think if you are using a hash dictionary for HTML headers, or similar, there is a concern. I am using BtreeMap for that, but I do use HashMap internally for looking up things like function names, table names, and also for page cache lookup ( where the key is just a number ). That said, I think the way to respond to denial of service attacks is to block the traffic, although maybe automating that isn't always simple.

cc @nnethercote

Yes, but the point is that in this case, the hashing algorithm won't be used for HTML headers — it's only being used in the compiler, where that concern doesn't apply.

From the article:

(Fortunately, the compiler is not an interesting target for HashDoS attacks.)

FxHasher is not a suitable algorithm for general use in hash dictionaries, but it allows for optimization in a specific domain.

2 Likes

Ah, yes, true, but we are talking slightly cross-purpose here. I was referring to my adoption of FxHash I decided upon earlier, per my original post here.

It's only not suitable in situations where an attacker can mount a denial of service attack by choosing keys that cause collisions. As above, I think that would be typically HTML header dictionaries, or similar situations.

I like the optimism of that #[inline] attribute!

Haha

1 Like

Just in case anyone else gets the same thing, when I reverted back to std::collections::HashMap, I got a compiler error, which I just reported here.

As a tip, use backtick fences for rust and text snippets in the github issues too.

For Rust:

```rust
```

For text:

```
```
1 Like

Would it be worth trying to make a minimal example that causes the compiler panic?

I think so, or at minimum link to the full code. Is it incremental-related? I.e. will only happen with a specific sequence of build, edit, build? That's a bit what it sounds like.

There's a recent issue with the same panic message. ICE while bootstrapping stage0 rustc with incremental · Issue #91401 · rust-lang/rust · GitHub

I think so, it seemed to occur when I stopped using FxHash. I haven't had luck making a minimal example yet, but I can probably publish the crate to at least preserve the example I have.

Edit: well now I cannot reproduce it at all. I did originally reproduce it at least once.

Same line in compiler source code, so I guess it's likely the same or similar bug.

Makes somewhat sense. std::collection::HashMap seeds the hasher with a random number, so iteration order will differ between runs. If the problem is iteration order related, only part of the orders will give the error and thus only part of the runs will give the error.

Well, this is not a run-time error, it's the compiler panicking, I never get to run the program ( I was actually just calling cargo check ). It's something to do with what has been compiled before I think, some kind of error in the incremental compilation. But it hasn't been easy to reproduce, it comes and goes.

Oh, I though this was changing between the two in the compiler source.

Well I have managed to reproduce it again. It's quite a long series of steps, hard to even remember, something like this:

(1) I change a toml file to use rustdb from crates.io. This is a test program in a separate directory. I run cargo check.

#rustdb = { path = "../rustdb" }
rustdb = { version = "0.4.0" }

(2) I change it back again. I run cargo check.
(3) I run an example in my lib directory.
(4) I make the change to the lib.rs
(5) I run cargo check -> produces the error.

So it's something to do with changing the toml file to use a different lib source, in conjunction with the change to lib.rs. Maybe it is somehow trusting the version number, when the local source has changed, or something? Maybe if I change the version in my local toml file it won't happen.

With full backtrace:


C:\Users\pc\rust\rustdb>set RUST_BACKTRACE=full

C:\Users\pc\rust\rustdb>cargo run --example axumtest
   Compiling rustdb v0.4.0 (C:\Users\pc\rust\rustdb)
thread 'rustc' panicked at 'no entry found for key', compiler\rustc_metadata\src\rmeta\decoder\cstore_impl.rs:492:9
stack backtrace:
   0:     0x7ff96ef892cf - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h53606b08c01ef688
   1:     0x7ff96efb306a - core::fmt::write::he97f21b785b90134
   2:     0x7ff96ef7b448 - <std::io::IoSlice as core::fmt::Debug>::fmt::h967100caef347f28
   3:     0x7ff96ef8cd96 - std::panicking::take_hook::hb46205f0055becfd
   4:     0x7ff96ef8c797 - std::panicking::take_hook::hb46205f0055becfd
   5:     0x7ff963eaa295 - <tracing_subscriber::fmt::writer::TestWriter as std::io::Write>::flush::hff7a705da113f4ca
   6:     0x7ff96ef8d6a9 - std::panicking::rust_panic_with_hook::hce0846b912adb5c5
   7:     0x7ff96ef8d14b - rust_begin_unwind
   8:     0x7ff96ef89bf7 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h53606b08c01ef688
   9:     0x7ff96ef8d0a9 - rust_begin_unwind
  10:     0x7ff96efe9510 - core::panicking::panic_fmt::h58715deff5969a45
  11:     0x7ff96efafc80 - <core::panic::panic_info::PanicInfo as core::fmt::Display>::fmt::h4d508ea87f28e23c
  12:     0x7ff96efe940b - core::option::expect_failed::h5c52589261ce74a3
  13:     0x7ff967a9f9ea - rustc_metadata::rmeta::decoder::cstore_impl::<impl rustc_session::cstore::CrateStore for rustc_metadata::creader::CStore>::stable_crate_id_to_crate_num::hb58d3341ff03f448
  14:     0x7ff96772b93f - <rustc_query_impl::on_disk_cache::OnDiskCache as rustc_middle::ty::context::OnDiskCache>::def_path_hash_to_def_id::h9f7b03ae96e81f81
  15:     0x7ff96843f3da - rustc_middle::dep_graph::dep_node::<impl rustc_query_system::dep_graph::dep_node::DepNodeParams<rustc_middle::ty::context::TyCtxt> for rustc_span::def_id::CrateNum>::recover::h99f38298d5d2468e
  16:     0x7ff96765ca8d - rustc_query_impl::query_callbacks::crate_hash::force_from_dep_node::h1a30158b4794c3cc
  17:     0x7ff9676acb34 - <rustc_query_impl::Queries as rustc_middle::ty::query::QueryEngine>::try_mark_green::h79c2867594abd9a1
  18:     0x7ff9676acb0b - <rustc_query_impl::Queries as rustc_middle::ty::query::QueryEngine>::try_mark_green::h79c2867594abd9a1
  19:     0x7ff96768497d - <rustc_query_impl::Queries as rustc_middle::ty::query::QueryEngine>::try_mark_green::h79c2867594abd9a1
  20:     0x7ff96736b81e - <rustc_mir_dataflow::framework::cursor::CursorPosition as core::fmt::Debug>::fmt::hdad43e681eb5eae7
  21:     0x7ff9676726d0 - <rustc_query_impl::Queries as rustc_middle::ty::query::QueryEngine>::try_mark_green::h79c2867594abd9a1
  22:     0x7ff963fdf24e - <rls_span::compiler::DiagnosticSpanMacroExpansion as core::fmt::Debug>::fmt::hd8fad7e8b9a18847
  23:     0x7ff963ff8f6c - rustc_interface::queries::Linker::link::h24820195fd91a4b8
  24:     0x7ff96409b315 - rustc_interface::passes::boxed_resolver::BoxedResolver::to_resolver_outputs::h32390bfacafe234b
  25:     0x7ff964013eb2 - rustc_interface::passes::analysis::hc0b219c9fceabc88
  26:     0x7ff96762b8eb - ZN16rustc_query_impl13on_disk_cache260_$LT$impl$u20$rustc_serialize..serialize..Decodable$LT$rustc_query_impl..on_disk_cache..CacheDecoder$GT$$u20$for$u20$$RF$std..collections..hash..set..HashSet$LT$rustc_span..def_id..LocalDefId$C$core..hash..BuildHasher
  27:     0x7ff9676bc61d - <rustc_query_impl::Queries as rustc_middle::ty::query::QueryEngine>::try_mark_green::h79c2867594abd9a1
  28:     0x7ff967602972 - <rustc_span::def_id::LocalDefId as rustc_query_impl::profiling_support::SpecIntoSelfProfilingString>::spec_to_self_profile_string::h641e5a6d4a18de66
  29:     0x7ff9673a4849 - <rustc_mir_dataflow::framework::cursor::CursorPosition as core::fmt::Debug>::fmt::hdad43e681eb5eae7
  30:     0x7ff96766e095 - <rustc_query_impl::Queries as rustc_middle::ty::query::QueryEngine>::try_mark_green::h79c2867594abd9a1
  31:     0x7ff963efe59a - <tracing_subscriber::util::TryInitError as core::fmt::Display>::fmt::h8e44660347c50da5
  32:     0x7ff963ec8944 - rustc_driver::pretty::print_after_hir_lowering::h16ae7f3f7d316762
  33:     0x7ff963f007e6 - <tracing_subscriber::util::TryInitError as core::fmt::Display>::fmt::h8e44660347c50da5
  34:     0x7ff963eced63 - rustc_driver::pretty::print_after_hir_lowering::h16ae7f3f7d316762
  35:     0x7ff963f29a58 - <rustc_driver::args::Error as core::fmt::Debug>::fmt::h11014d06e9af86ea
  36:     0x7ff96ef99d8c - std::sys::windows::thread::Thread::new::h4046847bccf80b57
  37:     0x7ff9e4a77034 - BaseThreadInitThunk
  38:     0x7ff9e53a2651 - RtlUserThreadStart

error: internal compiler error: unexpected panic

note: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/issues/new?labels=C-bug%2C+I-ICE%2C+T-compiler&template=ice.md

note: rustc 1.57.0 (f1edd0429 2021-11-29) running on x86_64-pc-windows-msvc

note: compiler flags: -C embed-bitcode=no -C debuginfo=2 -C incremental --crate-type bin

note: some of the compiler flags provided by cargo are hidden

query stack during panic:
#0 [analysis] running analysis passes on this crate
end of query stack
error: could not compile `rustdb`

Well when it was failing, I changed the version number in my lib toml file:

name = "rustdb"
version = "0.4.1"

It then compiles and runs ok. If I change it back to 0.4.0 the error comes back. So I think having two different versions of the crate from different sources ( but having the same version number ) can end up confusing cargo and produce puzzling errors?