It's clever that you can adapt the hashing algorithm to domain-specific characteristics to obtain a performance benefit. In this case it's OK to use a deeply flawed but fast hashing algorithm in the compiler, because you don't really have to worry about malicious algorithmic-complexity attacks, and it works great for hashing data structure definitions.
The approach reminds me of how compression algorithms are tailored to be suitable for various kinds of input: run-length encoding for inputs with lots of repetition (e.g. many logos), delta encoding for slowly moving signals (e.g. audio), and so on.
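As a sketch of what swapping in a fast, domain-specific hasher looks like in Rust: `std::collections::HashMap` accepts any `BuildHasher`, so you can plug in your own. The hasher below is a toy multiply-and-xor design of my own (the constant is the FNV prime, not FxHash's actual constants), just to illustrate the mechanism:

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// A hypothetical FxHash-style hasher: one multiply and xor per byte.
// Fast, but trivially attackable -- fine inside a compiler, where the
// keys are not attacker-controlled. (Constants are illustrative only.)
#[derive(Default)]
struct FastHasher(u64);

impl Hasher for FastHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 = (self.0 ^ b as u64).wrapping_mul(0x100_0000_01b3);
        }
    }
    fn finish(&self) -> u64 {
        self.0
    }
}

// HashMap is generic over the hasher, so swapping it in is one type alias.
type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<FastHasher>>;

fn main() {
    let mut m: FastMap<&str, u32> = FastMap::default();
    m.insert("struct Foo", 1);
    m.insert("enum Bar", 2);
    assert_eq!(m.get("struct Foo"), Some(&1));
}
```

The real FxHasher works the same way structurally (implement `Hasher`, wire it in via `BuildHasherDefault`); only the mixing function differs.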
I think if you are using a hash dictionary for HTML headers, or similar, there is a concern. I am using BTreeMap for that, but I do use HashMap internally for looking up things like function names and table names, and for page cache lookups (where the key is just a number). That said, I think the way to respond to a denial of service attack is to block the traffic, although maybe automating that isn't always simple.
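For what it's worth, the reason `BTreeMap` sidesteps HashDoS is that it compares keys instead of hashing them, so there is no hash function for an attacker to target and lookups stay O(log n) regardless of which keys a client sends. A minimal sketch (the header names are just examples):

```rust
use std::collections::BTreeMap;

fn main() {
    // BTreeMap orders keys by comparison; no hash means no collision attack.
    let mut headers: BTreeMap<String, String> = BTreeMap::new();
    headers.insert("content-type".into(), "text/html".into());
    headers.insert("content-length".into(), "42".into());

    assert_eq!(headers.get("content-length").map(String::as_str), Some("42"));

    // Iteration order is deterministic (sorted), unlike HashMap.
    let keys: Vec<&str> = headers.keys().map(String::as_str).collect();
    assert_eq!(keys, ["content-length", "content-type"]);
}
```

The trade-off is O(log n) lookups instead of O(1) average, which is usually fine for something the size of a header map.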
Yes, but the point is that in this case, the hashing algorithm won't be used for HTML headers — it's only being used in the compiler, where that concern doesn't apply.
From the article:
(Fortunately, the compiler is not an interesting target for HashDoS attacks.)
FxHasher is not a suitable algorithm for general use in hash dictionaries, but it allows for optimization in a specific domain.
Ah, yes, true, but we are talking slightly at cross purposes here. I was referring to my adoption of FxHash, which I decided on earlier, per my original post in this thread.
It's only unsuitable in situations where an attacker can mount a denial of service attack by choosing keys that cause collisions. As above, that would typically be HTML header dictionaries, or similar situations.
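To make the attack concrete, here is a deliberately weak hasher (a toy of my own, not FxHash's actual algorithm) that keeps only the low byte of the running sum. Anyone who knows the function can generate unlimited keys that all hash identically, degrading the table to a linked-list scan:

```rust
use std::hash::{BuildHasher, BuildHasherDefault, Hasher};

// A deliberately weak hasher: sum the bytes, keep only 8 bits.
// An attacker who knows this can craft endless colliding keys.
#[derive(Default)]
struct WeakHasher(u64);

impl Hasher for WeakHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 = self.0.wrapping_add(b as u64) & 0xFF;
        }
    }
    fn finish(&self) -> u64 {
        self.0
    }
}

fn main() {
    let build = BuildHasherDefault::<WeakHasher>::default();
    // These keys have the same byte sum, so they all collide:
    let a = build.hash_one(1u64);     // bytes [1, 0, 0, ...]
    let b = build.hash_one(256u64);   // bytes [0, 1, 0, ...]
    let c = build.hash_one(65536u64); // bytes [0, 0, 1, ...]
    assert_eq!(a, b);
    assert_eq!(a, c);
}
```

In a compiler the keys come from the source being compiled, not from a remote attacker, which is why this weakness is acceptable there.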
I think so, or at minimum link to the full code. Is it incremental-related? I.e. will it only happen with a specific sequence of build, edit, build? That's a bit what it sounds like.
I think so, it seemed to occur when I stopped using FxHash. I haven't had luck making a minimal example yet, but I can probably publish the crate to at least preserve the example I have.
Edit: well now I cannot reproduce it at all. I did originally reproduce it at least once.
That makes some sense. std::collections::HashMap seeds its hasher with a random number, so iteration order differs between runs. If the problem is related to iteration order, only some of the orders will give the error, and thus only some of the runs will fail.
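The per-map random seed is easy to observe: two `HashMap`s built from the same entries compare equal as maps, but each gets its own randomly seeded hasher, so their iteration orders (and the order across separate runs of the program) need not match. A small sketch:

```rust
use std::collections::HashMap;

fn main() {
    let entries = [("a", 1), ("b", 2), ("c", 3), ("d", 4)];
    let m1: HashMap<_, _> = entries.into_iter().collect();
    let m2: HashMap<_, _> = entries.into_iter().collect();

    // Same contents, so the maps are equal...
    assert_eq!(m1, m2);

    // ...but each map has its own random seed, so these two key orders
    // can differ, and neither is stable across runs of the program.
    let order1: Vec<_> = m1.keys().collect();
    let order2: Vec<_> = m2.keys().collect();
    println!("{order1:?} vs {order2:?}");
}
```

Any logic that implicitly depends on that iteration order will fail only on the subset of runs where the seed produces an unlucky order, which matches the flaky behaviour described.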
Well, this is not a run-time error: it's the compiler panicking, so I never get to run the program (I was actually just calling cargo check). I think it's something to do with what was compiled before, some kind of error in the incremental compilation. But it hasn't been easy to reproduce; it comes and goes.
(2) I change it back again. I run cargo check.
(3) I run an example in my lib directory.
(4) I make the change to the lib.rs
(5) I run cargo check -> produces the error.
So it's something to do with changing the Cargo.toml to use a different lib source, in conjunction with the change to lib.rs. Maybe cargo is somehow trusting the version number even though the local source has changed? Maybe if I change the version in my local Cargo.toml it won't happen.
Well, when it was failing, I changed the version number in my lib's Cargo.toml:
name = "rustdb"
version = "0.4.1"
It then compiles and runs OK. If I change it back to 0.4.0, the error comes back. So I think having two different versions of the crate from different sources (but with the same version number) can end up confusing cargo and produce puzzling errors?
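If I understand the scenario, the two configurations being switched between would look something like this (a hypothetical sketch; the path is made up):

```toml
# Variant A: rustdb resolved from crates.io
[dependencies]
rustdb = "0.4.0"

# Variant B: the same crate taken from a local path. If the local source
# has diverged but still declares version = "0.4.0", the two builds share
# a version number without sharing code, which is the kind of thing that
# could confuse incremental compilation state.
# [dependencies]
# rustdb = { path = "../rustdb" }
```

Bumping the local version to 0.4.1, as described above, makes the two sources distinguishable again, which would explain why the error disappears.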