AHash failing silently

I'm writing an RSS consumer that needs to hash article data for performance in Valkey.

I am processing these RSS items in a loop while I'm prototyping and I ran into a very odd bug. Some of the values fail to hash when I don't print anything else in that loop iteration. Not only that, but the entire println! macro is skipped.

Here is the code that reproduces my issue:

    for item in chn.items.into_iter() {
        //println!("{}", item.guid().unwrap().value());
        println!("{}", item.title().unwrap());
        let mut hasher = AHasher::default();
        let b = item.guid().unwrap().value().as_bytes();
        assert!(!b.is_empty());
        hasher.write(b);
        println!("Hash: {:?} \n", hasher.finish());
        //println!("end of item");
    }

And here is the output:

Trump Has Said ‘No Exceptions’ to His Tariffs. Will That Last?
Hash: 1057710333405281575 

Elon Musk and Marco Rubio Share Awkward Social Media Embrace After White House Confrontation
Hash: 11924599548058660221 

Trump, With More Honey Than Vinegar, Cements an Iron Grip on Republicans
Trump Seeks to Expel a Green Card Holder, Mahmoud Khalil, Over Student Protests
Ukraine Must Cede Territory in Any Peace Deal, Rubio Says
Hash: 1523441670221523159

Russian Forces Depleted and Stalling on Eastern Front, Ukraine Says
Hash: 18412823308182799818

Here is the code that works properly:

    for item in chn.items.into_iter() {
        //println!("{}", item.guid().unwrap().value());
        println!("{}", item.title().unwrap());
        let mut hasher = AHasher::default();
        let b = item.guid().unwrap().value().as_bytes();
        assert!(!b.is_empty());
        hasher.write(b);
        println!("Hash: {:?}", hasher.finish());
        println!("end of item");
    }

And here is the output:

Trump Has Said ‘No Exceptions’ to His Tariffs. Will That Last?
Hash: 12285485996232063774
end of item
Elon Musk and Marco Rubio Share Awkward Social Media Embrace After White House Confrontation
Hash: 7187227338832286181
end of item
Trump, With More Honey Than Vinegar, Cements an Iron Grip on Republicans
Hash: 9681329398627943619
end of item
Ukraine Must Cede Territory in Any Peace Deal, Rubio Says
Hash: 7169634133579351125
end of item
Russian Forces Depleted and Stalling on Eastern Front, Ukraine Says
Hash: 14521009450892122283
end of item

I welcome any ideas, because frankly this is baffling.

This comment suggests your loop has an early exit like if cond { continue; }, but the reproducer you shared doesn't have that. Is this the full code? More context would be appreciated.

Did you mean to create a new hasher with each loop iteration? If “yes” then why not just use the UUID?

I think you mean the GUID. GUIDs aren't consistent; some of the ones I'm receiving are very long links. Hashing gives me consistent values (I've since switched to md-5, which has the same issue as described above), and in the finished project I'll be hashing again to check for collisions using Valkey.

This is the full code, at least the parts that are important for my issue. It is the entirety of the loop; there are no flow statements like continue. println!("Hash: {:?} \n", hasher.finish()); simply fails to execute when it's the last statement in the loop.

As I said, this is baffling, and I've ruled out AHash as the cause: I'm getting the same behavior out of md-5. I'll likely need help from someone who can read bytecode, because this is either a compiler bug or my CPU has a flaw in it.

This doesn't make sense. I suspect your issue is one of the following:

  1. You have undefined behavior somewhere, perhaps in a dependency or an unsafe block
  2. You are running into a miscompilation
  3. The code you have shared is not the real culprit

To rule out #1, compare debug and release, and try running your project under miri. If it can't run under miri, minimize the reproducer until it can.

To rule out #2, try running it with a few different rustc versions, perhaps compare stable to nightly, and again you could compare debug to release.

To rule out #3, try inserting debug or logging statements at various locations.
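For #3, one way to place those logging statements is to number every iteration and emit a start and an end marker, so that a skipped statement shows up as a missing marker. This is only a minimal sketch under assumptions: `hash_items_logged` and the `(title, guid)` tuples are hypothetical stand-ins for the real loop over `chn.items`, and `std`'s `DefaultHasher` stands in for `AHasher` so that the snippet needs no external crates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;
use std::io::{self, Write};

// Hypothetical helper: hashes each GUID and writes numbered start/hash/end
// markers to `out`. Returns the number of items that ran to completion.
// DefaultHasher is a stand-in for AHasher (assumption, to stay std-only).
fn hash_items_logged<W: Write>(items: &[(&str, &str)], out: &mut W) -> io::Result<usize> {
    let mut completed = 0;
    for (index, (title, guid)) in items.iter().enumerate() {
        writeln!(out, "[{index}] start: {title}")?;
        let mut hasher = DefaultHasher::new();
        hasher.write(guid.as_bytes());
        writeln!(out, "[{index}] hash: {}", hasher.finish())?;
        writeln!(out, "[{index}] end")?;
        // Flush after every item so buffered output cannot hide a marker.
        out.flush()?;
        completed += 1;
    }
    Ok(completed)
}
```

Passing `&mut std::io::stderr().lock()` as `out` keeps the markers separate from whatever the program prints to stdout, and comparing the returned count with the number of items shows whether every iteration finished.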

I agree that this is Undefined Behavior somewhere. Run with Miri (if you have no C dependencies) or Address Sanitizer (if you do have C dependencies -- you need to compile a fresh ASan version of them!).

Thanks for the guidance, I've never had to dive this deep into Rust's innards before.

So, I have only been able to reproduce the bug once today, and that was without Miri. I modified my code so that Miri would stop complaining about unsupported FFI calls, and it did, but I haven't yet been able to reproduce the bug on that branch.

For posterity, I pushed the branch I preserved with my possibly broken code to a public GitHub repo and created the following branches:

I'll keep trying to reproduce the bug from the maybe-unstable repo. Maybe it's something in the input data; I'll store the RSS feed content in case the bug pops up again.

I suggest you divide the code into three parts:

  1. downloading the XML
  2. parsing the XML and retrieving the (title, url) pairs. Store them in a file.
  3. hashing and printing

That way, you can identify where the issue is.
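The hand-off between parts 2 and 3 could look roughly like the sketch below. Assumptions: the pairs are stored one per line as tab-separated text, the names `serialize_pairs`, `parse_pairs`, and `hash_url` are hypothetical, and `std`'s `DefaultHasher` stands in for AHash or md-5 so that the snippet has no external dependencies.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Part 2 output: one "title<TAB>url" line per item.
// Assumes titles and URLs contain no tabs or newlines.
fn serialize_pairs(pairs: &[(String, String)]) -> String {
    pairs
        .iter()
        .map(|(title, url)| format!("{title}\t{url}\n"))
        .collect()
}

// Part 3 input: parse the stored lines back into pairs.
// Lines without a tab are skipped.
fn parse_pairs(text: &str) -> Vec<(String, String)> {
    text.lines()
        .filter_map(|line| line.split_once('\t'))
        .map(|(title, url)| (title.to_string(), url.to_string()))
        .collect()
}

// Part 3: hash a single URL with a fresh hasher.
// DefaultHasher is a stand-in for the real hasher (assumption).
fn hash_url(url: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(url.as_bytes());
    hasher.finish()
}
```

Writing the serialized string to a file with `std::fs::write` after part 2 and reading it back with `std::fs::read_to_string` before part 3 lets the hashing step be rerun on identical input, with no network access or XML parsing involved.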