Rayon to solve Advent Of Code Day 4

A bit late to the party I've started to solve Advent Of Code 2015 puzzles in Rust (with some hints from F# code). This tries to use Rayon to solve parts A and B of: Day 4 - Advent of Code 2015

#![feature(step_by)]

extern crate rayon;
extern crate crypto;

type T = u32;
type Tout = [u8; 16];

fn solve<F: Sync + Fn(&Tout) -> bool>(inf: T, sup: T, test: F) -> Option<T> {
    use crypto::md5::Md5;
    use crypto::digest::Digest;
    use rayon::par_iter::*;

    // KEY from:
    // https://github.com/theburningmonk/AdventOfCodeFs/blob/master/src/AdventOfCode2015/AdventOfCode2015/Day04.fsx
    const KEY: &'static [u8] = b"bgvyzdsv";

    let m = (inf .. sup)
            .into_par_iter()
            .filter(|i| {
                let mut hasher = Md5::new();
                hasher.input(KEY);
                hasher.input(i.to_string().as_bytes());
                let mut output = Tout::default();
                hasher.result(&mut output);
                test(&output)
            })
            .min(); // Doesn't return an Option<> ?

    if m == T::max_value() { None } else { Some(m) }
}

fn main() {
    let step: T = 40_000; // Found by trial and error on my CPU.

    for i in (0 .. T::max_value()).step_by(step) {
        if let Some(sol) = solve(i, i + step, |o| (o[0] | o[1] | (o[2] >> 4)) == 0) {
            println!("{}", sol);
            break;
        }
    }

    for i in (0 .. T::max_value()).step_by(step) {
        if let Some(sol) = solve(i, i + step, |o| (o[0] | o[1] | o[2]) == 0) {
            println!("{}", sol);
            break;
        }
    }
}

Compiled with:
cargo rustc --release --bin md5par -- -C target-cpu=native

[dependencies]
rust-crypto = "0.2.34"
rayon = { git = "https://github.com/nikomatsakis/rayon" }

[profile.release]
opt-level = 3
debug = false
lto = true

The program just computes many MD5 on computed strings looking for the first that starts with (problem A) five or (problem B) six zeros in the hex digest.

This program runs in 0.09 seconds on my PC (4 cores + hyperthreading). It's the first time I use Rayon. Is this code using Rayon in a reasonable way? Can I reduce its run-time or improve the code?

Can I perform "hasher.input(i.to_string().as_bytes());" without heap allocations in Rust (like a sprintf on a pre-allocated buffer)? (I guess this saving is not much important for this program, but it's handy to know in general).

into_par_iter() is a nice and quick way to paralellize things and it may be the best way in most cases.

But you can also try to split the range inf .. sup and use Rayon join() on these sub-ranges until it reaches a limit: sup - inf < limit, that limit could be 100 or 1000 it depends on your machine. Once that limit is reached process the range sequentially.

It's basically like the quicksort example on Rayons webpage. Are you using Rust nightly ?

(You probalby could also try to use into_par_iter() in the outer loop in main)

You can do the to string thing without heap allocation at all, by using a small u8 array on the stack. Getting the details right is a bit tricky, but here it is. ArrayVec or ArrayString may actually make this a bit easier..

3 Likes

So the Write impl for &mut [u8] acts like a cursor? That's neat!

I have not yiet tried @willi_kappler suggestions, but I have updated the code with stack-allocation of the number, and trying it on a slower CPU I can see a performance difference (27% faster! Rustc needs an escape analysis pass to remove some useless heap allocations!):

#![feature(step_by)]

extern crate rayon;
extern crate crypto;

type T = u32;
type Tout = [u8; 16];

fn solve<F: Sync + Fn(&Tout) -> bool>(inf: T, sup: T, test: F) -> Option<T> {
    use std::io::Write;
    use crypto::md5::Md5;
    use crypto::digest::Digest;
    use rayon::par_iter::*;

    // KEY from:
    // https://github.com/theburningmonk/AdventOfCodeFs/blob/master/src/AdventOfCode2015/AdventOfCode2015/Day04.fsx
    const KEY: &'static [u8] = b"bgvyzdsv";

    let m = (inf .. sup)
            .into_par_iter()
            .filter(|i| {
                let mut hasher = Md5::new();
                hasher.input(KEY);
                const BUF_LEN: usize = 20;
                let mut num_buf = [0u8; BUF_LEN];
                let num_len = {
                    let mut cursor = &mut num_buf[..];
                    write!(&mut cursor, "{}", i).unwrap();
                    BUF_LEN - cursor.len()
                };
                let num = &num_buf[..num_len];
                hasher.input(num);
                let mut output = Tout::default();
                hasher.result(&mut output);
                test(&output)
            })
            .min(); // Doesn't return an Option<> ?

    if m == T::max_value() { None } else { Some(m) }
}

fn main() {
    let step: T = 40_000; // Found by trial and error on my CPU.

    for i in (0 .. T::max_value()).step_by(step) {
        if let Some(sol) = solve(i, i + step, |o| (o[0] | o[1] | (o[2] >> 4)) == 0) {
            println!("{}", sol);
            break;
        }
    }

    for i in (0 .. T::max_value()).step_by(step) {
        if let Some(sol) = solve(i, i + step, |o| (o[0] | o[1] | o[2]) == 0) {
            println!("{}", sol);
            break;
        }
    }
}
1 Like

I'm still mostly learning/experimenting with Rust, so most times I use Rust nightly.

I suspect this is not a good idea (and I think Rayon into_par_iter() doesn't have a step_by()).

This is because of the short-circuiting nature of the problem. I opened a feature request for rayon earlier today to ask for a find() method on parallel iterators, which would make this problem much easier to express.

1 Like

You should count the exact max size needed for the buffer. I can assure you I did no such thing for the example and just guessed at 20. :smile:

Edit: Oh crap, the max is 20 for u64 in decimal.

Yes, using an ArrayVec helps mitigate mistakes in such things.

It's nice that it is faster. I think the integer to decimal string thing can be improved further, by not going through the formatting machinery (but I don't know if anything like that is exposed).

Yes, I suppose that's a prerequisite for implementing Read and / or Write — the read/write value must have an advancing cursor. And the slices do, they are just updated to point to the part that's not yet read / written.

Andrei in talks like this shows ways to make faster integral number to string conversions:

Oh yes you're right, sorry!

Indeed.

Also, don't forget to take into consideration the task weight(how much time does it take to complete one of the tasks).
Currently rayon uses a small weight(it is configurable) which might not be optimal for your use case.
This might change in the near future.