Which is more idiomatic? Functional, imperative or a mix?


#4

Thank you very much for your thoughts! I only ‘found’ the ‘iterator over `Result`s → collect into a `Result`’ feature after banging my head against the compiler for a while :slight_smile:. Google came to the rescue: I saw it in some code and thought, “oh, neat!”.

I wondered about using a let and ? just on the digits.

Performance-wise, I did benchmark it; the harder-to-read mutable/imperative version is about 50% faster than the functional one, but that might be solely because the functional one constructs a second list, whereas the other just iterates over the digits. I’ve blogged about it here; I hope it isn’t a faux pas to blog about it and also ask the question here.


#5

That is much neater. I didn’t realise you could use ? in the middle of an expression, nor that windows(...) was defined on the result (or does the ? in the middle effectively do the unwrap() when it’s an Ok(...)?).

Also, unwrap_or() is new to me, and I don’t see how it works here.
Thanks very much!


#6

unwrap_or(y) returns x if it is called on a Some(x) (or an Ok(x) for the Result version), or y if it is called on a None (or an Err(_)).
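A minimal sketch of the behaviour described above, on both Option and Result:

```rust
fn main() {
    // Option: Some(x) yields x, None yields the fallback.
    assert_eq!(Some(5).unwrap_or(0), 5);
    assert_eq!(None::<u32>.unwrap_or(0), 0);

    // Result: Ok(x) yields x, Err(_) yields the fallback.
    let ok: Result<u32, &str> = Ok(7);
    let err: Result<u32, &str> = Err("boom");
    assert_eq!(ok.unwrap_or(1), 7);
    assert_eq!(err.unwrap_or(1), 1);
}
```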


#7

In the itertools crate there is a chunks() ( https://docs.rs/itertools/0.6.0/itertools/trait.Itertools.html#method.chunks ) that works on any iterable, but unlike windows() it also returns partially filled chunks (why? I don’t like this), and its usage is a bit of a mess because of borrowing issues.


#8

In my opinion, the answer is that a mix is almost always what’s idiomatic. I wouldn’t go to one extreme or the other. For instance, for this problem I would probably do something like this:

pub fn lsp(digits: &str, size: u32) -> Result<u32, Error> {
    if size == 0 {
        return Ok(1);
    }

    if digits.len() < size as usize {
        return Err(Error::Window);
    }

    // We are only concerned with digits, so treat as ASCII.
    // Non-ASCII will be caught when converting to digits.
    let windows = digits.as_bytes().windows(size as usize);
    let mut max = 0;

    // If we take a `map` approach, we find it difficult to return
    // early on errors, so a simple for loop might be better for this
    for window in windows {
        let mut product = 1;
        for &byte in window {
            product *= (byte as char).to_digit(10).ok_or(Error::Digit)?;
        }

        if product > max {
            max = product
        }
    }

    Ok(max)
}

This example performs nearly identically to the imperative example, but you still get some nice abstractions by using things like windows. I know this is not identical to the way you wrote it (though it is functionally identical), but it is what I would consider idiomatic, in my humble opinion. And personally, I believe this reads better than a fully functional approach (but that’s certainly subjective).


#9

Interesting. I kind of wondered if ‘mixed’ was the idiomatic approach as both approaches are supported by the language; it could’ve been two competing groups, or a single group who recognised that both have a role to play. Thanks for taking the time to write out the code; it’s given me another perspective on how Rust is used.


#10

chunks doesn’t produce overlapping subsequences. In itertools you only have tuple_windows, which isn’t useful when you need a varying window size.
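For contrast, the standard library’s slice methods show the difference directly: `chunks` yields non-overlapping groups (the last may be partial), while `windows` yields overlapping groups of a size chosen at runtime. (This sketch uses std slices, not the itertools adaptors discussed above.)

```rust
fn main() {
    let digits = [1, 2, 3, 4, 5];

    // chunks(2): non-overlapping groups; the last one may be partial.
    let chunks: Vec<&[i32]> = digits.chunks(2).collect();
    assert_eq!(chunks, vec![&[1, 2][..], &[3, 4][..], &[5][..]]);

    // windows(2): overlapping groups, all exactly 2 long.
    let windows: Vec<&[i32]> = digits.windows(2).collect();
    assert_eq!(windows.len(), 4);
    assert_eq!(windows[0], &[1, 2]);
    assert_eq!(windows[3], &[4, 5]);
}
```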


#11

One optimization you can make is to iterate over bytes instead of chars. The Bytes iterator implements ExactSizeIterator, so the collect call will be slightly faster. Also, iterating over bytes is always slightly faster than iterating over characters, but not by much.
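The size-hint difference mentioned above can be seen directly (a small sketch; how much this actually speeds up collect depends on the allocator and input):

```rust
fn main() {
    let s = "123456789";

    // Bytes is an ExactSizeIterator: the length is known up front,
    // so collect can allocate the Vec in one shot.
    assert_eq!(s.bytes().len(), 9);
    assert_eq!(s.bytes().size_hint(), (9, Some(9)));

    // Chars only knows bounds: a UTF-8 string of n bytes holds
    // somewhere between n/4 and n characters.
    let (lower, upper) = s.chars().size_hint();
    assert!(lower <= 9);
    assert_eq!(upper, Some(9));
}
```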

You might consider compiling with -C opt-level=3 instead of -O (opt-level 2), but this doesn’t make much (if any) difference once you’ve applied the first optimization.

On my machine, this reduces the imperative version’s lead to 20%.


#12

Actually, benchmarking is hard… With judicious use of test::black_box to prevent the compiler from optimizing away too much code, the functional version is at least 50% faster:

#![feature(test)]
extern crate test;

use test::Bencher;

#[derive(Debug)]
pub enum Error {
    Digit,
    Window,
}


pub fn lsp_func(digits: &str, size: u32) -> Result<u32, Error> {
    if size == 0 { return Ok(1) };
    digits.bytes()
        .map(|c| (c as char).to_digit(10).ok_or(Error::Digit))
        .collect::<Result<Vec<u32>, _>>()?
        .windows(size as usize)
        .map(|w| w.iter().product())
        .max()
        .ok_or(Error::Window)
}

pub fn lsp_imp(digits: &str, size: u32) -> Result<u32, Error> {
    if size == 0 { return Ok(1) };
    let mut prod = vec![0u32; size as usize];
    let mut pointer: usize = 0;
    let mut max_value: u32 = 0;
    let mut count: u32 = 0;
    for c in digits.chars() {
        count += 1;
        prod[pointer] = c.to_digit(10).ok_or(Error::Digit)?;
        pointer = (pointer + 1) % (size as usize);
        let mut product: u32 = 1;
        for p in 0..size as usize {
            product *= prod[p];
        }
        if product > max_value { max_value = product; }
    }
    if count < size { return Err(Error::Window); }
    Ok(max_value)
}

const DIGITS: &'static str = "123456780129384721984310238409213840192834098721395641024301239847109328741928734091827430918732478465764646491928374";

#[bench]
fn bench_func(b: &mut Bencher) {
    b.iter(|| {
        test::black_box(lsp_func(test::black_box(DIGITS), test::black_box(10)).unwrap());
    });
}


#[bench]
fn bench_imp(b: &mut Bencher) {
    b.iter(|| {
        test::black_box(lsp_imp(test::black_box(DIGITS), test::black_box(10)).unwrap());
    });
}

#13

Interesting. I measured that it’s 2x as slow on windows of length 3 and 50% faster on windows of length 10 for that snippet.


#14

Did you remember to optimize (rustc -O)? The functional variant is pretty consistently faster on my machine:

rustc -O --test the_test.rs
./the_test --bench

#15

@cbreeden, @stebalien - wow, much better benchmarking than I did! I know that the imperative version can be improved:

  1. It calculates the first window-size - 1 products, which will all be 0, so they aren’t needed.
  2. It recalculates the whole product for every window; remembering the running product, dividing by the digit about to be dropped, and then multiplying by the new one would give constant performance regardless of window size, but it would then need to handle division by zero, etc.

It’s curious that the functional version is faster! Is v[x], when v is a vector, an expensive operation? I thought it would compile down to a simple offset.

I never tried it with rustc -O ...; I will do so.


#16

I was simply using cargo bench, but I still get the same numbers with your method: the imperative version is bad for small windows, good for large windows. It’s interesting how the relation between the size of the input and the window size impacts the benches. The mixed method does pretty well for most small windows, but as you increase the window size the imperative method starts to win out (on my machine, for large inputs, when the window is >= ~11). Is it possible that it could be vectorizing the product? I doubt it would be able to vectorize the mixed variant I gave, due to where the error handling takes place, but I also wasn’t sure whether you can vectorize when the length of the windows is unknown.


#17

If you compiled with cargo build --release, your code should have been optimized.

There’s usually some bounds-checking overhead, but that’s not the issue here (I’ve tried using unsafe methods to remove the bounds checking and it doesn’t make much of a difference). I don’t know why the functional version is faster, but I can guess why test::black_box affects the imperative version so much: it optimizes better, so preventing optimizations hurts it more.
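A sketch of the kind of unchecked-indexing experiment described above (hypothetical, std-only; `get_unchecked` skips the bounds check, so the loop bound must guarantee the index is in range):

```rust
fn product_checked(v: &[u32]) -> u32 {
    let mut product = 1;
    for i in 0..v.len() {
        product *= v[i]; // bounds-checked indexing
    }
    product
}

fn product_unchecked(v: &[u32]) -> u32 {
    let mut product = 1;
    for i in 0..v.len() {
        // Safety: i < v.len() by the loop bound.
        product *= unsafe { *v.get_unchecked(i) };
    }
    product
}

fn main() {
    let v = [2u32, 3, 4];
    assert_eq!(product_checked(&v), 24);
    assert_eq!(product_unchecked(&v), 24);
}
```

In a tight inner loop like the window product, the difference between the two is usually small, which matches the observation above.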


#18

With the two versions I presented, as the window size increased they converged to roughly the same time (around window size ~20), which implies the product calculation starts to dominate the timings. At small window sizes, I’m guessing the small extra overhead (over millions of iterations) starts to show. But for the convenience, it’s a very small price to pay. If performance is really an issue, I guess this sort of hand optimisation of individual, profiled functions is the way to go.

I looked up the docs on cargo build --release, and, as you said, it’s already optimised. Rust compiled code is very fast!

Thanks again for your thoughts and help!


#19

On further testing, my version overflows. Using u64 instead of u32 works better but doesn’t appear to change the performance characteristics.

I also tried optimizing the functional version by keeping a running product but that version is significantly slower:

use std::iter;

pub fn lsp_func2(digits: &str, size: usize) -> Result<u64, Error> {
    if size == 0 { return Ok(1); }
    let digits = digits
        .bytes().map(|c| (c as char).to_digit(10).ok_or(Error::Digit))
        .collect::<Result<Vec<u32>, _>>()?;

    if digits.len() < size {
        return Err(Error::Window);
    }
    
    let mut value: u64 = 1;
    let mut zeros = size;

    Ok(iter::repeat(0)
       .take(size)
       .chain(digits.iter().cloned())
       .zip(digits.iter().cloned())
       .map(|(leave, enter)| {
           if leave == 0 { zeros -= 1 } else { value = value/(leave as u64) }
           if enter == 0 { zeros += 1 } else { value = value*(enter as u64) }
           if zeros == 0 { value } else { 0 }
       })
       .max()
       .unwrap())
}

#20

Long story short: Idomatic rust is the rust that runs fastest :stuck_out_tongue:


#21

For me, idiomatic Rust code (and this also goes for other programming languages) is the shortest code, measured by the number of language tokens. Here, your functional version has 97 tokens in the body of the function, while the imperative version has 151 tokens, so the functional version is more idiomatic.


#22

Many good points. I’ve therefore added this discussion to idiomatic-rust on GitHub.


#23

That looks like a useful resource (at least for me!). Thanks for mentioning it here!