Enumerate very slow?

I agree it looks like line noise when written like this:

PR.iter().enumerate().skip(i+1).take_while(|(_,p)| *p < &pmax).for_each(|(j,p)| );

That's why I would write it as:

PR.iter()
     .enumerate()
     .skip(i+1)
     .take_while(|(_,p)| {
         *p < &pmax).for_each(|(j,p)| 
      });

My brain has an easier time grokking the statement piece by piece when formatted this way.

4 Likes

@alanhkarp -- is it correct to include the for_each call inside the take_while closure? I think your parentheses and curly braces are off.

But I agree with the comment about formatting each method call on a separate line being more readable.

1 Like

You're right. My bad.

PR.iter()
     .enumerate()
     .skip(i+1)
     .take_while(|(_,p)| {
         *p < &pmax
      })
     .for_each(|(j,p)|);

Did I get it right this time?

Itertools::dropping is a nice alternative to skip, as it eagerly consumes items and returns the original iterator, with no further performance impact.
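The difference can be sketched with std alone. The helper below is not the itertools source, just a minimal illustration of the semantics: eagerly advance the iterator by n items and hand back the *same* iterator type, instead of wrapping it in a lazy `Skip` adapter the way `skip` does.

```rust
// Sketch of Itertools::dropping's behavior (hypothetical helper, not the
// real implementation): consume n items eagerly, return the same iterator.
fn dropping<I: Iterator>(mut it: I, n: usize) -> I {
    if n > 0 {
        it.nth(n - 1); // nth(k) consumes k + 1 items
    }
    it
}

fn main() {
    let pr = [2i64, 3, 5, 7, 11, 13];
    let mut it = dropping(pr.iter(), 3);
    // The first three primes were consumed up front; no Skip wrapper remains.
    assert_eq!(it.next(), Some(&7));
    // Compare: pr.iter().skip(3) would instead return a Skip<Iter> adapter
    // that re-checks its counter lazily.
}
```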

3 Likes

Despite what I said about "line noise" above, and given that people have put effort into the solution, I plugged it into my program to see what would happen.

Luckily cargo-fmt arranges it into something a normal human might be able to read:

        PR.iter()
            .enumerate()
            .skip(i + 1)
            .take_while(|(_, p)| **p < PR.len() as i64)
            .for_each(|(j, p)| {
                self.factors.p[fmax] = *p;
                self.s = s * p;
                self.i = j;
                self.fmax = fmax;
                self.work();
            });

I might return to the mind-numbing complexity of all this later, but for now the bottom line is that it is still slow: 4 minutes 57 seconds.

Did you really mean to change your take_while condition from pmax?

Another option avoiding skip is to offset the enumeration yourself:

for (j, &p) in PR[i + 1..].iter().enumerate() {
    let j = j + i + 1;
    // ...
}

Just to clarify something. Are you benchmarking this in release or debug mode? Running in release might eliminate some of the slowness.

Sorry. Well spotted. Actually the timing I gave above is using 'pmax'. Makes no difference.

Your suggestion looks nice. My only worry is that it repeats the 'i + 1' calculation, violating the famous DRY principle.

Sadly it is also a bit slow at 4 minutes 53 seconds.

$ time cargo build --release
   Compiling tatami_rust v0.1.0 (/mnt/c/Users/zicog/tatami-rust)
    Finished release [optimized] target(s) in 2.40s

real    0m2.456s
user    0m0.906s
sys     0m1.859s
$ time target/release/tatami_rust
T(63405342000)=1000

real    4m42.222s
user    4m42.000s
sys     0m0.031s

I know using 'time' is rude and crude but with such big differences I feel it's sufficient.

1 Like

I like the functional style when it comes to a chain like map.filter.map.fold.

When you need access to the index itself, it starts to look more like a case for just using a for loop.

I come from years of high-level languages with lots of the functional style, but I still find the imperative style easier to read in this case. Unless you absolutely need this to be an iterator, such as if you're passing that iterator between functions or something like that, I see no reason not to stick with the for loop. It's your code.

I think it's admirable that the compiler developers have managed to make iterators as efficient as hand-coded, imperative loops, but that's not to say that iterators must always be used in place of imperative loops.

Let's say you had a vector full of indexes into another vector. Maybe you can contrive a functional-style way of building an iterator for that using a filter, but the for loop would be immediately clear and optimal. The filter might come out as fast due to optimizations, but it might not.
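That indexes-into-another-vector case can be sketched both ways (the data here is made up, purely to illustrate the comparison):

```rust
fn main() {
    let data = [10i64, 20, 30, 40, 50];
    let idxs = [4usize, 0, 2];

    // Imperative: immediately clear what happens.
    let mut picked = Vec::new();
    for &i in &idxs {
        picked.push(data[i]);
    }

    // Functional: the same thing via map/collect.
    let picked2: Vec<i64> = idxs.iter().map(|&i| data[i]).collect();

    assert_eq!(picked, picked2);
    assert_eq!(picked, vec![50, 10, 30]);
}
```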

Here you have a case where the transformation to the functional style is measurably slower and it's noisier to you. I'd just leave it.

3 Likes

How about a hybrid approach?

for (&p, j) in PR[i + 1..].iter().zip(i + 1..PNUM) {
    if p > pmax {
        break;
    }
    self.factors.p[fmax] = p;
    self.s = s * p;
    self.i = j;
    self.fmax = fmax;
    self.work();
}

I'm not sure it's faster, but maybe it's worth a try? It doesn't use explicit indexing and it's not horribly unreadable, either.

3 Likes

You could do away with indexing PR at all by passing p and the remaining prime iterator as parameters. Actually, most of your fields seem to be parameters of a sort, which you're manually resetting before/after recursive calls, so avoiding parameters might not be worth it. Anyway, like this:

diff --git a/src/prune.rs b/src/prune.rs
index 89efdfc45104..584662f55356 100755
--- a/src/prune.rs
+++ b/src/prune.rs
@@ -1,4 +1,5 @@
 use crate::constants::{FNUM, PNUM, SMAX};
+use std::slice::Iter;
 
 include!(concat!(env!("OUT_DIR"), "/prime.rs"));
 
@@ -22,7 +23,6 @@ pub struct Tatami {
     smin: i64,
     z: Vec<i64>,
     s: i64,
-    i: usize,
     fmax: usize,
 }
 
@@ -34,7 +34,6 @@ impl Tatami {
             smin: SMAX,
             z: vec![0; FNUM],
             s: 2,
-            i: 0,
             fmax: 0,
         }
     }
@@ -91,7 +90,7 @@ impl Tatami {
         r
     }
 
-    fn work(&mut self) {
+    fn work(&mut self, p: i64, mut pr: Iter<'_, i64>) {
         let s = self.s;
         let mut r = self.sigma();
         if r >= self.isn {
@@ -100,33 +99,25 @@ impl Tatami {
                 self.smin = s;
             }
         }
-        let i = self.i;
         let mut fmax = self.fmax;
         let pmax = self.smin / s + 1;
-        let mut p = PR[i];
         if p <= pmax {
             self.factors.n[fmax] += 1;
             self.s = s * p;
-            self.work();
+            self.work(p, pr.clone());
             self.factors.n[fmax] -= 1;
         }
         fmax += 1;
         self.factors.n[fmax] = 1;
 
-        // Clippy said do this, which is not only ugly but much slower.
-        // for (j, p) in PR.iter().enumerate().take(PNUM).skip(i + 1) {
-        // for (j, p) in PR.iter().enumerate().skip(i + 1) {
-        #[allow(clippy::needless_range_loop)]
-        for j in i + 1..PNUM {
-            p = PR[j];
+        while let Some(&p) = pr.next() {
             if p > pmax {
                 break;
             }
             self.factors.p[fmax] = p;
             self.s = s * p;
-            self.i = j;
             self.fmax = fmax;
-            self.work();
+            self.work(p, pr.clone());
         }
         self.factors.n[fmax] = 0;
     }
@@ -136,7 +127,7 @@ impl Tatami {
         self.factors = Factors::new(FNUM);
         self.factors.p[0] = PR[0];
         self.factors.n[0] = 1;
-        self.work();
+        self.work(PR[0], PR[1..].iter());
         if self.smin < SMAX {
             self.smin
         } else {

Note, the pr.clone() is cheap, just copying start/end pointers.
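That cheapness is easy to demonstrate: cloning a slice iterator copies its two pointers, and the clone then advances independently of the original (a minimal sketch):

```rust
fn main() {
    let pr = [2i64, 3, 5, 7];
    let mut it = pr.iter();
    it.next(); // consume 2
    let mut branch = it.clone(); // cheap: copies start/end pointers only
    assert_eq!(it.next(), Some(&3));
    assert_eq!(branch.next(), Some(&3)); // the clone was unaffected
    assert_eq!(branch.next(), Some(&5));
}
```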

This gave a slight improvement on my machine, from 269s to 265s.

1 Like

Wow! I think that loop you are proposing is the most beautiful line of Rust I have ever seen:

    while let Some(&p) = pr.next() {

So clear. Not only short and sweet, together with your other changes it removes a load of cruft from the code. I will riff on that idea a bit more and see what else I can tidy up.

Even a die-hard C programmer could understand it. It's basically something like:

    while (++p != endp) {

but it looks like something even an English major would get.

Just goes to show that when converting C code to Rust one should stand back and look at the whole thing. Not get bogged down in trying to reproduce all the C ways.

Sadly it runs a bit slower on my PC. Down to 4m48 from 4m42. The original C is 4m43.

Meanwhile, on the ARM of a Raspberry Pi we are down to 24m38s, where the original C is 21m55s.

We are of course lost in the noise with these crude timing measurements, so this solution is a keeper.

Very much thanks for that. Quite enlightening.

1 Like

But isn't while let Some(&p) = pr.next() just equivalent to for &p in pr?

No, the for loop is equivalent to

let mut __iter = IntoIterator::into_iter(pr);

while let Some(&p) = __iter.next() { ... }

which moves pr, making it inaccessible inside the loop.

1 Like

The code on GitHub is already changed (and I'm on mobile right now so I can't be bothered to go through all previous commits), but I assumed pr was a slice, in which case it's Copy. But if it's something that isn't Copy, then you're right.

As per @cuviper's diff, it is std::slice::Iter<'_, i64>, which is not Copy

1 Like

In my diff above, I need to clone the current state of the iterator as the loop progresses. You can't access the iterator at all in a for loop.
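A minimal sketch of that difference: inside a while let loop the iterator variable is still in scope, so it can be cloned mid-iteration, which the for loop's desugaring makes impossible.

```rust
fn main() {
    let pr = [2i64, 3, 5, 7];
    let mut it = pr.iter();
    while let Some(&p) = it.next() {
        if p == 3 {
            // `it` is still accessible here, so we can branch off a cheap copy.
            let rest: Vec<i64> = it.clone().copied().collect();
            assert_eq!(rest, vec![5, 7]);
        }
    }
    // With `for &p in pr.iter() { ... }` the iterator is moved into the
    // loop's hidden variable and cannot be named, let alone cloned, in the body.
}
```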

pr is a static array:

pub static PR: [i64; 40_000] =...

The 'work' function gets called with a

mut pr: Iter<'_, i64>

parameter. Which it then clones and passes to itself, twice, recursively.

The original solution in C is by E.J.Olson from the University of Nevada. I believe he wrote it in BASIC first. I have found he is a tough cookie to beat in these challenges.
https://www.raspberrypi.org/forums/viewtopic.php?f=31&t=257317

Previously I managed to match him in finding all possible anagrams in the English dictionary in Rust.

At this point I start to see why people sometimes complain about the Rust documentation.

I thought I'd better dig deeper into cuviper's solution. It all starts with calling iter() on an array and ends with calling next() on the iterator, which is clear enough. But what is that parameter "mut pr: Iter<'_, i64>" all about? Where did that come from?

A quick google gets me to the 'std::iter' page, where there is lots of nice discussion but no hint about what the iter() function returns, which is our parameter type.

Eventually it dawns on me that this page is about the iter module and that I should hit "Iterator" to find out about the trait. No clue on that page either.

Eventually I find that in the left-hand index on the page there is a heading "Implementations on Foreign Types". But of course, Iterator is a trait and it's implemented for my array.

Ah ha, in that list is "Iter<'a, K>" and I finally find the definition:

impl<'a, K> Iterator for Iter<'a, K>

That looks like my parameter. Some lifetime and a type, i64 in my case.
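For what it's worth, the type can be confirmed with a tiny program: calling iter() on an array goes through the slice, and the value it returns is std::slice::Iter<'_, i64>, exactly the parameter type in the diff above (a minimal sketch with made-up data):

```rust
use std::slice::Iter;

static PR: [i64; 5] = [2, 3, 5, 7, 11];

// Arrays coerce to slices, and <[i64]>::iter returns std::slice::Iter<'_, i64>,
// so this signature accepts PR.iter() directly.
fn first(mut pr: Iter<'_, i64>) -> Option<i64> {
    pr.next().copied()
}

fn main() {
    assert_eq!(first(PR.iter()), Some(2));
    assert_eq!(first(PR[1..].iter()), Some(3));
}
```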

I would not say the documentation is bad. It all looks very good, and there is loads of it. It just seems to be in the nature of things that it is hard work to find what you are looking for in the source.

It may still take me some time to figure out how it all hangs together.

1 Like