Iterators or loops?

Is it possible and/or necessary to convert the following code to a version using iterators? The code should take an integer and factorize it into its prime factors. I also implemented a version using an incremental sieve, but this algorithm came out about an order of magnitude faster. (9 ms vs 130 ms).

(Bad) algorithm explanation

The algorithm works by starting at two, dividing the input by the current prime candidate until the answer isn't divisible any more, then goes to the next odd number. This way, even if the next odd number isn't prime, it will just immediately be filtered out since the remaining number had been divided by that composite factor until it can't be divided any more via division by its prime factors.

The reason to use iterators would be to be able to know the size of the Vec before starting to write into it, thus saving reallocations.

pub fn factors(n: u64) -> Vec<u64> {
    let mut factors = vec![];
    let mut factor = 2;
    let mut n = n;
    while n > 1 {
        while n % factor == 0 {
            factors.push(factor);
            n /= factor;
        }
        factor = if factor == 2 { 3 } else { factor + 2 };
    }
    factors
}

(Playground)

If you can't do it imperatively, you can't do it with iterators. I highly doubt this will work out; you'll need something like a flat_map or filter somewhere that causes you to lose the guarantees of ExactSizeIterator.

1 Like

This algorithm, in its iterative form, is the most natural and thus readable one (an iterator-based approach will have to keep a state and iterate as long as that state meets some condition, which is a pattern that does not read well).

AFAIK, there is no way to know in advance the number of prime factors of a number, so a conversion to a function-style iterator chain would not change this.

We can, however, compute in O(1) an upper bound on this number of prime factors: floor(ln(n) / ln(2)), or in other words:

u64::trailing_zeros(
    if n.is_power_of_two() { n } else { n.next_power_of_two() >> 1 }
)
pub
fn factors (mut n: u64) -> Vec<u64>
{
    let mut factors = Vec::with_capacity(
        if n.is_power_of_two() {
            return vec![2; n.trailing_zeros() as usize];
        } else {
            (n.next_power_of_two().trailing_zeros() as usize) - 1
        }
    );
    let mut factor = 2;
    while factor * factor <= n {
        while n % factor == 0 {
            factors.push(factor);
            n /= factor;
        }
        factor = if factor == 2 { 3 } else { factor + 2 };
    }
    if n > 1 {
        factors.push(n);
    }
    factors
}
9 Likes

Thanks for the detailed answer! I don't understand how you came from floor(ln(n) / ln(2)) to the power of two and trailing zeros stuff, but thanks anyway! I am not super advanced in math (stopped at the end of school), but I did understand school math well, so if you can quickly explain or give a link, I might be able to follow. Unfortunately, the links I found after a cursory search were assuming a lot more math knowledge than I have :frowning:

1 Like

ln(n) / ln(2) == log_2(n), and the number of trailing zeroes in the binary representation of an integer that is itself a power of two is also exactly the base-2 log of that integer (e.g. in binary, 2^0 = 1, 2^1 = 10, 2^2 = 100, etc.)

The next_power_of_two bit comes from being conservative, i.e. when we do not have an exact power of two (and thus the number of trailing zeroes wouldn't be indicative of the log of the input number), we round up to the next power of two, in order to maintain an upper bound. (The subsequent right shift by one actually halves this value, yielding a round-down, which in turn corresponds to the floor part of the formula.)

2 Likes

Thanks! Makes sense.

So now the question is: does it make sense to always allocate the upper bound and have potentially 63 * 8 bytes unused for a 64 bit prime or will it be better just to push as factors come in? Performance isn't a problem any more; both the incremental sieve and the posted algorithm are taking less than about 1,5 s to calculate the prime factors of a number of which 894_119 is a prime factor. (Much better than the ~5 minutes trial division took...) So I just wanted to know if there is a way to allocate upfront. So the performance of the pre-allocating version should be a bit faster, but is it worth it?

It's nearly always possible, using techniques like those mentioned in Can a while loop be rewritten using iterators? - #4 by scottmcm

But it's also almost never necessary to convert something to iterators. As pointed out, the size_hint isn't magic -- and indeed sometimes (https://github.com/rust-lang/rust/pull/48597) it's wrong so you want to do it manually even with iterators.

There's no need to pre-allocate the upper bound, because the upper bound is usually rare. Copying a small Vec (63x usize is not big at all) is not a big deal, and the growth strategy for Vec ensures that push is amortized O(1) so meh. Just pick a typical value and call it good enough.

1 Like

@H2CO3 already presented the math behind log₂(n) and .trailing_zeros(), I will just prove my initial premise:

  • in the same vein that - is the reciprocal of +, / is the reciprocal of ×, and log₂ is the reciprocal of f: n ↦ 2ⁿ:

    • a + ? = b ⇔ ? = b - a,

    • a × ? = b ⇔ ? = b / a,

      • (when a ≠ 0)
    • 2 ^ ? = b ⇔ ? = log₂(b)

      • (And more generally, a ^ ? = b ⇔ ? = ln(b) / ln(a), when a ≠ 1)

Now, we were talking about an upper bound on the number of prime factors (with repetition) of N. If we call n the number of such factors, we have:

P1 × P2 × … × Pₙ = N

And since for each factor Pₖ, we have 2 ≤ Pₖ, we then have:

2 × 2 × … × 2 ≤ P1 × P2 × … × Pₙ,

i.e.:

2ⁿ ≤ N

hence, (since log₂ is monotonically increasing):

n ≤ log₂(N)

which, thanks to i being an integer, leads to

i ≤ floor(log₂(N))

i.e.:

For any N ≥ 1, the number of factors of N is ≤ floor(log₂(N))

Q.E.D.


I had written log₂(N) as ln(N) / ln(2) because I was too lazy to look for a subscript 2. Today, on the other hand, the unicode wealth is real :stuck_out_tongue:

4 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.