Why is it so slow? Java beats Rust? What's more, Rust doesn't beat Python too much

Here's an iterative version that works on nightly (requires const generics). I don't know how its performance compares, but the playground finishes it in 1.73 seconds for me:

EDIT: There are some edge-case bugs here, see @steffahn's post below for a corrected version.

struct CellIter<const N: usize> {
    start: [u8; N],
    end: [u8; N],
    pending: [u8; N],
    done: bool,
}

impl<const N: usize> Iterator for CellIter<N> {
    type Item = [u8; N];
    fn next(&mut self) -> Option<[u8; N]> {
        if self.done {
            return None;
        }
        let result = self.pending;
        for i in (0..N).rev() {
            self.pending[i] += 1;
            if self.pending[i] < self.end[i] {
                return Some(result);
            }
            self.pending[i] = self.start[i];
        }
        self.done = true;
        Some(result)
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        let final_size: usize = self
            .start
            .iter()
            .copied()
            .zip(self.end.iter().copied())
            .map(|(s, e)| e as usize - s as usize)
            .fold(1, std::ops::Mul::mul);

        (final_size, Some(final_size))
    }
}

impl<const N: usize> ExactSizeIterator for CellIter<N> {}

fn calc_cells<const N: usize>(start: [u8; N], end: [u8; N]) -> CellIter<N> {
    CellIter {
        start,
        end,
        pending: start,
        done: false,
    }
}

fn main() {
    let start = [0, 0, 0, 0];
    let end = [127, 127, 127, 29];
    let all_cells = calc_cells(start, end).collect::<Vec<_>>();
    dbg!(all_cells.len());
}

(Playground)

2 Likes

Yeah, great thanks to @raidwas it runs much faster, 14328 milliseconds this time, but still a little bit slower than Java.

Wow, I'll try it out now, maybe it's lightening fast!

This code clearly is allocation heavy. @2e71828's code optimizes it by eliminate most heap allocations except the outmost one, but it uses a feature not available in current stable compiler. This feature is called the min_const_generics, which is planned to be included in the stable compiler in march.

Another optimization you can consider using the current stable compiler is to improve your allocator. By default the Rust uses your system's default malloc/free implementation but it may not be the world's most performant allocator implementation. You can replace it with the jemalloc using the jemallocator crate.

3 Likes

Well, there's 1 heap allocation in the collect::<Vec<_>> call.

I thought I fixed it before anyone noticed it. But you win :joy:

Isn’t there a difference in the end bound? OP’s code seems to use the end bounds inclusive while your version seems to exclude them.

Or am I misinterpreting the code here....

1 Like

It's likely; I admit I didn't study the original code too closely.

Yeah, calling with


    let start=[0, 0,0 ];
    let end=[127, 127, 29];

your’s has length 467741, while the original has length 491520.

It's an easy fix; here's the updated playground. I'll edit my solution above in a minute.

Doesn’t work with e.g.

    let start = [0, 0];
    let end = [255, 255];
1 Like

Your optimization is amazing! It only takes around 550 milliseconds now, great thanks to you, :grinning:

1 Like

Here’s a version that’s actually working:

struct CellIter<const N: usize> {
    start: [u8; N],
    end: [u8; N],
    pending: [u8; N],
    done: bool,
}

impl<const N: usize> Iterator for CellIter<N> {
    type Item = [u8; N];
    fn next(&mut self) -> Option<[u8; N]> {
        (!self.done).then(|| {
            let result = self.pending;
            self.done = (0..N).rev().all(|i| {
                if self.pending[i] < self.end[i] {
                    self.pending[i] += 1;
                    false
                } else {
                    self.pending[i] = self.start[i];
                    true
                }
            });
            result
        })
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        if self.done {
            (0, Some(0))
        } else {
            let mut x = 1_usize;
            let mut r = 1_usize;
            for i in (0..N).rev() {
                r += (self.end[i] - self.pending[i]) as usize * x;
                x *= (self.end[i] - self.start[i]) as usize + 1;
            }
            (r, Some(r))
        }
    }
}

fn calc_cells<const N: usize>(start: [u8; N], end: [u8; N]) -> CellIter<N> {
    CellIter {
        start,
        end,
        pending: start,
        done: start.iter().zip(end.iter()).any(|(s, e)| s > e),
    }
}

fn main() {
    let start = [0, 0, 2];
    let end = [44, 255, 42];
    let all_cells = calc_cells(start, end).collect::<Vec<_>>();
    let mut l = dbg!(all_cells.len());
    let mut iter = calc_cells(start, end);
    assert_eq!(iter.size_hint().0, l);
    while let Some(_) = iter.next() {
        l -= 1;
        assert_eq!(iter.size_hint().0, l);
    }
    assert_eq!(l, 0);
}

(playground)

Edit: Ah, good point with the start > end. That doesn’t work in my version above.....
Edit2: Fixed that too.. lower bound for size_hint is still wrong though...
Edit3: Fixed the size_hint.

3 Likes

Ok; that's fixed, as is the start > end case: New playground.

As long as the two points are opposite corners of an axis-aligned bounding hyperbox, it should enumerate all the contained points (but not necessarily from start to end).

Edit: Prefer @steffahn's version above to this one; it's more complete.

Just to point out, there are still performance improvements even without completely reworking the code. For example, replacing the line

let mut next_gen_cells:Vec<Vec<u8>> = Vec::new();

with

let mut next_gen_cells: Vec<Vec<u8>> =
    Vec::with_capacity(last_cells.len() * (usize::from(end[dim]) + 1));

shaves 20% of the time off on my machine. (Note that cell[dim] in the inner loop is always 0)

4 Likes

Two problems popped up:

  1. The number of cells isn't correct, it should be 62914560 instead of 59403107.

  2. The dimension size comes from JSON file, only known at runtime, I need to somehow use dynamic array, I tried below:

    fn demo<T, const N: usize>(v: Vec) -> [T; N] {
    let boxed_slice = v.into_boxed_slice();
    let boxed_array: Box<[T; N]> = match boxed_slice.try_into() {
    Ok(ba) => ba,
    Err(o) => panic!("Expected a Vec of length {} but it was {}", N, o.len()),
    };
    *boxed_array
    }

but it doesn't work, Rust says: 'cannot infer the value of const parameter N declared on the function demo'.

In this case, const generics are not an option anymore. I’d be curious how fast my code translated back into a Vec-generating iterator performs, compared to your original one, or the improved version with the with_capacity suggested above

struct CellIter {
    start: Vec<u8>,
    end: Vec<u8>,
    pending: Vec<u8>,
    done: bool,
}

impl Iterator for CellIter {
    type Item = Vec<u8>;
    fn next(&mut self) -> Option<Vec<u8>> {
        (!self.done).then(|| {
            let result = self.pending.clone();
            self.done = (0..self.pending.len()).rev().all(|i| {
                if self.pending[i] < self.end[i] {
                    self.pending[i] += 1;
                    false
                } else {
                    self.pending[i] = self.start[i];
                    true
                }
            });
            result
        })
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        if self.done {
            (0, Some(0))
        } else {
            let mut x = 1_usize;
            let mut r = 1_usize;
            for i in (0..self.pending.len()).rev() {
                r += (self.end[i] - self.pending[i]) as usize * x;
                x *= (self.end[i] - self.start[i]) as usize + 1;
            }
            (r, Some(r))
        }
    }
}

impl ExactSizeIterator for CellIter {}

fn calc_cells(start: Vec<u8>, end: Vec<u8>) -> CellIter {
    assert_eq!(start.len(), end.len());
    CellIter {
        done: start.iter().zip(end.iter()).any(|(s, e)| s > e),
        pending: start.clone(),
        start,
        end,
    }
}

fn main() {
    let start = vec![0, 0, 2];
    let end = vec![44, 255, 42];
    let all_cells = calc_cells(start.clone(), end.clone()).collect::<Vec<_>>();
    let mut l = dbg!(all_cells.len());
    let mut iter = calc_cells(start, end);
    assert_eq!(iter.size_hint().0, l);
    while let Some(_) = iter.next() {
        l -= 1;
        assert_eq!(iter.size_hint().0, l);
    }
    assert_eq!(l, 0);
}

(playground)

2 Likes

My original version used half-open bounds instead of inclusive bounds; perhaps that's the difference.

Still much faster than mine, great, :smiley:

Along similar lines, here's a version that uses the smallvec crate for the coordinates. If the dimensionality is small enough (<= 8 as written, but tunable), there won't be heap allocations.

use smallvec::SmallVec; // 1.5.1

type Coord = SmallVec<[u8;8]>;

struct CellIter {
    start: Coord,
    end: Coord,
    pending: Coord,
    done: bool,
}

impl Iterator for CellIter {
    type Item = Coord;
    fn next(&mut self) -> Option<Coord> {
        (!self.done).then(|| {
            let result = self.pending.clone();
            self.done = (0..self.pending.len()).rev().all(|i| {
                if self.pending[i] < self.end[i] {
                    self.pending[i] += 1;
                    false
                } else {
                    self.pending[i] = self.start[i];
                    true
                }
            });
            result
        })
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        if self.done {
            (0, Some(0))
        } else {
            let mut x = 1_usize;
            let mut r = 1_usize;
            for i in (0..self.pending.len()).rev() {
                r += (self.end[i] - self.pending[i]) as usize * x;
                x *= (self.end[i] - self.start[i]) as usize + 1;
            }
            (r, Some(r))
        }
    }
}

impl ExactSizeIterator for CellIter {}

fn calc_cells(start: Coord, end: Coord) -> CellIter {
    assert_eq!(start.len(), end.len());
    CellIter {
        done: start.iter().zip(end.iter()).any(|(s, e)| s > e),
        pending: start.clone(),
        start,
        end,
    }
}

fn main() {
    let start = SmallVec::from_vec(vec![0, 0, 0]);
    let end:Coord = SmallVec::from_vec(vec![127,127,29]);
    let all_cells = calc_cells(start.clone(), end.clone()).collect::<Vec<_>>();
    let mut l = dbg!(all_cells.len());
    let mut iter = calc_cells(start, end);
    assert_eq!(iter.size_hint().0, l);
    while let Some(_) = iter.next() {
        l -= 1;
        assert_eq!(iter.size_hint().0, l);
    }
    assert_eq!(l, 0);
}

(Rust Playground)

4 Likes