Why is it so slow? Java beats Rust? What's more, Rust doesn't beat Python too much

fn calc_cells_recursively(last_cells:Vec<Vec<u8>>, dim:usize, dimensions:usize, end:&Vec<u8>)->Vec<Vec<u8>>{
     if dim == dimensions {
        return last_cells;
    }
    let mut next_gen_cells:Vec<Vec<u8>> = Vec::new();        
    for cell in last_cells.iter() {
        for v in cell[dim]..=end[dim]{
            let mut new_cell=cell.clone();
            new_cell[dim] = v;
            next_gen_cells.push(new_cell);
        }
    }
    return calc_cells_recursively(next_gen_cells, dim + 1, dimensions, end);
}

fn calc_cells(start:Vec<u8>, end:Vec<u8>)->Vec<Vec<u8>>{
    let mut last_cells:Vec<Vec<u8>>=Vec::new();
    let len=start.len();
    last_cells.push(start);
    return calc_cells_recursively(last_cells, 0usize, len, &end);
}

fn main(){
    let start=vec![0, 0, 0,0 ];
    let end=vec![127, 127, 127, 29];
    let all_cells=calc_cells(start, end);
}

The codes above are just to calculate all points in N dimensional space ranging from 'start' to 'end' in a recursive way. It takes 92083 milliseconds, but Java only takes about 11055 milliseconds, and Python takes 124420 milliseconds, by following the same logics with the same input in the same environment. The perf gap is not trivial at all between Rust and Java, what's more, Rust doesn't beat Python too much. I guess it would be faster if pure arrays could be used, but Rust needs to determine a static array size in compiling time, so could anyone help me to optimize these codes to speed it up significantly?, please. I really hope Rust can beat Java on this specific problem, just because I'm not experienced enough in Rust.

Did you use the release flag? Otherwise rustc does not optimize the code (or to be more precise: llvm does not optimize the code).
cargo run --release

1 Like

Thanks, I forgot this, I'll try it now.

Here's an iterative version that works on nightly (requires const generics). I don't know how its performance compares, but the playground finishes it in 1.73 seconds for me:

EDIT: There are some edge-case bugs here, see @steffahn's post below for a corrected version.

struct CellIter<const N: usize> {
    start: [u8; N],
    end: [u8; N],
    pending: [u8; N],
    done: bool,
}

impl<const N: usize> Iterator for CellIter<N> {
    type Item = [u8; N];
    fn next(&mut self) -> Option<[u8; N]> {
        if self.done {
            return None;
        }
        let result = self.pending;
        for i in (0..N).rev() {
            self.pending[i] += 1;
            if self.pending[i] < self.end[i] {
                return Some(result);
            }
            self.pending[i] = self.start[i];
        }
        self.done = true;
        Some(result)
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        let final_size: usize = self
            .start
            .iter()
            .copied()
            .zip(self.end.iter().copied())
            .map(|(s, e)| e as usize - s as usize)
            .fold(1, std::ops::Mul::mul);

        (final_size, Some(final_size))
    }
}

impl<const N: usize> ExactSizeIterator for CellIter<N> {}

fn calc_cells<const N: usize>(start: [u8; N], end: [u8; N]) -> CellIter<N> {
    CellIter {
        start,
        end,
        pending: start,
        done: false,
    }
}

fn main() {
    let start = [0, 0, 0, 0];
    let end = [127, 127, 127, 29];
    let all_cells = calc_cells(start, end).collect::<Vec<_>>();
    dbg!(all_cells.len());
}

(Playground)

2 Likes

Yeah, great thanks to @raidwas it runs much faster, 14328 milliseconds this time, but still a little bit slower than Java.

Wow, I'll try it out now, maybe it's lightening fast!

This code clearly is allocation heavy. @2e71828's code optimizes it by eliminate most heap allocations except the outmost one, but it uses a feature not available in current stable compiler. This feature is called the min_const_generics, which is planned to be included in the stable compiler in march.

Another optimization you can consider using the current stable compiler is to improve your allocator. By default the Rust uses your system's default malloc/free implementation but it may not be the world's most performant allocator implementation. You can replace it with the jemalloc using the jemallocator crate.

3 Likes

Well, there's 1 heap allocation in the collect::<Vec<_>> call.

I thought I fixed it before anyone noticed it. But you win :joy:

Isn’t there a difference in the end bound? OP’s code seems to use the end bounds inclusive while your version seems to exclude them.

Or am I misinterpreting the code here....

1 Like

It's likely; I admit I didn't study the original code too closely.

Yeah, calling with


    let start=[0, 0,0 ];
    let end=[127, 127, 29];

your’s has length 467741, while the original has length 491520.

It's an easy fix; here's the updated playground. I'll edit my solution above in a minute.

Doesn’t work with e.g.

    let start = [0, 0];
    let end = [255, 255];
1 Like

Your optimization is amazing! It only takes around 550 milliseconds now, great thanks to you, :grinning:

1 Like

Here’s a version that’s actually working:

struct CellIter<const N: usize> {
    start: [u8; N],
    end: [u8; N],
    pending: [u8; N],
    done: bool,
}

impl<const N: usize> Iterator for CellIter<N> {
    type Item = [u8; N];
    fn next(&mut self) -> Option<[u8; N]> {
        (!self.done).then(|| {
            let result = self.pending;
            self.done = (0..N).rev().all(|i| {
                if self.pending[i] < self.end[i] {
                    self.pending[i] += 1;
                    false
                } else {
                    self.pending[i] = self.start[i];
                    true
                }
            });
            result
        })
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        if self.done {
            (0, Some(0))
        } else {
            let mut x = 1_usize;
            let mut r = 1_usize;
            for i in (0..N).rev() {
                r += (self.end[i] - self.pending[i]) as usize * x;
                x *= (self.end[i] - self.start[i]) as usize + 1;
            }
            (r, Some(r))
        }
    }
}

fn calc_cells<const N: usize>(start: [u8; N], end: [u8; N]) -> CellIter<N> {
    CellIter {
        start,
        end,
        pending: start,
        done: start.iter().zip(end.iter()).any(|(s, e)| s > e),
    }
}

fn main() {
    let start = [0, 0, 2];
    let end = [44, 255, 42];
    let all_cells = calc_cells(start, end).collect::<Vec<_>>();
    let mut l = dbg!(all_cells.len());
    let mut iter = calc_cells(start, end);
    assert_eq!(iter.size_hint().0, l);
    while let Some(_) = iter.next() {
        l -= 1;
        assert_eq!(iter.size_hint().0, l);
    }
    assert_eq!(l, 0);
}

(playground)

Edit: Ah, good point with the start > end. That doesn’t work in my version above.....
Edit2: Fixed that too.. lower bound for size_hint is still wrong though...
Edit3: Fixed the size_hint.

3 Likes

Ok; that's fixed, as is the start > end case: New playground.

As long as the two points are opposite corners of an axis-aligned bounding hyperbox, it should enumerate all the contained points (but not necessarily from start to end).

Edit: Prefer @steffahn's version above to this one; it's more complete.

Just to point out, there are still performance improvements even without completely reworking the code. For example, replacing the line

let mut next_gen_cells:Vec<Vec<u8>> = Vec::new();

with

let mut next_gen_cells: Vec<Vec<u8>> =
    Vec::with_capacity(last_cells.len() * (usize::from(end[dim]) + 1));

shaves 20% of the time off on my machine. (Note that cell[dim] in the inner loop is always 0)

4 Likes

Two problems popped up:

  1. The number of cells isn't correct, it should be 62914560 instead of 59403107.

  2. The dimension size comes from JSON file, only known at runtime, I need to somehow use dynamic array, I tried below:

    fn demo<T, const N: usize>(v: Vec) -> [T; N] {
    let boxed_slice = v.into_boxed_slice();
    let boxed_array: Box<[T; N]> = match boxed_slice.try_into() {
    Ok(ba) => ba,
    Err(o) => panic!("Expected a Vec of length {} but it was {}", N, o.len()),
    };
    *boxed_array
    }

but it doesn't work, Rust says: 'cannot infer the value of const parameter N declared on the function demo'.

In this case, const generics are not an option anymore. I’d be curious how fast my code translated back into a Vec-generating iterator performs, compared to your original one, or the improved version with the with_capacity suggested above

struct CellIter {
    start: Vec<u8>,
    end: Vec<u8>,
    pending: Vec<u8>,
    done: bool,
}

impl Iterator for CellIter {
    type Item = Vec<u8>;
    fn next(&mut self) -> Option<Vec<u8>> {
        (!self.done).then(|| {
            let result = self.pending.clone();
            self.done = (0..self.pending.len()).rev().all(|i| {
                if self.pending[i] < self.end[i] {
                    self.pending[i] += 1;
                    false
                } else {
                    self.pending[i] = self.start[i];
                    true
                }
            });
            result
        })
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        if self.done {
            (0, Some(0))
        } else {
            let mut x = 1_usize;
            let mut r = 1_usize;
            for i in (0..self.pending.len()).rev() {
                r += (self.end[i] - self.pending[i]) as usize * x;
                x *= (self.end[i] - self.start[i]) as usize + 1;
            }
            (r, Some(r))
        }
    }
}

impl ExactSizeIterator for CellIter {}

fn calc_cells(start: Vec<u8>, end: Vec<u8>) -> CellIter {
    assert_eq!(start.len(), end.len());
    CellIter {
        done: start.iter().zip(end.iter()).any(|(s, e)| s > e),
        pending: start.clone(),
        start,
        end,
    }
}

fn main() {
    let start = vec![0, 0, 2];
    let end = vec![44, 255, 42];
    let all_cells = calc_cells(start.clone(), end.clone()).collect::<Vec<_>>();
    let mut l = dbg!(all_cells.len());
    let mut iter = calc_cells(start, end);
    assert_eq!(iter.size_hint().0, l);
    while let Some(_) = iter.next() {
        l -= 1;
        assert_eq!(iter.size_hint().0, l);
    }
    assert_eq!(l, 0);
}

(playground)

2 Likes