BenchmarksGame binarytrees


#1

I’ve taken another look at the binarytrees benchmark:

http://benchmarksgame.alioth.debian.org/u64q/performance.php?test=binarytrees

This is the “reference” C version single-thread and without specialized memory allocation:

http://benchmarksgame.alioth.debian.org/u64q/program.php?test=binarytrees&lang=gcc&id=1

This is its recursive tree allocation function:

treeNode* BottomUpTree(long item, unsigned depth)
{
    if (depth > 0)
        return NewTreeNode
        (
            BottomUpTree(2 * item - 1, depth - 1),
            BottomUpTree(2 * item, depth - 1),
            item
        );
    else
        return NewTreeNode(NULL, NULL, item);
}

It allocates on the heap a tree even when depth is zero. The Java code does the same:

private static TreeNode bottomUpTree(final int item, final int depth) {
    if (0 < depth) {
        return new TreeNode(bottomUpTree(2 * item - 1, depth - 1), bottomUpTree(2 * item, depth - 1), item);
    }
    return new TreeNode(item);
}

The faster solution in C also does the same:

static inline tree_node * create_Tree(const intnative_t root_Node_Value,
  const intnative_t tree_Depth, apr_pool_t * const memory_Pool){
   tree_node * const root_Node=apr_palloc(memory_Pool, sizeof(tree_node));

   if(tree_Depth>0){
      root_Node->left_Node=create_Tree(2*root_Node_Value-1, tree_Depth-1,
        memory_Pool);
      root_Node->right_Node=create_Tree(2*root_Node_Value, tree_Depth-1,
        memory_Pool);
   }else
      root_Node->left_Node=root_Node->right_Node=NULL;

   root_Node->value=root_Node_Value;

   return root_Node;
}

But the Rust entry returns a value when depth is zero:

fn bottom_up_tree<'r>(arena: &'r Arena<Tree<'r>>, item: i32, depth: i32)
                  -> Option<&'r Tree<'r>> {
    if depth > 0 {
        let t: &Tree<'r> = arena.alloc(Tree {
            l: bottom_up_tree(arena, 2 * item - 1, depth - 1),
            r: bottom_up_tree(arena, 2 * item, depth - 1),
            i: item
        });
        Some(t)
    } else {
        None
    }
}

This small change halves the run-time of the Rust code. So unless I’m mistaken, I think the Rust entry has a quite unfair bug.

I have also seen copy_arena is a bit faster than typed_arena on my PC. So this is my modified and hopefully fixed version (I don’t know why the site version uses args_os() instead of args()):

// The Computer Language Benchmarks Game
// http://benchmarksgame.alioth.debian.org/
//
// contributed by the Rust Project Developers
// contributed by TeXitoi

extern crate copy_arena;

use std::thread;
use copy_arena::{Arena, Allocator};

#[derive(Copy, Clone)]
struct Tree<'a> {
    l: Option<&'a Tree<'a>>,
    r: Option<&'a Tree<'a>>,
    i: i32
}

fn item_check(t: &Option<&Tree>) -> i32 {
    match *t {
        None => 0,
        Some(&Tree { ref l, ref r, i }) => i + item_check(l) - item_check(r)
    }
}

fn bottom_up_tree<'r>(allocator: &mut Allocator<'r>, item: i32, depth: i32) -> Option<&'r Tree<'r>> {
    if depth > 0 {
        let t1 = bottom_up_tree(allocator, 2 * item - 1, depth - 1);
        let t2 = bottom_up_tree(allocator, 2 * item, depth - 1);
        Some(allocator.alloc(Tree { l: t1, r: t2, i: item }))
    } else {
        Some(allocator.alloc(Tree { l: None, r: None, i: item }))
    }
}

fn inner(depth: i32, iterations: i32) -> String {
    let mut chk = 0;
    for i in 1 .. iterations + 1 {
        let mut arena = Arena::new();
        let mut allocator = arena.allocator();
        let a = bottom_up_tree(&mut allocator, i, depth);
        let b = bottom_up_tree(&mut allocator, -i, depth);
        chk += item_check(&a) + item_check(&b);
    }
    format!("{}\t trees of depth {}\t check: {}", iterations * 2, depth, chk)
}

fn main() {
    let n = std::env::args().nth(1).and_then(|n| n.parse().ok()).unwrap_or(10);
    let min_depth = 4;
    let max_depth = if min_depth + 2 > n { min_depth + 2 } else { n };

    {
        let mut arena = Arena::new();
        let mut allocator = arena.allocator();
        let depth = max_depth + 1;
        let tree = bottom_up_tree(&mut allocator, 0, depth);
        println!("stretch tree of depth {}\t check: {}", depth, item_check(&tree));
    }

    let mut long_lived_arena = Arena::new();
    let mut long_lived_allocator = long_lived_arena.allocator();
    let long_lived_tree = bottom_up_tree(&mut long_lived_allocator, 0, max_depth);

    let messages =
        (min_depth .. max_depth + 1)
        .filter(|&d| d % 2 == 0)
        .map(|depth| {
            let iterations = 1 << ((max_depth - depth + min_depth) as u32);
            thread::spawn(move || inner(depth, iterations))
        })
        .collect::<Vec<_>>();

    for message in messages.into_iter() {
        println!("{}", message.join().unwrap());
    }

    println!("long lived tree of depth {}\t check: {}", max_depth, item_check(&long_lived_tree));
}

I have also seen that there’s a C version (linked above) that doesn’t use threads and arenas. I think it’s useful to have this slower basic version too in the Computer Game site, as reference, as alternative solution (that site allows more than one solution for each language). This is a basic Rust version (for Nightly):

// The Computer Language Benchmarks Game
// http://benchmarksgame.alioth.debian.org/

#![feature(box_patterns, box_syntax, inclusive_range_syntax, step_by)]

struct Tree {
    l: Option<Box<Tree>>,
    r: Option<Box<Tree>>,
    i: i32
}

fn item_check(t: &Option<Box<Tree>>) -> i32 {
    match *t {
        None => 0,
        Some(box Tree { ref l, ref r, i }) => i + item_check(l) - item_check(r)
    }
}

fn bottom_up_tree(item: i32, depth: i32) -> Option<Box<Tree>> {
    if depth > 0 {
        Some(box Tree {
            l: bottom_up_tree(2 * item - 1, depth - 1),
            r: bottom_up_tree(2 * item, depth - 1),
            i: item
       })
    } else {
        Some(box Tree { l: None, r: None, i: item })
    }
}

fn main() {
    let n = std::env::args().nth(1).and_then(|n| n.parse().ok()).unwrap_or(10);
    let min_depth = 4;
    let max_depth = if min_depth + 2 > n { min_depth + 2 } else { n };

    {
        let stretch_depth = max_depth + 1;
        let stretch_tree = bottom_up_tree(0, stretch_depth);
        println!("stretch tree of depth {}\t check: {}",
                 stretch_depth, item_check(&stretch_tree));
    }

    let long_lived_tree = bottom_up_tree(0, max_depth);

    for depth in (min_depth ... max_depth).step_by(2) {
        let iterations = 1 << ((max_depth - depth + min_depth) as u32);

        let mut check = 0;
        for i in 1 ... iterations {
            check += item_check(&bottom_up_tree(i, depth));
            check += item_check(&bottom_up_tree(-i, depth));
        }
        println!("{}\t trees of depth {}\t check: {}", iterations * 2, depth, check);
    }

    println!("long lived tree of depth {}\t check: {}",
             max_depth, item_check(&long_lived_tree));
}

#2

What is the performance difference between the old version and your version?


#3

The new version with what I think is a correct bottom_up_tree, is about twice slower than the version on the site. If you also use copy_arena, it gets a bit faster, but surely not as fast as the site version.


#4

“The work is to create binary trees - composed only of tree nodes all the way down-to depth 0, before any of those nodes are GC’d - using at-minimum the number of allocations of Jeremy Zerfas’s C program. Don’t optimize away the work.”

http://benchmarksgame.alioth.debian.org/play.html


#5

Are you trying to say that I am right?


#6

What would be definitive?

Change theprograms to count allocations, and print the count?


#7
  1. Looks like the entry is wrong.
  2. It’s simpler to change the test with depth >= 0
  3. args_os is used because it was introduced first if I recall well
  4. Only rust stable is available on the benchmarksgame, thus your alternative version doesn’t run
  5. Feel free to submit your corrected version yourself, and/or submit it to https://github.com/TeXitoi/benchmarksgame-rs

#8

So far essentially all the Rust code I’ve written runs on the Alpha compilers only (but I try to avoid the most controversial features). That site is for benchmarks, and quite small benchmarks aren’t really production code, so using a Nightly seems reasonable to me. Is igouy willing to compile the Rust entries with Nightly Rust compilers?


#9

Is igouy willing to compile the Rust entries with Nightly Rust compilers?

No. Only stable is usable.


#10

Then we need another site, that permits to benchmark the Alpha Rust compilers too.


#11

Presumably we are living in the happy expectation that features will migrate to stable and become usable?

(I know SIMD is taking a while but assumed that was particularly awkward.)