Looking for a code review on a KDTree project of mine before I go present it at a conference. I am unsure of the idiomatic-ness of it

So, I have been working on this project for quite some time. My end goal is to do speed comparisons with a previous c++ implementation and (hopefully) present at a conference about rust being a viable alternative to c++ in scientific simulations.

The structure is of a binned (terminology?) crate (again - terminology?) and not a library because I will very soon recreate the actual simulation in the main.rs, but that will just be calling functions from the kdtree mod.

Anyway, I have documented all of the functions (I think) and tried to make it as rusty as possible, but as I am not as advanced in this language as most here, I hope for some level of peer review and advice! Especially on the fronts of idiomatic rust and speed. I hope I made most functions relatively clear... The majority of the "beefy" functions are in the mod.rs of the kdtree module. The other things are just structs, i/o stuff, and convenience functions.

I hope this isn't asking too much! It doesn't have that much code in it. If anyone has any advice on making my code easier to review, as well, that would be appreciated. I can do more line-by-line commenting if it is too difficult to read right now.

Thanks!

https://github.com/sezna/gravitational-kd-tree

What would be the most convenient way to add comments? I think I would be able to look into this more thoroughly soon. For now, here a couple suggestions off the top of my head.

  • link a good description of what kd-tree actually is from README :wink:
  • formatting is a bit off (tabs and spaces, missing space after ":"). I would suggest cargo install rustfmt && cargo fmt. It's not perfect, but usually good enough.
  • there are some worrisome cargo warnings:
src/kdtree/io.rs:86:9: 86:28 warning: value assigned to `to_write_string` is never read, #[warn(unused_assignments)] on by default
src/kdtree/io.rs:86     let mut to_write_string:String = "".to_string();
  • Looks like new_kd_tree function should take ownership of a vector of participles, rather then a mutable slice, because currently it copies the slice to a vector anyway.

  • #[derive(Clone) on a Node is troubling me. Node is a rather fat recursive structure, so it's better not to copy it at all. C++ it rather liberal about coping things around, while in Rust copy is usually more explicit and is often avoided.

  • Node::new method may be not necessary. In C++, you tend to default-construct all your objects, and then fill them field by field, In Rust, you usually initialize structs in one step with all necessary data. You don't need a default constructor just to use you struct with collections.

I will edit this post if I have something to add:

In dimension.rs:
You could probably use std::convert::AsRef trait to implement that function, although I have yet to look if it is appropriate.
Also, it's a bit worrisome that the string "Null" is returned, usually in Rust when you want to express that something did not return a result, you would use Option, in case you want to say that there is no result, you return Option::None, but again, I have yet to see how this is used.


In io.rs:
let mut tmp_str:String = "".to_string(); - could be written as let mut tmp_str = String::new();, the same way you created the buffer a bit higher let mut s = String::new();.

let mut tmp:Vec<String> = Vec::new();
let mut particles: Vec<Particle> = Vec::new();

I don't know if the compiler complained that it doesn't understand what type is desired but usually it can deduce from the code what that vector is storing, so, usually you can write only let mut tmp = vec![];

I see that you have to handle pretty often errors in the function open_data_file, you could consider using a Result<Type, Error> and have early returns that propagate the error to the caller of the open_data_file function by using the try! macro.

The code seems to use often format! where it just needs to get a String, format! is not very optimal, it is intended to be used when you need to create a complex String result.
Take this example: to_write_string = format!("{}", to_write.pop().expect("").as_string());
There is no need for format! here, also, expect is not very helpful here :slight_smile:
Reference for this affirmation.


In particle.rs:
https://github.com/sezna/gravitational-kd-tree/blob/master/src/kdtree/particle.rs#L36
It might be what you intended but it seems strange that you pass a tuple with 3 values but use only the first one acc.0.

Since you asked about idiomatic Rust:
https://github.com/sezna/gravitational-kd-tree/blob/master/src/kdtree/particle.rs#L65
In Rust you usually don't write return on the last "return". One would usually write:

fn new() -> SomeType {
    SomeType {
        ...initializing fields here...
    }
}

Also, since you set them to 0 anyway, you could add to derive the Default:

#[derive(Clone, PartialEq, Default)]

And now you can write the function like that:

pub fn new() -> Particle {
    Particle::default()
}

In `utilities.rs`: https://github.com/sezna/gravitational-kd-tree/blob/master/src/kdtree/utilities.rs#L68 [std::slice::swap](http://doc.rust-lang.org/std/primitive.slice.html#method.swap)

All these recursive calls(find_median_*) might be good for some rayon, not sure, just consider it :slight_smile: :
https://github.com/sezna/gravitational-kd-tree/blob/master/src/kdtree/utilities.rs#L55


In mod.rs:
https://github.com/sezna/gravitational-kd-tree/blob/master/src/kdtree/mod.rs#L93
Here clearly particle_after_gravity is intended to be used to modify the current Particle(post_gravity_particle_vec[i]) but the function actually clones was is passed in, modifies it and then returns the new clone. You can receive particle: &mut Particle instead and modify it in place.
In general there is to much cloning of structures, would be nice if the code used Rust to its full potential and instead rely more on references because Rust can help make sure we use them safely.

I wish I could suggest more precisely where to use references instead but I would have to really understand what the entire code is doing, currently I just look over.

particle_gravity function always receives a tuple with (0.0, 0.0, 0.0) and then makes another clone of it, it could just remove that parameter and set let mut acceleration = (0.0, 0.0, 0.0).

https://github.com/sezna/gravitational-kd-tree/blob/master/src/kdtree/mod.rs#L120
This is not usual to Rust, I see the code uses this "pattern" in most places where there is a temporary binding and then later it drops what is contains and sets other values, also being forced because of this to set them as mutable.
I guess this is done because Rust doesn't allow uninitialized "variables"(bindings), but, actually, you can do something like:

let a: MyType;
a = MyType::new();

As long as all code paths are covered and the variable is initialized, Rust will be perfectly fine with it(with the exception of some bugs that by mistake won't allow you to do some things).
Or you could just write the binding in every scope let tmp_accel = ....
Also note that you are shadowing it with let tmp_accel that you have in the next 2 blocks, not that it is an issue in this particular case.


This could go wrong: let length_of_points = pts.len() as i32; if pts.len() is bigger than std::i32::MAX(as in C++, containers work with usize)

2 Likes

Anything works. Typing here, cloning and typing comments, writing on paper and scanning... I'll go ahead and work on the readme and formatting. I did run rustfmt periodically, but failed to do so before posting here!

I'll check where I am using clone on the Node. I was under the impression that it was just cloning the node and the reference to the next node in the chain, not all of the nodes themselves.

I used Node::new() to make a mutable node and then assign the values, I guess that is rather c++-style. I'll initialize the struct all at once.

I'll also work out the ownership of new_kd_tree. Thanks for the feedback!

The "null" dimension is only used on leaf nodes, because they do not split anywhere. I guess it would be more appropriate to name the enum "SplitOnDimension" or something. I was thinking of using option types but it would require a bit of refactoring. Which approach do you think is better style-wise? The "Null" type in Dimension isn't a true "null", it just sort of means N/A - the node was not split.

Thanks for the string help - I often find myself writing apparently unnecessary type annotations.

And returning a Result would let me use try!(), right? I think I will do that.

A lot of these things seem obvious in hindsight, but I never would have thought of them while writing the code. Thank you!

All this options in Node make me think that you want

enum Node {
    Leaf {
        particles: Vec<Particle>
    },
    Interior {
        dimention: Dimention // without Option or null
    }
}

make illegal state unrepresentable! :wink:

That is making me remember my Haskell course. Yeah, that does look like a good idea. I might do that after I make these other changes.

By the way, look at line 225 in mod.rs. I thought about getting rid of Node::new() there and constructing the struct at the end, but it just seems much cleaner the way it is. What do you think?

It takes a bit of time to adjust to a new language, but if you study idiomatic code written by others, read about some idioms of functional programming, and you practice writing Rust, you will learn those idioms, they are based on features that are designed in a simple and clean enough way.

1 Like

Much cleaner is, unfortunatelly, subjective metric, but, if we take just one snippet from this function, and examine two ways of writting it:

A: as it is, imperative style,
B: more functional, expression oriented style,

We can see that B is shorter, and guarantees that each branch calculates both a dimension and a value.

Hm, I've actually written something similar (bounding volume hierarchy) a while ago: rustraytracer/bvh.rs at master · matklad/rustraytracer · GitHub

You make look there for how enum Node implementation of a tree in Rust might look like.

To summarize on the idiomatic-ness of the code.
In my own opinion, in Rust:

  • let var = vec![]; is preferred over let var: Vec<Type> = Vec::new();
  • .unwrap() > .expect("") - if error handling is not necessary.
  • let buffer = String::from("some text"); > let buffer: String = format!("{}", "some text".to_string());
{
    let var = SomeStructure { a: 0.0, b: 0.0 };
    use(&var);
}

over >

let mut var = SomeStructure::new();
{
    var = SomeStructure { a: 0.0, b: 0.0 }; // dropping previous allocation
    use(var.a, var.b);
}
fn some_func() -> SomeType {
    let var = ...; // where SomeType is associated
    ... may be doing stuff ...
    var
}

over >

fn some_func() -> SomeType {
    let var: SomeType = ...;
    ... may be doing stuff ...
    return var;
}

-& || &mut || in rare cases: .clone() > .clone()

That's exactly how Option is usually used. Returning Option::None when you want to indicate that nothing happened or there is no result for the user of the function. I would actually not replace an enum with strings, it's a bit expensive but may be I don't understand how it is used...
Btw, Option::None can be optimized well by the compiler.

Yes, and understanding why the try! macro does this is actually pretty simple, the code is a simple early return.

[quote="LilianMoraru, post:11, topic:4863"]
let var = vec![]; is preferred over let var: Vec<Type> = Vec::new();
[/quote]This isn't clear-cut. If it could be difficult for a reader to infer the type, spelling it out is beneficial.

Yes, I wrote this because I mentioned before that in some cases the compiler can deduce the type from the code that uses the vector.

Ow, for a reader... Not sure if I agree or not...
That would mean that all bindings have to manually be marked. At the same time, I do agree to some degree and I myself like being verbose, although not the same way the code is with these vectors...

I go on the basis if I think the reader would need help understanding the type or not.

People will use Vec::new() over vec!, the empty case, as well.

First of all, I am happy to see someone implementing a numeric tool in Rust!

Then I will try to avoid unnessary double changes:
Your max_min_ functions could indeed be refactored:

// instead calculate max/min in all three coordinates.
// this should be more efficient (cache local)
// careful with empty input!
pub fn max_min(particles: &[Particle]) -> (Point3, Point3) {
    use std::f64;
    assert!(particles.len() > 0);
    
    // every value will be higher than this one
    let mut max = Point3::new(f64::MIN);
    
    // every value will be smaller
    let mut min = Point3::new(f64::MAX);
    
    // could use some magic ... (for c in xyz)
    for p in particles {
        if p.x > max.x { max.x = p.x; };
        if p.y > max.y { max.y = p.y; };
        if p.z > max.z { max.z = p.z; };
        
        if p.x < min.x { min.x = p.x; };
        if p.y < min.y { min.y = p.y; };
        if p.z < min.z { min.z = p.z; };
    }
    
    (max, min)
}

Leading to less code in theta_exeeded.

fn theta_exceeded:

let x_distance = (max_min.0 - x_max_min.1).abs();

Instead of storing the result of max_min in a single variable, you can unpack them:

let (max, min) = max_min(&node.points.as_ref().unwrap());

The multible occurences of ...

let d_over_d_cubed = (d_vector.0 / d_magnitude.powf(2.0),
d_vector.1 / d_magnitude.powf(2.0),
d_vector.2 / d_magnitude.powf(2.0));

could be simplified by implementing the Div operator for Point3 and using that type whenever you are using (f64, f64, f64) now.

use std::ops::Div;
impl Div<f64> for Point3 {
    type Output = Point3;
    
    fn div(self, rhs: f64) -> Point3 {
        Point3 { 
            x: self.x / rhs,
            y: self.y / rhs,
            z: self.z / rhs
        }
    }
}

... allows to write ...

let d_over_d_cubed = d_vector / d_magnitude.powf(2.0);

Implementing the std::ops::Mul Trait aswell has similar effects.
Maybe Vec3 would be a better name instead of Point3.

The next pattern that could be optimized is:

let d_magnitude = particle.distance(&node_as_particle);
let d_vector = particle.distance_vector(&node_as_particle);

As far as I know the following should work as well:
let d_vector = particle.distance_vector(&node_as_particle);
let d_magnitude = d_vector.abs();

You could even write

fn distance(&a: Point3, &b: Point3) -> (Point3, f64)

returning d_vector and d_magnitude at once.

In fn particle_gravity the replacement of acceleration_total: (f64, f64, f64)
with acceleration: Point3 also eliminates the .clone() .

In fn particle_gravity you can collapse both arms using the following:

for side in &[&node.left, &node.left] { match side { ... } }

The inner & turns the values into references and the outer turns the array into a slice, so you can iterate over it.

In fn new_root_node you can use the following construct:

let (split_value, split_dimension) = {
    if zdistance > ydistance && zdistance > xdistance {
        find_median_z(pts, start, end, mid)
    } else if ydistance > xdistance && ydistance > zdistance {
       find_median_y(pts, start, end, mid)
    } else {
       find_median_x(pts, start, end, mid)
    }
}
root_node.split_dimension = split_dimension;
root_node.split_value = split_value;

I am going to stop here for today/tonight, but I hope to have given you a few concepts to
rustify/simplify the code.

greetings

1 Like

Perhaps clippy can help you write more idiomatic code. You'll need a nightly Rust to run it, but with multirust or multirust-rs this is quite easy.

Thank you all for the input - I will be changing these things and will update you all upon completion. I just have to take some time to actually implement all of these changes :slightly_smiling: