Code review (hope I'm not spamming)

Botev · December 13, 2016, 5:22pm

Hi guys so I would like to ask whoever has the time for some code review on a crate I'm doing. As this is one of my first crates this would be very helpful. A link to the repo is : https://github.com/Metadiff/symints/tree/generic (specifically the generic branch). The docs reflect this branch as well. A short summary:

The crate provides manipulation of symbolic integer polynomials. The use case as the main example from me would be automatic compile time shape verification of Tensors with non constant shapes. Otherwise it could have potential usage in other places.

A few key points I'm hoping to get a feedback on:

Does the file structure of the project look ok? Any suggestion on better organizing it?
Since I expose only the Polynomial class should I make its fields private as well? Or maybe just expose all structures?
Any suggestions on better code style or better api etc... ?

Thanks to anyone spending their valuable time on this.

stebalien · December 13, 2016, 7:01pm

File Structure

Usually, black box test cases go in a tests directory at the root of your project (not under src). See Testing. As far as I can tell, all of your test cases are black box test cases.

Also, implementations usually go below definitions, not in different files. For example, I'd put the Composite enum in the composite module.

Other than that, your file layout looks good.

Public or private

Is Polynomial "just" data? If you think you'll ever add constraints or private fields, make its fields private now. If it's just data (no constraints), then you might as well leave it open.

I'm more worried about Composite. Composite is fully public but you appear to be converting the Variable variant to a character through a u8. This won't work very well for if a user passes a large u16.

Code

Clippy

Go install and run clippy (cargo clippy). Unfortunately, this only works on nightly and breaks often because it depends on many compiler internals. However, this will give you lots of good suggestions (if you can get it working...).

Formatting

Go install and run rustfmt (cargo fmt) You don't have to do everything it tells you to but it's a good starting point.

Match

In general, prefer match *x { Case => ..., } over match x { &Case => ..., }. Clippy should tell you to do this.

Use the question mark operator (`?`)

This is a new operator that replaces try! in most cases. Basically, instead of writing try!(...), you'd write ...?.

Avoid `#[repr(C)]`.

Unless you really need it (usually, you're calling C code or doing some fancy unsafe magic) and understand it, don't use the repr annotation.

You don't have to `try!`

Instead of, try!(x); Ok(()), you can just write x if x has the right type. Specifically, you could get rid of all of your try! calls in your formatting (Display and Debug) functions.

Byte literals

You can write byte literals as b'x'. You can write byte string literals as b"abc". No need to cast.

CamelCase

Types and enum variants are usually CamelCase, modules and functions are usually snake_case, constants and statics are usually UPPER_CASE. Your ConvolutionMode enum has UPPER_CASE variants.

Derive Debug

Usually, I prefer to just derive Debug rather than manually implement it. This is especially important when the internals are public as you don't want to lie about public internals.

Using `IntoIterator`

In general, you can iterate over a collection by reference as follows: for x in &my_collection { ... } instead of for x in my_collection.iter() { ... }. This works because there is (usually) an implementation of IntoIterator on &MyCollectionType (where MyCollectionType is something like Vec or HashMap) and for loops work as described in std::iter.

Botev · December 13, 2016, 11:09pm

First big thank you for this. I will update take those into account in order to slightly alternate the style. It is really useful to me.

I have however a few questions for clarification, if that is ok:

Usually, black box test cases go in a tests directory at the root of your project (not under src). See Testing. As far as I can tell, all of your test cases are black box test cases.

Am I correct however that for private members (e.g. if I make Monomial private) the test directory would not have access to them?

Avoid #[repr(C)].
Unless you really need it (usually, you're calling C code or doing some fancy unsafe magic) and understand it, don't use the repr annotation.

One of the future prospects is that this will be part of a library which will be exposed to different languages, like Python, Julia, C++ via FFI. This might require passing Polynomial trough the FFI boundary, which to my understanding needed the repr(C)

I'm more worried about Composite. Composite is fully public but you appear to be converting the Variable variant to a character through a u8. This won't work very well for if a user passes a large u16.

I should probably do this only in the tests. Note that this is done via a special Trait, thus if the user changes to using u16 he has to implement the trait such that it does what he wants.

stebalien · December 14, 2016, 7:03pm

Yes. However, in that case, you'd have problems in your current setup. Sibling/parent modules cannot access private fields. This is generally solved by putting the test module at the bottom of the module you're testing (as a private submodule). That is,

// ... My module implementation ...

#[cfg(test)]
mod test {
    // My test cases.
}

repr(C) tells rust how to layout the struct in memory and how/when to pad. If you're just passing a pointer to Polynomial through FFI boundaries (*const/mut Polynomial), you won't need that. Basically, use repr(C) if you intend to declare the struct in two different languages and need the languages to agree on the memory layout of your struct.

Furthermore, repr(C) may not work correctly in your case because your structs implement Drop (they contain vectors); Some new rust features (MIR) may have changed this but I'm not sure. In general, repr(C) is mostly used for POD (plain old data) structs that you intend to pass by-value through an FFI boundary.

I'm not sure I understand your point here. Currently, I can write:

extern crate symints;

fn main() {
    println!("{:?}", symints::Composite::Variable(0xFFFF));
}

Botev · December 14, 2016, 7:37pm

Thanks again for your answer!

I think from what you said that I don't need in that case repr(C) as you mentioned I will only pass back and forth pointers.

And yes you had a point about the last bit with the u16. I've changed the examples now to use String - that was my mistake.

One last thing, what is better to consider to make the functions like floor, ceil, max, min ? Should I make them as they are now as separate functions, or should I make them as methods of Polynomial? I'm very split on this decision, in the standard library for instance thigns like floor and checked_div are methods, but max and min are module's functions. Are there any suggestions on how to make this decision (I know its probably very subjective).

Btw ++ for clippy and rustfmt.

stebalien · December 14, 2016, 7:38pm

max and min are module-level functions because (1) they're generic over all types implementing Ord and (2) there's no clear receiver. That is, both arguments are equally important. (also, the standard library isn't always consistent).

Personally, I'd make floor and ceil methods and I wouldn't implement new max and min functions (you already implement Ord so the ones from the standard library should work).

Botev · December 14, 2016, 7:53pm

Maybe I should rename the max and min to symbolic max and min. The implementation of Ord provides ordering based on the variables, e.g. a^2 is before ab^2 (if we assume a is before b), e.g. the first Id highest power is always first and so on... This is needed somewhat inside some of the manipulation procedures. My max and min provide sort of evaluation time min and max in the sense that when we evaluate them it will return the max or min of the two polynomials (e.g. if a=1 b = 4 max will give ab^2 = 16, although the ordering is different).

Or maybe I should not implement the Ord in this way?

stebalien · December 14, 2016, 8:50pm

Sorry, I don't think I'll be much help here. I'm not vary familiar with this area so my comments are probably due to ignorance.

Personally, I'd expect a PartialOrd implementation on Polynomial where f < g → ∀ x. f(x) < g(x) but no Ord implementation.

Additionally, I'd expect max(f, g)/min(f, g) to return either h(x) = max(f(x), g(x)) (not a polynomial) or if f < g { Some(g) } else if g < f { Some(f) } else { None }. I don't quite understand what you mean by "symbolic" max/min but I'm not familiar with this area.

I've also never seen a floor function that takes to arguments so I don't really know what they're supposed to do (I assumed e.g. floor simply turned the polynomial into a step function).

Botev · December 14, 2016, 9:44pm

Ah ok, so I will try to explain a bit as these are not exactly like functions, especially when you have several base variables.

So imagine that f = x^2y + y^2 + 1 and g = y^2x + x^2 + 1. It is easy to see that than neither f < g or g < f in the sense forall x,y (99% of multivariate polynomials this will be the case). Thus this ordering, although intuitive, will have very limitted applicability. However, having a way to order them, without knowing the values of x and y, is useful in algorithms for dividing polynomials and Grobner basis - Gröbner basis - Wikipedia. This is the ordering I have implemented. Thus if you invoke ::std::cmp::min you will find the minimum in terms of this ordering.

Example: Monomial ordering x > y > z, then x^2 < x^2y < x^2yz < x^3

The min/max in my package however do return what you've described. In actual term it constructs a new Composite which remembers the two polynomials and treats the expression max(poly1(x,y), poly2(x,y)) as a monomial on its own. An easier way to think about this is that we are introducing a new variable h = min(poly1(x,y), poly2(x,y)) and treating it as a monomial. Only upon evaluation we replace h with the value of min(poly1(x_val, y_val), poly2(x_val, y_val)). The same is true for my floor and ceil where they are essentially encapsulated into a new monomials and only on evaluation are they used.

I hope it at least a bit cleared the muddy water

stebalien · December 14, 2016, 10:02pm

Got it. In general, my rule is: If there is a primary "receiver" (you're acting on/transforming something), use a method. Otherwise, use a free function. These all look like free functions. However, not everyone agrees on this rule (this is a personal rule, the standard library breaks it all the time).

Botev · December 14, 2016, 10:18pm

Okay I see, still haven't decided it though

Thanks again for the code review, I've learned a few tips on better code and a few tools to help me in that.

Hopefully when I start building the Tensor compiler it would be a more interesting project, but it would use this crate in it.

Topic		Replies	Views
New crate: include_data. Review for soundness and API code review	7	425	October 9, 2023
My first Rust crate - hints welcome! help	7	811	January 12, 2023
Difference Bound Matrix Data Structure for Model Checking code review	1	481	April 5, 2021
Mem-query: Relational algebra for Rust collections	2	549	August 20, 2021
First project review code review	6	584	April 1, 2023