Working Rust, but is it at all idiomatic?

jgilray · March 11, 2020, 3:04am

I blindly converted some Racket code (a dialect of Scheme) to Rust. The good news: It works and is about 5x faster than Racket. I would love some advice on how to improve this code and make it more idiomatic.

Here are my thoughts:

I used to_owned() in two places. Can they be avoided?
I use VecDeque in the "merging" section of the code and Vec in the "expression tree evaluating" section. They felt appropriate, but maybe something else is more effective or efficient?
I pass &mut structures around instead of returning structures. I kind of struggled with returning references and this seemed to work OK
My code feels clunky, for example, I couldn't get fold to work and I also couldn't use the HashSet::union function so I ended up with a lot of for loops

Here is the code:

The code basically solves problem 259 at projecteuler.net (I apologize to the PE purists for posting code), which is to sum all the positive integers that result from combining all the partitions of the digits with the basic 4 arithmetic functions

Thanks,
jgilray

jgilray · April 18, 2020, 1:54am

Thanks for moving this to "code review", @mbrubeck. I hope that someone takes a look at it and gives me some pointers.

-jgilray

daboross · April 18, 2020, 3:32am

I think one thing that makes this pretty hard to review is that it's math-dense, and doesn't have a lot of documentation for what each function does. I can help review the rustyness of individual lines of code, but a lot of what makes code good or bad is it's overall structure, and figuring that out without good documentation & without intimately understanding the math is hard.

Things I found, though, looking over it:

Things I'd care about more deeply and would probably request changed in a PR:

I your first to_owned call, on lines 27, is actually just cloning the value, and should use clone instead. ToOwned is implemented for everything which implements Clone, so using to_owned is functionally equivalent, but it gives the reader the wrong idea. When I see to_owned I initially assume you're doing an operation like turning &[i64] into Vec<i64> - changing types - not turning &VecDeque into VecDeque.
It looks like all_tree_values is only ever called with a new HashSet in the second argument. If this is the case, I highly recommend refactoring it so it just creates that hashset inside itself and returns it as a result.
Similarly, I think split should also return its result, rather than mutating an argument. It'll make it's purpose much more clear, and also clear up code calling it.
```
let sst = split(&lhsv, &rhsv);
```
is much clearer than
```
let mut sst = HashSet::new();
split(&lhsv, &rhsv, &mut sst);
```
It's not bad per se, but I'd remove a lot of the type annotations you have. For instance, in lines 57-66, you're adding two rationals, so I think it's fairly obvious that you get a rational out. Giving rp, rm and rt explicit types confuses me, then, and makes me think "is there something weird going on that requires this?".

Similarly, when declaring collections, I'd leave out a lot of the types. Whenever the right hand side is HashMap::new(), it's obvious that you're creating a hashmap, and there isn't any need for type annotations on the left hand side. Sure, you are telling the reader what element type is expected and that is new information, but I think even that's usually obtainable from the environment.

If the type annotations don't add new information, and they aren't required by the compiler, then all they're doing is giving the reader extra work.
On line 79, I think you should be able to avoid using split_off and instead use something which doesn't mutate the underlying Vec. Immutable slices - &[i64] - can point to arbitrary places in a vec, so you don't need to create a whole new underlying allocation just to split one up.
```
let mut lhsv: Vec<i64> = lst.to_owned();
let rhsv = lhsv.split_off(pivot);
split(&lhsv, &rhsv, &mut sst);
```
Could become:
```
let lhsv = &lst[0..pivot];
let rhsv = &lst[pivot..];
```
This is one of the great things Rust can do which is impossible for many languages, even many GCd ones. These new lhsv and rhsv simply point to sub-slices of the original slice - they don't allocate anything new, these lines literally just change the start & end point from lst.

And if you implement the above suggested change to split making it return its result rather than mutate an argument, this piece of code could even become a pretty readable one-liner:
```
let mut sst: HashSet<Rational64> = HashSet::new();
let mut lhsv: Vec<i64> = lst.to_owned();
let rhsv = lhsv.split_off(pivot);
split(&lhsv, &rhsv, &mut sst);
for r in sst {
    st.insert(r);
}
```
becomes:
```
st.extend(split(&lst[..pivot], &lst[pivot..]);
```

Surface level things - these don't matter much, but they could make your code more concise:

A lot of small loops which just add to another collection can be replaced with extend - a method supported by most collections to "add all" of an iterator.
```
for s in suffix.iter().skip(1) {
    revprefix.push_back(*s);
}
```
would become
```
revprefix.extend(suffix.iter().skip(1));
```
(and similar changes to other such loops)

I think the inner loop in main could be replaced with something functional, if you want:

for r in st {
    if r > Rational64::zero() && *r.denom() == 1 {
        // save only positive integers to be summed
        reachable.insert(*r.numer());
    }
}

Could become:

reachable.extend(
    st.iter()
        .filter(|&&r| r > Rational64::zero() && *r.denom() == 1)
        .map(Rational64::numer),
);

If you implemented my above suggestion of making all_tree_values return its result rather than mutating an argument, you could go even further functional:

let reachable = mergings
    .into_iter()
    .map(|m| all_tree_values(&Vec::from(m)))
    .flat_map(|st| {
        st.into_iter()
            .filter(|&r| r > Rational64::zero() && *r.denom() == 1)
            .map(|r| *r.numer())
    })
    .collect::<HashSet<_>>();

You may or may not see that as an improvement - and the old version was plenty Rusty. Thought I'd put this in though to give you an idea of how it could be written like this.

I think only one of those suggestions is actually a performance improvement, but the rest should give something of an idea of how to make it more Rusty.

It might also be possible to remove the to_owned/clone on line 29, but I don't know enough about the algorithm to say whether it's feasible. If I'm reading it correctly, though, then I think you should be able to have all_mergings take an &[i64], and instead of removing items from it, you can just change what portion of the underlying Vec you're looking at. For instance, line 13 would change like so:

instead of actually mutating the underlying storage:

let frstsuf: i64 = suffix.pop_front().unwrap();

it'd just grab the item and change what part of the underlying storage we think of suffix as:

let frstsuf = suffix[0];
suffix = &suffix[1..];

Anyways, hope that helps! Let me know if you have any questions about any of these suggestions, and whether you think these are reasonable. Most of this stuff, especially in the second "surface level things" list, is subjective - so there's room for debate!

jgilray · April 18, 2020, 9:23pm

Thanks @daboross! Thanks for taking the time to review my code. I very much appreciate your clear suggestions. Your improvements led to 10% fewer lines, almost 10% faster execution, and a lot more clarity. Here's a link to the improved version:

More specifically:

extend() is very nice and I also discovered extend_from_slice()
Thanks for explaining how to use slices to do the pivot
Your functional code is nice and I strive to get better at it, but too many years coding in C make it difficult. One thing that often bothers me is that as nice and concise as functional code is, it often runs a little slower
I wasn't able to make your suggestion of having all_mergings() take a &[i64] instead of a &mut VecDeque<>. I ran into two issues: 1) converting lst to a slice. I tried lst.as_slices().0 and 2) mutability of the slice. I think that a redesign is probably needed to avoid use of a VecDeque here.

Thanks again!
-jgilray

scottmcm · April 19, 2020, 6:35am

Speed between the two is frequently the same in rust, though there are a few situations where one or the other is faster. One important difference, if you're used to C, is that passing closures in Rust doesn't involve function pointer indirection. (The usual example there is the how C++'s std::sort is often faster than C's qsort, and the same is true for rust's [T]::sort(_unstable) -- and Rust's sort implementation is state-of-the-art, in addition to the dispatch differences.)

jgilray · April 19, 2020, 7:09pm

Hi @scottmcm,

Thanks for the explanation. I recently played with some callback code in Rust (Rust Playground) and was really impressed that using closures as callback functions allows very natural passing of parameters as the closure captures the environment where it is declared... so elegant.

-jgilray

system · July 18, 2020, 7:09pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Blog: Idiomatic tree and graph like structures in Rust announcements	6	3781	January 12, 2023
[New to Rust] Is this idomatic/good rust code? help	9	1609	January 12, 2023
Functional programming in Rust (code review) help	8	688	January 12, 2023
How do you work with rust? help	18	2941	January 12, 2023
After a week with Rust help	46	4491	March 27, 2020

Working Rust, but is it at all idiomatic?

Related Topics