Code refactoring project (about concurrent programming) - Seeking advice

Hi there,

I am working in a group on the topic of concurrent programming, and in particular about Java code refactoring to make use of new Java features, namely Streams and Parallel Streams. This kind of refactoring would transform, when possible, simple for loops into Streams (or even parallel ones when the body meets some parallelization requirements) which can lead to performance improvements. The problem is to find when it is possible (and more surprisingly, to find how to work with Java ASTs, which we are currently struggling to do).

I am looking into whether such a topic could be translated into a similar problem in Rust. As far I can tell, Streams could be equivalent to std iterators and Parallel Streams could be similar to what crate rayon offers through parallel iterators.

The problem could then be translated as follows: "Refactoring Rust code to replace when possible for loops with (parallel) iterators". My question is: is it a relevant question altogether in this language, and is refactoring idiomatic in Rust (compared to using macro attributes to indicate you want to parallelize some specific code)? I do not know enough about the ecosystem to know the answer to this question, but sufficiently to be interesting in using Rust in the project. :slight_smile: All the more so because I know Rust has taken this problem very seriously and has already put many tools in place that we need in Java to make sure refactoring is legit (check for mutability, ownership, concurrent accesses, etc.). I saw for example that rayon already did quite the job already and that essentially parallel iterators works out of the box; the compiler complains when there could be a problem so there is nothing really more to do to implement a refactoring tool than to pass the refactored code to the compiler to redirect the output (or am I wrong?). On the other hand, Rust is fare more precise when it comes to borrows or conversions into iterators, etc. so the refactoring in itself (transforming for loops into a more functional-style code) seems trickier. (ex. Loop iterating over ranges would need a different treatment than loops iterating over an iterator)

Secondly, do you think there could be a connected problem, which would be more relevant/interesting within Rust ecosystem, and which I could reasonably tackle? I thought about concentrating on a Rust trait or macro related to this, but found nothing particularly interesting to ponder on. If this reminds you of a similar topic in Rust, I would be interested in hearing about it. I prefer to find a similar problem which is truly relevant in Rust, rather than creating an artificial problem where there is none.

Thanks for reading this far! Any suggestion would be greatly appreciated

So you're basically thinking about a tool that converts a for loop into an iterator?

I think it is a relevant question and that sort of automatic refactoring could be a nice addition to rust-analyzer.

That said, idiomatic Rust tends to naturally prefer iterators over loops because loops almost always mutate some variable outside the loop and Rustaceans typical prefer to avoid mutations altogether.

I would also drop the "where possible" bit of your original statement... Parallelism tends to work best when you do larger chunks of work in parallel because you can ammortize the cost of communication. That means there might only be a couple places where you would replace a iter() with par_iter() and those would require domain-specific knowledge a tool probably won't have access to.

As a trivial example of when fine-grained parallelism isn't a good idea, consider the following code which does a large amount of trivial maths in parallel vs sequentially[1].

use rand::prelude::*;
use rayon::prelude::*;
use std::time::Instant;

pub fn sequential(upper_bound: u64) -> u64 {
    (0..upper_bound).map(|n| n * n + 1).sum::<u64>()
}

pub fn parallel(upper_bound: u64) -> u64 {
    (0..upper_bound)
        .into_par_iter()
        .map(|n| n * n + 1)
        .sum::<u64>()
}

pub fn time_it<F>(func: F, value: u64)
where
    F: FnOnce(u64) -> u64,
{
    let start = Instant::now();
    let ret = func(value);
    let end = Instant::now();

    let name = std::any::type_name::<F>();
    println!("{} took {:?} and returned {:?}", name, end - start, ret);
}

fn main() {
    let mut rng = rand::thread_rng();
    let upper_limit: u64 = rng.gen_range(1000..2000);

    time_it(sequential, upper_limit);
    // Note: run multiple times to let rayon lazily initialize itself
    time_it(parallel, upper_limit);
    time_it(parallel, upper_limit);
    time_it(parallel, upper_limit);
}

(playground)

You can see a noticeable difference between our parallel and sequential sums.

playground::sequential took 958ns and returned 1997953213
playground::parallel took 419.905µs and returned 1997953213
playground::parallel took 19.781µs and returned 1997953213
playground::parallel took 18.384µs and returned 1997953213

  1. Sorry it's so convoluted! I needed to jump through some hoops because the optimizer kept inlining the sequential version to nothing :sweat_smile: ↩︎

I don't think it's a fair comparison since this code doesn't have any loop. You can check the assembly generated from the playground code and see there's no jump-like instructions between two std::time::Instant::now calls in the .LBB71_16: block. How can this code be executed without any loop? Math.

Ugh.

The branch instruction used to be there because I was checking the generated assembly to make sure LLVM didn't do the sum(0..n) = n*(n+1)/2 optimisation. I'm guessing a later refactoring was enough for LLVM to make it go away.

@alice Yes, that's exactly it :grinning:

@Michael-F-Bryan We got interested in when it was correct to refactor code, and not when it was advisable to do it. It's mainly because in Java it was already hard enough to know when we could refactor, and because work from a previous group working on this topic showed there could be some improvement in Java code. Yet it's true that because of optimizations it's not that clear that optimizing could be useful. So I guess it might be better to let the programmer add a little macro attribute for example before a for loop to indicate he wishes that this code be automatically refactored into an iterator (or even a parallel one if needed)? On the other hand, it's more natural indeed to use iterators in Rust, and it would maybe feel strange to ask for a macro to magically transform your for loops into iterators.

As far as I can tell, rust macros are mainly to enhance code or expand some expression into a more complex one at compile time (derive traits and println macros)

Can they be used to refactor code too? (and is it really made for it?)

If you need to go to the code and manually modify it to add an attribute before running your tool, why not just give rust-analyzer a code action which does the transformation for you?

Then people would just need to look at their code, select part of the for-loop, hit ctrl-. and select the "convert loop to iterator" option, and let the transform do its thing.

No.

A macro takes one piece of code and does something akin to a template expansion to generate a lot more code. You mainly use them to avoid boilerplate or let you write something using a more expressive syntax.

I mean, technically you could wrap something in a proc-macro which takes code using for-loop syntax and then ask your IDE to expand the macro invocation to its iterator equivalent, buttt..... that's super clunky and won't even help because macros only receive tokens with no extra context (i.e. it can't tell whether something is valid variable or not, let alone do any complex borrow checker analysis).

Thanks! I'll look into rust-analyzer code actions, it seems like a good idea. Do you have any material on how to implement this (how to create a new code action)? It seems like all code actions are implemented there: rust-analyzer/crates/assists/src/handlers at 550709175071a865a7e5101a910eee9e0f8761a2 · rust-analyzer/rust-analyzer · GitHub, but I have yet to find some enlightening material about how to create and test a new code action.

I would suggest having a look through their docs/dev/ folder to get an overall understanding of how rust-analyzer is designed.

Assists are implemented in the ide_assist crate, with some general advice in the crate docs. You might want to check out the invert_if assist for an example of something non-trivial.

I've spotted in the source code something that replaces a for loop with an Interator::for_each() (fun fact, I tried to implement it myself before I stumbled upon the snippet that was doing exactly what I was struggling to do :laughing:). But I've not found anything to replace this:

for point in self.points.iter_mut() {
    if point.get_x() > 0 {
        point.translate_x();
    }
}

with this:

self.points.iter_mut().filter(|point| point.get_x() > 0).for_each(|point| {
    point.translate_x();
});

It'll be a good start.
What I'm heading for is changing rust-analyzer/crates/ide_assists/src/handlers/convert_iter_for_each_to_for.rs to handle ForExprs with an if expression inside, add a new test and just check that my test passes cargo test. What I'm a bit struggling for is finding the right functions that do exactly what I want (and not in convoluted ways), through. https://rust-analyzer.github.io/rust-analyzer/syntax/index.html is of some help, but I'm mainly proceeding by trial and error :sob:

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.