We need a great general refactoring tool

Recently, I was muddling through 60kloc of unreadable Rust transpiled through c2rust from an already unreadable C codebase. Needless to say, this codebase required lots of work before it could become usable.

A few examples of things that needed to happen:

  • Turning tests such as (x != y) as int32 as int8 != 0 into proper boolean operations.
  • Turning tests such as x != x (where x is a f32) into proper x.is_nan() operations.
  • Converting large match x { single_value => expr | _ => {} } into if x == single_value { expr }.
  • Converting large match x { single_value => expr_1 | _ => expr_2 } into if x == single_value { expr_1 } else { expr_2 } .
  • Converting all instances of SOME_MAGIC_NUMBER into a constant.
  • etc.

That was just not something I could do on a 60kloc codebase manually and with regexps (although I did crash VSCode a few times attempting to do so). I hoped to be able to use c2rust refactor, but this has been discontinued. I looked at rust-refactor, but it doesn't seem to exist anymore.

So... it looks like we don't have a refactoring tool?

I'd really love to have a tool that would let me:

  1. plug-in a syn::Item -> Option<TokenStream> or a syn::Expr -> Option<TokenStream> (which would serve as a semantic diff), etc.;
  2. generate a diff that a human being could review and apply.

I'm vaguely considering writing one, if there is interest (and if I find time to do so).

Would there be interest in using such a tool?

2 Likes

Please check out coccinelle. You can find a bunch of examples in this presentation (slides).

Coccinelle is a tool for making widespread searches and changes in source code. Coccinelle was originally developed for C code and has been extensively used in the Linux kernel. We are working on porting Coccinelle to Rust, based on the infrastructure provided by Rust Analyzer. This talk will present the current state of the tool, with the goal of getting feedback from developers interested in Rust for Linux.

14 Likes

These feel like things that clippy would do, no?

1 Like

I suspect Clippy focuses on detecting things that people tend to write instead of what code translation tools write. e.g. I doubt many people would write something like (x != y) as int32 as int8 != 0.

2 Likes

Sure, but look at the two I quoted specifically:

error: equal expressions as operands to `!=`
 --> src/lib.rs:1:30
  |
1 | pub fn foo(x: f32) -> bool { x != x }
  |                              ^^^^^^
  |
  = note: if you intended to check if the operand is NaN, use `.is_nan()` instead
  = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#eq_op
  = note: `#[deny(clippy::eq_op)]` on by default
warning: you seem to be trying to use `match` for an equality check. Consider using `if`
 --> src/lib.rs:2:5
  |
2 | /     match x {
3 | |         3 => { dbg!("hello"); }
4 | |         _ => {}
5 | |     }
  | |_____^ help: try: `if x == 3 { dbg!("hello"); }`
  |
  = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#single_match
  = note: `#[warn(clippy::single_match)]` on by default

Those two bullets from the OP are things that clippy already does today.

Are people likely to write the double-cast? No, but that doesn't mean that the pattern for it couldn't be put into clippy anyway, just in case they do -- and for some double-casts, especially mixing signedness, you might think that the intermediate matters when it doesn't.

Cranelift has a beautiful system in its optimizer for writing this. For example, the (x != y) != 0 case is picked up by this rule:

I think what clippy really needs is a way for patterns to be written that easily, rather than as the imperative pattern-matching it uses today. Then it could add all kinds of less-common rules without a bunch of maintenance noise.

1 Like

I do not know if any existing tool can be easily refactored to do this.

I think both your problem (and some problems I have been manually solving) can be solved if we had a tool like sed -- except instead of using regex to match strings, it used

marco_rules! like pattern matching / expansion

and

ran on Rust ast nodes

.

There is GitHub - ast-grep/ast-grep: ⚡A CLI tool for code structural search, lint and rewriting. Written in Rust although it may not be what you are looking for

4 Likes

Note that c2rust refactor was discontinued for a reason. Some simple cases of refactoring mentioned in your post can be performed by a pure ast-matching tool, like semgrep or ast-grep. But that would still leave you with an unreadable pile of code. To really whip it into some shape, you need a tool which can do complex semantic analysis. Type inference, at the very least, and likely various kinds of dataflow analysis. But even just properly implementing type inference (which includes stuff like method resolution) is a ton of work. You basically either need to hack into the internals of rustc, or write a Rust compiler frontend of your own. Not impossible, but don't underestimate how hard is to do it properly. You really need your tool to be error-proof. The last thing you want is to hunt bugs introduced in a 60KLoC codebase by a bug in refactorer.

Consider whether writing some custom Clippy lints would be a better use of your time.

3 Likes

Yes, Clippy can absolutely detect several of these patterns, but it won't auto-fix them in batch.

You basically either need to hack into the internals of rustc, or write a Rust compiler frontend of your own. Not impossible, but don't underestimate how hard is to do it properly.

I have written one already a few years ago to implement a refinement type system on top of Rust's. It was fairly easy to write, but the issue was that rustc changes often and used to completely break such code. I assume that it's still the case.

Consider whether writing some custom Clippy lints would be a better use of your time.

That's mostly orthogonal. Clippy can detect the issue but generally won't attempt to fix it.

If there is a suggestion for the lint, clippy is able to auto-apply it. There is an option for that.
For example, clone_on_copy lint is emitting a suggestion which can be auto-applied.

2 Likes

Ah, good point. At a quick glance, I couldn't find where in the code Clippy was generating suggestions, thanks for pointing me to span_lint_and_sugg!

I’m curious why you decided to go this route. Would it not have been more straightforward to wrap the C code behind a Rust interface? It is a well known design pattern: façade.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.