Citrus: C to Rust converter


#1

I’d like to announce a tool that’s meant to help with rewriting C codebases in Rust. It’s a source-to-source translator: it takes a C file and produces roughly the same code with Rust syntax.

Unlike Corrode, it doesn’t even try to preserve correct C semantics, so the generated code won’t even compile until type errors and C-isms are fixed manually.

The goal here is to produce readable code that’s a starting point for refactoring. Citrus helps with the boring syntax conversions when you’re manually converting a C codebase to Rust bit by bit. For example, you may be able to copy&paste some C expressions, loops, or small functions to Rust without having to manually correct type var to var: type over and over again.
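To illustrate the kind of mechanical rewrite meant here, below is a small hand-written sketch. The C function `sum_to` is a hypothetical example and the Rust is what the code might look like after manual cleanup; it is not actual Citrus output.

```rust
// C original (hypothetical example, not taken from Citrus):
//
//     int sum_to(int n) {
//         int total = 0;
//         for (int i = 0; i < n; i++) {
//             total += i;
//         }
//         return total;
//     }

// Hand-written Rust equivalent: every `type var` declaration becomes
// `var: type`, and the trailing `return` becomes a tail expression.
pub fn sum_to(n: i32) -> i32 {
    let mut total: i32 = 0;
    for i in 0..n {
        total += i;
    }
    total
}
```

Each individual change is trivial, but repeating it by hand for every declaration in a large file is the boring part.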

It’s like bindgen, but also includes function bodies.


So, give it a try. I haven’t got decent error reporting yet, so start with small simple files :slight_smile: There are still a lot of missing bits and cut corners, so please file issues.


#2

Back story: it’s the third version of this program. I’ve learned the hard way that:

  • Parsing of C is not fun (typedefs, dangling elses, preprocessor quirks, and compiler-specific headers), so I’ve had to use Clang here.

  • It can’t be done with the stable libclang. Its view of the AST is just too vague and incomplete. It seems like it’s 90% there, but the last 10% is not doable (e.g. can’t distinguish between for(a;b;) and for(;b;c)). I’ve had to use LLVM/Clang internal C++ functions for disambiguation. Unfortunately, this means building the project is a pain. I’ve got pre-built binaries for you.

  • It can’t be done well in one pass. I’ve built a PHP to JavaScript converter as an exercise to figure out a working approach. Getting a simple, flexible AST first, and then cleaning it up gradually in multiple passes is the way to go!
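The multi-pass idea from the last point can be sketched roughly as follows. The `Expr` type and both passes are hypothetical and invented for this illustration; they are not Citrus’s actual AST or cleanup passes.

```rust
// A deliberately tiny, hypothetical AST (an assumption, not Citrus's types).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Num(i64),
    Var(String),
    Paren(Box<Expr>),
    Deref(Box<Expr>),
    AddrOf(Box<Expr>),
}

// Pass 1: drop parenthesis nodes that the parser kept around.
pub fn strip_parens(e: Expr) -> Expr {
    match e {
        Expr::Paren(inner) => strip_parens(*inner),
        Expr::Deref(inner) => Expr::Deref(Box::new(strip_parens(*inner))),
        Expr::AddrOf(inner) => Expr::AddrOf(Box::new(strip_parens(*inner))),
        other => other,
    }
}

// Pass 2: simplify `*&x` to `x`. It only looks for a Deref directly
// wrapping an AddrOf, so it relies on pass 1 having run first.
pub fn cancel_deref_addr(e: Expr) -> Expr {
    match e {
        Expr::Deref(inner) => match cancel_deref_addr(*inner) {
            Expr::AddrOf(x) => *x,
            other => Expr::Deref(Box::new(other)),
        },
        Expr::AddrOf(inner) => Expr::AddrOf(Box::new(cancel_deref_addr(*inner))),
        Expr::Paren(inner) => Expr::Paren(Box::new(cancel_deref_addr(*inner))),
        other => other,
    }
}

// Run the passes in order; each one stays small and single-purpose.
pub fn clean_up(mut e: Expr) -> Expr {
    let passes: [fn(Expr) -> Expr; 2] = [strip_parens, cancel_deref_addr];
    for pass in passes.iter() {
        e = pass(e);
    }
    e
}
```

In this sketch, `*(&(x))` only collapses to `x` because the parentheses are gone by the time the second pass runs. Doing both jobs in a single pass would mean every rule has to know about every leftover quirk of the raw AST.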


#3

Does it support any combination of goto/for/while/switch/if?

An earlier version of Corrode failed to convert some things because their flow control was too tricky.


#4

It does recognize a standard for as for i in 0..x, and falls back to while otherwise. It rewrites switch without fall-through to match (and generates wrong code otherwise :slight_smile: ).
goto is rewritten as break 'label, but it doesn’t really make sense. If you have tricky flow control, I suggest first refactoring C to have boring flow control.
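For reference, here is a hand-written sketch of what these three rewrites look like in the simple cases. The function names and the C fragments in the comments are made up for illustration; this is not Citrus output.

```rust
// C:  for (i = 0; i < x; i++) sum += i;
// A counting loop maps directly onto a range.
pub fn sum_below(x: u32) -> u32 {
    let mut sum = 0;
    for i in 0..x {
        sum += i;
    }
    sum
}

// C:  switch (c) { case 1: return 10; case 2: return 20; default: return 0; }
// With no fall-through, each case becomes one match arm.
pub fn pick(c: i32) -> i32 {
    match c {
        1 => 10,
        2 => 20,
        _ => 0,
    }
}

// C:  nested for loops with `goto done;` in the inner body and a
//     `done:` label right after the outer loop.
// A forward jump out of enclosing loops is the case where a labeled
// break actually matches what goto did.
pub fn find_pair(limit: u32, target: u32) -> Option<(u32, u32)> {
    let mut found = None;
    'done: for a in 0..limit {
        for b in 0..limit {
            if a * b == target {
                found = Some((a, b));
                break 'done;
            }
        }
    }
    found
}
```

A labeled break can only leave a block it is nested inside, so a goto that jumps backwards or into another block has no direct equivalent and needs restructuring by hand.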


#5

It handles complex switch now.


#6

New version released: https://gitlab.com/citrus-rs/citrus/tags

I’d love to hear your feedback. Did it work for you? Do you need ObjC or C++? What code patterns could it translate better?


#7

I tried using it but I keep getting a file not found error:

➜  bin ls
citrus  hyperloglog.c
➜  bin ./citrus hyperloglog.c 
fatal error: No such file or directory (os error 2)

#8

Make sure you have clang in PATH. On macOS xcrun is also needed.


#9

Aha, installing clang solved it :blush: I must have missed that somewhere in the readme/doc. The error message could use some love though.

I tested it and from a first glance the outputs look surprisingly nice.

The comments are a bit out of place though, e.g. https://github.com/antirez/redis/blob/unstable/src/hyperloglog.c#L182

gives

pub struct hllhdr {
    pub magic: [i8; 4],
    pub encoding: u8,
    pub notused: [u8; 3],
    pub card: [u8; 8],
    pub registers: [u8], /* "HYLL" */
    /* HLL_DENSE or HLL_SPARSE. */
    /* Reserved for future use, must be zero. */
    /* Cached cardinality, little endian. */
    /* Data bytes. */
}

Macro expansion is also interleaved with a function comment:

pub static HLL_P_MASK: c_long =

    /* Given a string element to add to the HyperLogLog, returns the length
     * of the pattern 000..1 of the element hash. As a side effect 'regp' is
     * set to the register index this element hashes to. */
    /* Count the number of zeroes starting from bit HLL_REGISTERS
         * (that is a power of two corresponding to the first bit we don't use
         * as index). The max run can be 64-P+1 bits.
         *
         * Note that the final "1" ending the sequence of zeroes must be
         * included in the count, so if we find "001" the count is 3, and
         * the smallest count possible is no zeroes at all, just a 1 bit
         * at the first position, that is a count of 1.
         *
         * This may sound like inefficient, but actually in the average case
         * there are high probabilities to find a 1 after a few iterations. */
    ((1 << 14) - 1);
pub static HLL_REGISTERS: u64 =
     /* Register index. */
    /* Make sure the loop terminates. */
    (1 << 14);
pub unsafe fn hllPatLen(mut ele: *mut u8, mut elesize: usize,
                        mut regp: &mut c_long) -> c_int {
...
}

#10

I’ve improved comments around structs, but in general comments will get a “downwards” shift, since the syntex pretty-printer seems to like it this way :confused: