Hello everyone! Since we weren't accepted into GSoC, we organized our own summer of code, and as usual we have a slot for improving our decompiler, Radeco.
Radeco (based on radeco-lib) is a radare2-based static binary analysis framework. Currently, radeco is stable enough and has several analysis passes built in. We believe that this RSoC is a good opportunity to push radeco further and implement our very own decompiler within radare2!
This task involves completing a decompiler backend using the analysis in radeco. Once the preliminary results are obtained, students are expected to continue working on improving the quality of the decompiled code.
Task
Implement Memory SSA
Complete the VSA (Value Set Analysis)
Expand the supported architectures list
Improve the pseudocode output and add more tests (comparing against the output of Hex-Rays)
Use Godbolt to produce binaries with different compilers and optimization levels for tests
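To give an idea of what the VSA item involves, here is a minimal sketch of a strided interval, the abstract value commonly used in Value Set Analysis. This is not radeco's actual code: the `StridedInterval` type and its `join`/`plus` methods are names made up for illustration, and the sketch ignores integer overflow and wraparound, which a real implementation has to handle.

```rust
/// A simplified strided interval `stride[lo, hi]`, i.e. the set
/// {lo, lo + stride, ..., hi}. A stride of 0 denotes a single constant.
/// Hypothetical type for illustration; overflow is not handled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StridedInterval {
    pub stride: u64,
    pub lo: i64,
    pub hi: i64,
}

/// Greatest common divisor, with gcd(x, 0) == x.
fn gcd(a: u64, b: u64) -> u64 {
    if b == 0 { a } else { gcd(b, a % b) }
}

impl StridedInterval {
    /// The interval holding exactly one value.
    pub fn constant(v: i64) -> Self {
        Self { stride: 0, lo: v, hi: v }
    }

    /// Build `stride[lo, hi]`; `hi - lo` must be a multiple of `stride`.
    pub fn new(stride: u64, lo: i64, hi: i64) -> Self {
        assert!(lo <= hi);
        if lo == hi {
            return Self::constant(lo);
        }
        assert!(stride > 0 && hi.abs_diff(lo) % stride == 0);
        Self { stride, lo, hi }
    }

    /// Does the interval contain the concrete value `v`?
    pub fn contains(&self, v: i64) -> bool {
        if v < self.lo || v > self.hi {
            return false;
        }
        if self.stride == 0 {
            return v == self.lo;
        }
        v.abs_diff(self.lo) % self.stride == 0
    }

    /// Over-approximate the union of two intervals (used where control flow merges).
    pub fn join(&self, other: &Self) -> Self {
        let delta = self.lo.abs_diff(other.lo);
        Self {
            stride: gcd(gcd(self.stride, other.stride), delta),
            lo: self.lo.min(other.lo),
            hi: self.hi.max(other.hi),
        }
    }

    /// Abstract addition: covers every sum of an element of `self` and one of `other`.
    pub fn plus(&self, other: &Self) -> Self {
        Self {
            stride: gcd(self.stride, other.stride),
            lo: self.lo + other.lo,
            hi: self.hi + other.hi,
        }
    }
}
```

For example, joining the constants 4 and 12 gives `8[4, 12]`, which contains 4 and 12 but not 8. The stride is what lets VSA describe things like array element addresses more precisely than a plain range would.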
Skills
The student should be familiar with Rust and decompilation basics or be able to learn them quickly.
Difficulty
Advanced
Benefits for the student
The student will learn decompilation theory and perform complex graph transformations, as well as learn the specifics of particular compiler optimization passes.
Benefits for the project
Successful completion of this task will mean the first release of radeco that can generate readable and optimized C code.
Mentors
xvilka
deroad
Assess requirements for midterm/final evaluation
1st term: Implementing Memory SSA and VSA.
2nd term: Supporting architectures: x86, amd64, ARMv7, ARMv8, PowerPC, MIPS, V850, and implementing regression tests for them.
Final term: Refining C output, finishing integration with radare2 and Cutter, writing regression and unit tests, updating documentation (including r2book).
Links/Resources
Memory SSA - A Unified Approach for Sparsely Representing Memory Operations: hxxp://www.airs.com/dnovillo/Papers/mem-ssa.pdf
Effective Representation of Aliases and Indirect Memory Operations in SSA Form: hxxp://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.33.6974&rep=rep1&type=pdf
Papers about decompilation: hxxps://drive.google.com/drive/folders/0B1X32SwXTZPuYWwxWW5BNi1oWDA?usp=sharing
This task is about improving the results of decompilation by recovering types (char, char *, structures, unions, classes, etc). Apart from being able to infer them by analyzing data flow, radeco should be able to exchange this information with radare2 and Cutter, initially loading types from them, then synchronizing the refined results back.
Task
Define and implement type system
Implement type inference techniques
Support for structural types loading and inference
Support for constrained types
Implement IR writer/reader with type information
Implement a backend to convert the IR to C AST with type information
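As a rough illustration of the "type inference techniques" item, here is a small sketch of unification-based inference, one common approach among several. It is not radeco's design: the `Ty` enum, the `Unifier` struct and the numbering of type variables are all assumptions made for this example. Each IR value gets a type variable, and uses of the value (for example a dereference or a byte-sized load) add equality constraints that are solved by unification.

```rust
use std::collections::HashMap;

/// A tiny type language (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Ty {
    /// Not yet known; identified by a number.
    Var(u32),
    /// Integer of the given width in bits.
    Int(u8),
    /// Pointer to another type.
    Ptr(Box<Ty>),
}

/// Holds the bindings discovered so far for type variables.
#[derive(Debug, Default)]
pub struct Unifier {
    subst: HashMap<u32, Ty>,
}

/// Does type variable `v` appear inside `ty`? Prevents infinite types.
fn occurs(v: u32, ty: &Ty) -> bool {
    match ty {
        Ty::Var(w) => *w == v,
        Ty::Ptr(inner) => occurs(v, inner),
        Ty::Int(_) => false,
    }
}

impl Unifier {
    /// Replace every bound variable in `ty` by what it is bound to.
    pub fn resolve(&self, ty: &Ty) -> Ty {
        match ty {
            Ty::Var(v) => match self.subst.get(v) {
                Some(bound) => self.resolve(bound),
                None => ty.clone(),
            },
            Ty::Ptr(inner) => Ty::Ptr(Box::new(self.resolve(inner))),
            Ty::Int(_) => ty.clone(),
        }
    }

    /// Record the constraint `a == b`, or report a conflict.
    pub fn unify(&mut self, a: &Ty, b: &Ty) -> Result<(), String> {
        let (a, b) = (self.resolve(a), self.resolve(b));
        match (&a, &b) {
            _ if a == b => Ok(()),
            (Ty::Var(v), other) | (other, Ty::Var(v)) => {
                if occurs(*v, other) {
                    return Err(format!("infinite type: t{} in {:?}", v, other));
                }
                self.subst.insert(*v, other.clone());
                Ok(())
            }
            (Ty::Ptr(x), Ty::Ptr(y)) => self.unify(x, y),
            _ => Err(format!("type conflict: {:?} vs {:?}", a, b)),
        }
    }
}
```

If `t0` is dereferenced to produce `t1`, and `t1` is later used as an 8-bit value, then unifying `t0` with `Ptr(t1)` and `t1` with `Int(8)` resolves `t0` to a pointer to an 8-bit integer, roughly `char *` in the C output. Structures, unions and constrained types need a richer constraint language than plain equality, which is part of what this task is about.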
Skills
The student should be familiar with Rust and decompilation basics or be able to learn them quickly.
Difficulty
Advanced
Benefits for the student
The student will learn decompilation theory and work with the type system.
Benefits for the project
This task will make it possible to produce more readable IR/C output.
Mentors
xvilka
Assess requirements for midterm/final evaluation
1st term: Basic and structured types support in IR and propagation through all stages of radeco
2nd term: Type inference engine
Final term: Integration with radare2 and Cutter, regression tests, complex type inference, radare2 book documentation
Links/Resources
Commands and API for setting/changing types of the variables - Issue #183
Because we need to be sure students can qualify for this internship, it is required to take one of the microtasks (or some of the simple issues from the repositories):
Sorry for splitting this into multiple parts, but new users are restricted to at most 2 links per post. Because of that, even after the split I had to skip many useful reference links, sigh. Moderators/Admins - please merge the parts into one message, thank you, and sorry for the inconvenience.
I also added some links as hxxp:// to bypass the limitation.