Instrumenting MIR to trace program execution

Hello,

I'm quite new to Rust, especially as far as the compiler itself is concerned. For my thesis, I want to instrument Rust programs and insert additional function calls in certain branches (e.g., after if expressions) that trace the execution of a branch along with additional information. For this, I need the condition expressions to be atomic, which is why MIR seems to be the most suitable representation.

From this archive post, I learned that I can modify MIR and pass the modified version back to the compiler. So what I do is run the compiler and register my own function where I modify the MIR of a function body:

const CUSTOM_OPT_MIR: for<'tcx> fn(_: TyCtxt<'tcx>, _: DefId) -> &'tcx Body<'tcx> = |tcx, def| {
    let opt_mir = rustc_interface::DEFAULT_QUERY_PROVIDERS
        .borrow()
        .optimized_mir;
    let mut body = opt_mir(tcx, def).clone();

    let mut mir_visitor = MirVisitor { tcx };

    mir_visitor.visit_body(&mut body);
    tcx.arena.alloc(body)
};

struct CompilerCallbacks;

impl rustc_driver::Callbacks for CompilerCallbacks {
    fn config(&mut self, config: &mut Config) {
        // Replace the local optimized_mir query with the instrumenting version.
        config.override_queries = Some(|_session, local, _external| {
            local.optimized_mir = CUSTOM_OPT_MIR;
        });
    }
}

fn main() {
    // define args and stuff

    let mut callbacks = CompilerCallbacks {};
    rustc_driver::RunCompiler::new(&args, &mut callbacks)
        .run()
        .unwrap();
}

Now, this seems to work. In my MirVisitor, I want to insert a function call at some point. Let's assume I have the following program I try to instrument and a dummy tracing function:

mod monitor {
    pub fn trace() {
        // do something
    }
}

fn main() {
    let inputs: Vec<u64> = std::env::args()
        .map(|a| a.parse::<u64>().unwrap())
        .take(2)
        .collect();
    let x = *inputs.get(0).unwrap();
    let y = *inputs.get(1).unwrap();

    if x < y {
        println!("x < y");
    } else {
        println!("x >= y");
    }
}

The relevant basic block in the MIR of the main function looks like this:

bb10: {
        _12 = (*_13);
        _19 = _6;
        _20 = _12;
        _18 = Lt(move _19, move _20);
        switchInt(move _18) -> [false: bb13, otherwise: bb11];
    }

Now I want to call the monitor::trace function directly after that, on both control paths. A function call is a terminator, so I could insert a whole new basic block for each branch and let it point to the original successor blocks, i.e.,

bb10: {
        _12 = (*_13);
        _19 = _6;
        _20 = _12;
        _18 = Lt(move _19, move _20);
        switchInt(move _18) -> [false: bb14, otherwise: bb15];
    }


bb14: {
        _22 = monitor::trace() -> [return: bb13, unwind: ...]; 
    }


bb15: {
        _23 = monitor::trace() -> [return: bb11, unwind: ...]; 
    }

I understand that I will also need to add the locals. However, I don't even know how to create such an artificial function call programmatically in the first place. Part of the Terminator struct is a reference (ConstantKind::Ty), so the basic block I construct does not live long enough to be passed back to the compiler :frowning:

Is there a way to implement this idea anyway? Maybe in a different way? Note that I don't just want to trace coverage, but also make computations based on the runtime values used in condition expressions and so on.

For this kind of question involving rustc internals, you might have better luck asking on internals.rust-lang.org or on rust-lang.zulipchat.com—you're more likely to reach people who are familiar with the relevant code on those forums.


Hey Cole, thanks for pointing that out, I'll ask the question there :slight_smile:

So, just for reference: I got an answer on Zulip that helped me. The solution is to use the mk_* methods that TyCtxt provides. In my case, I only had to intern a Const, which is part of my artificial basic blocks, through mk_const so that it lives long enough. Other than that, the idea works as I expected it to :slight_smile:
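
For readers who want to see the block-splitting idea from the original question in isolation, here is a minimal sketch on a toy control-flow graph. It deliberately does not use rustc's types: Terminator, BasicBlock, push_call_block and instrument_branches are hypothetical stand-ins invented for this example, and the real MIR version would additionally need new locals, unwind edges and interned constants as described above.

```rust
// Toy control-flow graph. These types are illustrative stand-ins,
// NOT rustc's MIR types.
#[derive(Debug, Clone, PartialEq)]
enum Terminator {
    /// Two-way branch, like `switchInt(..) -> [false: .., otherwise: ..]`.
    SwitchInt { on_false: usize, otherwise: usize },
    /// Call that continues at `target` when it returns.
    Call { func: &'static str, target: usize },
    Return,
}

#[derive(Debug, Clone, PartialEq)]
struct BasicBlock {
    statements: Vec<String>,
    terminator: Terminator,
}

/// Append a block that calls `func` and then jumps to `target`.
/// Returns the index of the new block.
fn push_call_block(blocks: &mut Vec<BasicBlock>, func: &'static str, target: usize) -> usize {
    blocks.push(BasicBlock {
        statements: Vec::new(),
        terminator: Terminator::Call { func, target },
    });
    blocks.len() - 1
}

/// Route both successors of every `SwitchInt` through a fresh call block.
fn instrument_branches(blocks: &mut Vec<BasicBlock>, func: &'static str) {
    // Only walk the original blocks so the inserted ones are not revisited.
    let original_len = blocks.len();
    for i in 0..original_len {
        if let Terminator::SwitchInt { on_false, otherwise } = blocks[i].terminator.clone() {
            let new_false = push_call_block(blocks, func, on_false);
            let new_otherwise = push_call_block(blocks, func, otherwise);
            blocks[i].terminator = Terminator::SwitchInt {
                on_false: new_false,
                otherwise: new_otherwise,
            };
        }
    }
}

fn main() {
    // bb0 branches to bb2 (false) or bb1 (otherwise); both simply return.
    let mut blocks = vec![
        BasicBlock {
            statements: vec!["_18 = Lt(move _19, move _20)".to_string()],
            terminator: Terminator::SwitchInt { on_false: 2, otherwise: 1 },
        },
        BasicBlock { statements: Vec::new(), terminator: Terminator::Return },
        BasicBlock { statements: Vec::new(), terminator: Terminator::Return },
    ];

    instrument_branches(&mut blocks, "monitor::trace");

    for (i, bb) in blocks.iter().enumerate() {
        println!("bb{}: {:?}", i, bb);
    }
}
```

New blocks are appended at the end and the branch is redirected to them, so existing block indices stay valid. That mirrors the bb14/bb15 layout in the question.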
