Any macro consists of two parts: the generated code and the generator code.
So if I understand you correctly (and contrary to what I thought earlier) it's the generator code that's slow.
In that case I would still create a new project and write a toy grammar, because the generator code runs while that binary crate is being compiled. But you're right that it isn't a standalone binary that can be profiled on its own.
If the generator code runs inside the rustc process (I'm not sure of this, but it may well be the case), then it's going to be very difficult to get coherent profiling data for just the generator code.
Pull the main proc-macro logic into a separate function (or even a separate crate) that uses proc_macro2::TokenStream for input and output; the proc-macro itself then consists of a single line, like implementation(input.into()).into().
Make a binary that builds the macro input as a proc_macro2::TokenStream (its FromStr implementation lets you do this via str::parse), passes it to the extracted function, and outputs the result somehow.
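As a rough illustration of the driver binary, here is a minimal sketch that uses only the standard library so it runs without any dependencies. The names implementation and run_timed are hypothetical, and implementation is only a stand-in that works on strings; in a real project it would be the extracted function taking and returning proc_macro2::TokenStream, with the input built via str::parse as described above.

```rust
use std::time::{Duration, Instant};

// Hypothetical stand-in for the extracted macro logic. In a real project the
// signature would be roughly:
//   fn implementation(input: proc_macro2::TokenStream) -> proc_macro2::TokenStream
// Strings are used here only so the sketch compiles without external crates.
fn implementation(input: &str) -> String {
    // Placeholder work: wrap each whitespace-separated token in parentheses.
    input
        .split_whitespace()
        .map(|tok| format!("({})", tok))
        .collect::<Vec<_>>()
        .join(" ")
}

// Calls `f` on `input` `iterations` times and returns the output of the last
// call together with the total elapsed time.
fn run_timed<F>(f: F, input: &str, iterations: u32) -> (String, Duration)
where
    F: Fn(&str) -> String,
{
    let start = Instant::now();
    let mut output = String::new();
    for _ in 0..iterations {
        output = f(input);
    }
    (output, start.elapsed())
}

fn main() {
    // With proc_macro2, this is where the input would be parsed, e.g.
    //   let input: proc_macro2::TokenStream = source.parse().unwrap();
    let source = "fn answer ( ) -> u32 { 42 }";
    let (output, elapsed) = run_timed(implementation, source, 1_000);
    println!("output: {}", output);
    println!("1000 runs took {:?}", elapsed);
}
```

Because this is an ordinary executable, it can be run under whatever profiler you normally use; the manual timing loop is just a quick sanity check and is no substitute for real profiling data.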