Statically analyzing a crate to identify all instances of types being referenced by other types

Seeking help from someone who knows rustc internals well.

I'm trying to analyze my codebase to build a (huge!) directed graph of references between all Items in all crates. Any time a type or function references another type or function in any way, I want to see an arrow between the two in my graph. For instance, if a struct A contains a Box<Fn(B) -> C>, then I want to see arrows in my graph from A to B and C. If a function calls another function in its body, then I want to see an arrow between those two functions, and if a function F contains some type D in its signature, then I want to see an arrow between F and D. You get the idea.
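To make the target concrete, here is a minimal sketch of the graph described above, using only the standard library. The names `RefGraph`, `add_edge`, `references` and `example_graph` are hypothetical, and it assumes each item is identified by a plain string key; it says nothing about how the edges are extracted from rustc.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Directed graph of item references, keyed by a crate-independent item name.
/// (Hypothetical type; string keys are an assumption for illustration.)
#[derive(Debug, Default)]
pub struct RefGraph {
    edges: BTreeMap<String, BTreeSet<String>>,
}

impl RefGraph {
    /// Record that `from` references `to`. Duplicate edges collapse into one.
    pub fn add_edge(&mut self, from: &str, to: &str) {
        self.edges
            .entry(from.to_string())
            .or_default()
            .insert(to.to_string());
    }

    /// Items directly referenced by `from`, in sorted order.
    pub fn references(&self, from: &str) -> Vec<&str> {
        self.edges
            .get(from)
            .map(|set| set.iter().map(String::as_str).collect())
            .unwrap_or_default()
    }
}

/// The examples from the post: struct A holds a Box<Fn(B) -> C>,
/// and function F mentions type D in its signature.
pub fn example_graph() -> RefGraph {
    let mut g = RefGraph::default();
    g.add_edge("A", "B");
    g.add_edge("A", "C");
    g.add_edge("F", "D");
    g
}
```

Using a set per node means that an item referencing the same type many times still produces a single arrow.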

And I want this to happen across all crates, so that whether I have all my code in one crate, or have it split up into 10 or 100 crates, I would get the same graph.

The reason I want this is to assist in reorganizing a huge project into crates. I figure I can do queries and modifications on this graph to understand the implications of moving a particular item from one crate to another, and to get a sense of which items belong together and where the natural lines of separation are.
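One query of that kind could look like the sketch below: given the edge list and a proposed assignment of items to crates, list the edges that would cross a crate boundary. The function name `cross_crate_edges` and the string-keyed representation are assumptions for illustration.

```rust
use std::collections::HashMap;

/// Return the edges whose two endpoints are assigned to different crates.
/// Edges with an endpoint missing from `crate_of` are skipped.
/// (Hypothetical helper; items and crates are plain string keys here.)
pub fn cross_crate_edges<'a>(
    edges: &[(&'a str, &'a str)],
    crate_of: &HashMap<&str, &str>,
) -> Vec<(&'a str, &'a str)> {
    edges
        .iter()
        .copied()
        .filter(|(from, to)| match (crate_of.get(*from), crate_of.get(*to)) {
            (Some(a), Some(b)) => a != b,
            _ => false,
        })
        .collect()
}
```

Re-running the query after changing one entry in `crate_of` shows which dependencies a move would add or remove.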

I've gotten pretty close to a promising solution, using rustc_interface to run a compiler and walk the HIR, visiting every item in the codebase. The thing I'm not clear on is what identifier I should be using to ensure that items are registered the same way across different crates. I think what I want is DefId. I'm not sure how to get my hands on DefId for items though. I am able to get the HirId of every item, but I don't know how to get to a DefId from there. I know I can get the DefId of the owner of any item via item.hir_id().owner.def_id, but not the DefId of the item itself...

So, first of all I'm wondering if my approach seems good for the problem I'm solving. If not, please suggest something else! If so, I'm wondering if DefId is indeed what I want, and if so, how I can get at it.


FYI, I cobbled together something that works for me. The rough idea: I call tcx.hir().visit_all_item_likes_in_crate(&mut v) with a Visitor whose visit_path collects hir.parent_owner_iter(id) for each path it visits. I then run a fairly heavy regex over the def_path_debug_str of each DefId to strip out noise. That gave me a good basis for grouping together items with the same parent, across crates.

If anyone wants to know more, I'm happy to respond to a message.

This was the regex:

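        // `s` is the def_path_debug_str output for a DefId. Strip bracketed hex runs
        // (the per-crate hash) and `::{constructor#N}` segments from it.
        let re = regex::Regex::new(r#"\[[0-9a-f]+\]|::\{constructor#\d+\}"#).unwrap();
        let s = re.replace_all(&s, "");
        Self(s.to_string())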
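For anyone who wants to try the same cleanup without the regex crate, here is a standard-library sketch that approximates the pattern above. It is a hand-written scanner rather than the regex itself, the function names are hypothetical, and it only recognizes ASCII digits, which I'm assuming is all that appears in these paths.

```rust
/// Remove `[<lowercase hex>]` runs and `::{constructor#<digits>}` segments.
/// (Hypothetical helper approximating the regex; not the original code.)
pub fn normalize_def_path(path: &str) -> String {
    let mut out = String::with_capacity(path.len());
    let mut rest = path;
    while !rest.is_empty() {
        if let Some(len) = hash_len(rest).or_else(|| ctor_len(rest)) {
            // Skip the matched noise entirely.
            rest = &rest[len..];
        } else {
            // No match at this position: copy one character and move on.
            let ch = rest.chars().next().unwrap();
            out.push(ch);
            rest = &rest[ch.len_utf8()..];
        }
    }
    out
}

/// Byte length of a leading `[<one or more lowercase hex digits>]`, if present.
fn hash_len(s: &str) -> Option<usize> {
    let body = s.strip_prefix('[')?;
    let n = body
        .bytes()
        .take_while(|&b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
        .count();
    (n > 0 && body[n..].starts_with(']')).then_some(n + 2)
}

/// Byte length of a leading `::{constructor#<one or more digits>}`, if present.
fn ctor_len(s: &str) -> Option<usize> {
    const PREFIX: &str = "::{constructor#";
    let body = s.strip_prefix(PREFIX)?;
    let n = body.bytes().take_while(u8::is_ascii_digit).count();
    (n > 0 && body[n..].starts_with('}')).then_some(PREFIX.len() + n + 1)
}
```

With this, an input such as `my_crate[8f3a]::module::Foo::{constructor#0}` (a made-up path) normalizes to `my_crate::module::Foo`, so a struct and its constructor end up under the same key.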
