Announcing RustPräzi: a tool to build an entire call graph of crates.io


#1

TL;DR: RustPräzi is like rust-lang-nursery/crater, but creates a single versioned call graph of crates.io

We are happy to announce the first release of RustPräzi, a proof-of-concept (PoC) project that downloads all crate versions from crates.io, builds LLVM call graphs, and links them into a single large versioned call-based dependency network (CDN). Unlike a regular dependency network, a call-based dependency network represents function call chains at both the intra- and inter-package level, supporting graph analytics and queries such as:

  • Identifying central crate APIs that are important for the stability of crates.io
  • Impact analysis of deprecated API functions: how many crates are still depending on deprecated functions that should be removed?
  • Security vulnerabilities: which crates in crates.io are affected by a vulnerable function?
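To make the last kind of query concrete, here is a minimal sketch of an impact analysis over a versioned call graph: starting from one vulnerable function, it walks caller edges backwards and collects every crate release that can reach it. This is only an illustration, not RustPräzi's actual data model or API; `FnNode`, `CallGraph`, `affected_releases`, and the crate names in `example_graph` are all hypothetical.

```rust
use std::collections::{BTreeSet, HashMap, HashSet, VecDeque};

/// A function in one specific crate release (hypothetical representation).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct FnNode {
    pub krate: String,
    pub version: String,
    pub path: String,
}

impl FnNode {
    pub fn new(krate: &str, version: &str, path: &str) -> Self {
        FnNode {
            krate: krate.to_string(),
            version: version.to_string(),
            path: path.to_string(),
        }
    }
}

/// Call graph stored as reverse edges: callee -> list of callers.
#[derive(Default)]
pub struct CallGraph {
    callers: HashMap<FnNode, Vec<FnNode>>,
}

impl CallGraph {
    /// Records that `caller` calls `callee` (intra- or inter-crate).
    pub fn add_call(&mut self, caller: FnNode, callee: FnNode) {
        self.callers.entry(callee).or_default().push(caller);
    }

    /// Breadth-first search over caller edges. Returns every
    /// (crate, version) that transitively calls `target`,
    /// excluding the release that defines `target` itself.
    pub fn affected_releases(&self, target: &FnNode) -> BTreeSet<(String, String)> {
        let mut seen: HashSet<&FnNode> = HashSet::new();
        let mut queue: VecDeque<&FnNode> = VecDeque::new();
        let mut affected = BTreeSet::new();
        seen.insert(target);
        queue.push_back(target);
        while let Some(node) = queue.pop_front() {
            if let Some(callers) = self.callers.get(node) {
                for caller in callers {
                    if seen.insert(caller) {
                        let same_release = caller.krate == target.krate
                            && caller.version == target.version;
                        if !same_release {
                            affected.insert((caller.krate.clone(), caller.version.clone()));
                        }
                        queue.push_back(caller);
                    }
                }
            }
        }
        affected
    }
}

/// Tiny made-up graph: app -> http -> tls::connect -> tls::verify, app -> json.
pub fn example_graph() -> CallGraph {
    let mut g = CallGraph::default();
    let vulnerable = FnNode::new("tls", "0.1.0", "tls::verify");
    let wrapper = FnNode::new("tls", "0.1.0", "tls::connect");
    let http = FnNode::new("http", "2.0.0", "http::get");
    let app = FnNode::new("app", "1.3.0", "app::main");
    let other = FnNode::new("json", "1.0.0", "json::parse");
    g.add_call(wrapper.clone(), vulnerable);
    g.add_call(http.clone(), wrapper);
    g.add_call(app.clone(), http);
    g.add_call(app, other);
    g
}
```

Calling `example_graph().affected_releases(&FnNode::new("tls", "0.1.0", "tls::verify"))` reports the `http` and `app` releases. A package-level dependency graph would also flag any crate that depends on `tls` without ever calling `verify`, which is the distinction a call-based network is meant to capture.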

Link to the project: https://github.com/praezi/rust

Link to our preliminary research paper: https://pure.tudelft.nl/portal/files/46926997/main2.pdf.

What is WIP?

Our current focus is to make it production-grade, in particular:

  • Add proper error handling and a retry mechanism for failed compilations
  • Integrate it with cargo and add extensible analysis modes
  • Incrementally update the graph when a new release is published
  • Implement a robust query platform with a proper graph database

Vision

Our long-term goal is to equip the cargo community with a tool that can aid the stability of crates.io, prevent the publication of impactful bad releases through lightweight code vetting (like this recent incident [1]), and help crate maintainers understand the impact of the changes they make.

[1] https://www.theregister.co.uk/2018/11/26/npm_repo_bitcoin_stealer/

Want to know more?

Chat with us on https://gitter.im/praezi/rust

Joseph (@jhejderup), Moritz, and Georgios


#2

That’s very cool. I hope you will be making a results/analysis paper at some point to follow up on the methods paper?


#3

Would really love to see a visualization of the aggregate call graph


#4

While I understand it’s not the intended use case, I wonder whether this tool has a way of incrementally “refreshing” the CDN. It’d be interesting to see the Rust ecosystem evolve over time.

Despite our best efforts, we could not compile 23,063 (31%) package releases

That is an astonishingly high percentage to me. I presume none of those packages have any reverse dependencies. Would it be possible to remove them?


#5

This is indeed a missing feature; we are currently working on incremental updates for Präzi.

In principle, crate releases that have code errors could be removed. We also could not compile certain crate releases because of an incompatible compiler version: these releases are compatible only with pre-1.0 rustc or with nightly versions. Since the Cargo.toml file does not specify the compiler version, it becomes a bit tricky to find a working compiler version.
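One naive way to search for a working compiler is to try a list of candidate toolchains in order and keep the first one that builds. The sketch below is not what Präzi does; it assumes rustup is installed with the candidate toolchains available, and the helper names `first_working_toolchain` and `cargo_build_with` are made up for illustration.

```rust
use std::path::Path;
use std::process::Command;

/// Returns the first toolchain in `candidates` for which `try_build`
/// reports success, trying them in the given order.
pub fn first_working_toolchain<'a, F>(candidates: &[&'a str], mut try_build: F) -> Option<&'a str>
where
    F: FnMut(&str) -> bool,
{
    candidates.iter().copied().find(|tc| try_build(*tc))
}

/// Runs `rustup run <toolchain> cargo build` in `crate_dir`.
/// Assumption: rustup is on PATH and the toolchain is already installed;
/// any failure to launch the process is treated as a failed build.
pub fn cargo_build_with(toolchain: &str, crate_dir: &Path) -> bool {
    Command::new("rustup")
        .args(&["run", toolchain, "cargo", "build"])
        .current_dir(crate_dir)
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}
```

In practice the two would be combined as `first_working_toolchain(&candidates, |tc| cargo_build_with(tc, dir))`. Keeping the build step behind a closure makes the search order easy to test without invoking a compiler.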


#6

This is in the pipeline; we will post about it here once we complete a more comprehensive study.


#7

I presume (hope?) many of the packages that would not compile require an external tool or library, in which case this may not be so bad…


#8

I’d also expect that some crates, such as those targeting bare-metal/embedded targets, may not build correctly for a desktop target, depending on the environment you use. If you run into any of those, feel free to give us a shout at the Embedded WG organization, or at #rust-embedded on IRC, for assistance!