Announcing RustPräzi: a tool to build an entire call graph of crates.io

TL;DR: RustPräzi is like rust-lang-nursery/crater, but creates a single versioned call graph of crates.io

We are happy to announce our first release of RustPräzi, a PoC (Proof-of-Concept) project that downloads all crate versions from crates.io, builds LLVM call graphs and links them into a single large versioned call-based dependency network. Unlike a regular dependency network, a call-based dependency network represents function call chains on both the intra- and inter-package level, supporting graph analytics/queries such as:

  • Identifying central crate APIs that are important for the stability of crates.io
  • Impact analysis of deprecated API functions: how many crates are still depending on deprecated functions that should be removed?
  • Security vulnerabilities: which crates in crates.io are affected by a vulnerable function?
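To make the last query concrete, here is a minimal sketch of how "which crate releases are affected by a vulnerable function?" could be answered by walking call edges backwards. It is not RustPräzi's actual data model: the `FnNode` and `CallGraph` types and the `affected_releases` method are hypothetical names chosen for illustration, and the graph is assumed to fit in memory.

```rust
use std::collections::{BTreeSet, HashMap, HashSet, VecDeque};

/// A function node, qualified by the crate release that defines it.
/// (Hypothetical type; not taken from the RustPräzi code base.)
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct FnNode {
    pub krate: String,
    pub version: String,
    pub path: String,
}

impl FnNode {
    pub fn new(krate: &str, version: &str, path: &str) -> Self {
        FnNode {
            krate: krate.to_string(),
            version: version.to_string(),
            path: path.to_string(),
        }
    }
}

/// Call edges stored in reverse (callee -> direct callers), so that
/// impact queries only need to follow edges "upwards".
#[derive(Default)]
pub struct CallGraph {
    callers: HashMap<FnNode, Vec<FnNode>>,
}

impl CallGraph {
    /// Records that `caller` calls `callee` (intra- or inter-crate).
    pub fn add_call(&mut self, caller: FnNode, callee: FnNode) {
        self.callers.entry(callee).or_default().push(caller);
    }

    /// Returns every (crate, version) pair that contains at least one
    /// function calling `target` directly or transitively.
    pub fn affected_releases(&self, target: &FnNode) -> BTreeSet<(String, String)> {
        let mut seen: HashSet<FnNode> = HashSet::new();
        let mut queue: VecDeque<FnNode> = VecDeque::new();
        let mut affected = BTreeSet::new();

        seen.insert(target.clone());
        queue.push_back(target.clone());

        // Breadth-first search over reversed call edges.
        while let Some(node) = queue.pop_front() {
            if let Some(callers) = self.callers.get(&node) {
                for caller in callers {
                    if seen.insert(caller.clone()) {
                        affected.insert((caller.krate.clone(), caller.version.clone()));
                        queue.push_back(caller.clone());
                    }
                }
            }
        }
        affected
    }
}
```

The same traversal covers the deprecation question: use the deprecated function as `target` and count the releases returned.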

Link to the project: https://github.com/praezi/rust

Link to our preliminary research paper: https://pure.tudelft.nl/portal/files/46926997/main2.pdf.

What is WIP?

Our current focus is to make it production-grade, in particular:

  • Add proper error handling and a retry mechanism for failed compilations
  • Integrate it with cargo and add extensible analysis modes
  • Incrementally update the graph when a new release is published
  • Implement a robust query platform with a proper graph database

Vision

Our long-term goal is to equip the cargo community with a tool that can aid the stability of crates.io, prevent the publication of impactful bad releases through lightweight code vetting (like this recent incident [1]), and help crate maintainers understand the changes they make.

[1] Check your repos... Crypto-coin-stealing code sneaks into fairly popular NPM lib (2m downloads per week) • The Register

Want to know more?

Chat with us on praezi/rust - Gitter

Joseph(@jhejderup), Moritz, and Georgios


That's very cool. I hope you will be making a results/analysis paper at some point to follow up on the methods paper?


Would really love to see a visualization of the aggregate call graph


While I understand it's not the intended use case, I wonder whether this tool has a way of incrementally "refreshing" the CDN (call-based dependency network)? It'd be interesting to see the Rust ecosystem evolve over time.

Despite our best efforts, we could not compile 23,063 (31%) package releases

That is an astonishingly high percentage to me. I presume none of those packages has any reverse dependencies. Would it be possible to remove them?

This is indeed a missing feature; we are currently working on incremental updates for Präzi.

In principle, crate releases that have code errors could be removed. We also could not compile certain crate releases because of an incompatible compiler version: these releases are compatible only with pre-1.0 rustc or with nightly versions. Since the Cargo.toml file does not specify the compiler version, it is a bit tricky to find a compiler version that works.
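One simple (if slow) way to deal with the missing compiler version is to try a list of candidate toolchains in order and keep the first one that builds. The sketch below is only an illustration of that idea, not what Präzi does: `first_working_toolchain` and `cargo_build_succeeds` are hypothetical helpers, and the second one assumes a rustup-managed `cargo` that accepts the `+toolchain` argument and has the candidate toolchains installed.

```rust
use std::path::Path;
use std::process::Command;

/// Returns the first toolchain in `candidates` for which `try_build`
/// reports success, or `None` if every candidate fails.
pub fn first_working_toolchain<'a, F>(candidates: &[&'a str], mut try_build: F) -> Option<&'a str>
where
    F: FnMut(&str) -> bool,
{
    candidates.iter().copied().find(|tc| try_build(*tc))
}

/// Runs `cargo +<toolchain> build` inside `crate_dir`.
/// Assumption: `cargo` is the rustup proxy, so `+<toolchain>` is understood.
pub fn cargo_build_succeeds(crate_dir: &Path, toolchain: &str) -> bool {
    Command::new("cargo")
        .arg(format!("+{}", toolchain))
        .arg("build")
        .current_dir(crate_dir)
        .status()
        .map(|status| status.success())
        .unwrap_or(false) // treat "could not even start cargo" as a failed build
}
```

A caller would combine the two as `first_working_toolchain(&candidates, |tc| cargo_build_succeeds(dir, tc))`; keeping the build step behind a closure also makes the search easy to test without invoking cargo.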

This is in the pipeline; we will post about it here once we complete a more comprehensive study.


I presume (hope?) many of the packages that would not compile require an external tool or library, in which case this may not be so bad...


I'd also expect that some crates, such as those targeting bare-metal/embedded targets, may not build correctly for a desktop target, depending on the environment you use. If you run into any of those, feel free to give us a shout at the Embedded WG organization, or at #rust-embedded on IRC for assistance!


FYI, I just added an affected_functions field to the RustSec advisory template, which should be useful for collecting the information needed to identify vulnerable functions in this sort of global call graph:

https://github.com/RustSec/rustsec-crate/pull/54


For those who are coming to FOSDEM'19, I will be speaking about RustPräzi in UD2.208 (Decroly) from 10:20 to 10:45 this Sunday (3 Feb). More information about the talk: https://fosdem.org/2019/schedule/event/rust_praezi/.

I will be around on both days of FOSDEM; feel free to contact me if you want to have a chat 🙂

@jhejderup For incremental call-graph updates, have a look at the salsa crate. I'm not sure it meets your needs, but since it is geared towards incremental computation, perhaps it could help with incrementally updating the ecosystem-wide call graph.

I also have a question for you: is it possible to build a proper subset, i.e., to define some set of crates of interest and just look at them, their (transitive) dependencies, and their (transitive) reverse dependencies?
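To spell out what such a subset would contain, here is a small sketch at crate level: start from the crates of interest, follow dependency edges forwards, follow them backwards, and take the union. The `DepGraph` alias and the function names are made up for this example and do not correspond to anything in RustPräzi.

```rust
use std::collections::{BTreeSet, HashMap};

/// Crate-level dependency edges: crate name -> crates it depends on.
/// (Hypothetical representation, for illustration only.)
pub type DepGraph = HashMap<String, Vec<String>>;

/// All crates reachable from `seeds` by following `edges`, seeds included.
fn reachable(edges: &DepGraph, seeds: &[&str]) -> BTreeSet<String> {
    let mut seen: BTreeSet<String> = seeds.iter().map(|s| s.to_string()).collect();
    let mut stack: Vec<String> = seen.iter().cloned().collect();
    while let Some(krate) = stack.pop() {
        for next in edges.get(&krate).into_iter().flatten() {
            if seen.insert(next.clone()) {
                stack.push(next.clone());
            }
        }
    }
    seen
}

/// Flips every edge, turning "depends on" into "is depended on by".
fn reversed(edges: &DepGraph) -> DepGraph {
    let mut rev = DepGraph::new();
    for (krate, deps) in edges {
        for dep in deps {
            rev.entry(dep.clone()).or_default().push(krate.clone());
        }
    }
    rev
}

/// Seeds plus their transitive dependencies and transitive reverse dependencies.
pub fn subset_of_interest(edges: &DepGraph, seeds: &[&str]) -> BTreeSet<String> {
    let mut subset = reachable(edges, seeds);
    subset.extend(reachable(&reversed(edges), seeds));
    subset
}
```

Note that this deliberately does not pull in the other dependencies of the reverse dependencies; whether those should be included depends on what the subset is used for.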


Another reason some crates cannot be built may be that they are name-reservation placeholders (of various kinds).

Regardless, given the high proportion, it would be good to break them down into a few of these categories so we can see what improvements can be made - better metadata, actual fixes, suitable ignores for expected failures, etc.
