Experience report: upgrading a large codebase (TiKV) to Rust 2018


#1

I just typed this up in a comment on Reddit. I think the process was interesting, with a number of typical Rust-upgrade problems, and this is probably the only place I’ll write it down since I don’t have the motivation to massage it into a full blog post.

Hopefully those who still haven’t done the upgrade will get something useful from it.


#2

Reddit says there are two comments on the post, but I’m only seeing one (by /u/JoshMcGuigan). Shadowban? Glitch in the matrix? Can you copy-paste the contents here?


#3

Huh, I don’t see it either when not logged in as brson.

Here’s the text:

That PR makes it look like upgrading to Rust 2018 was a breeze, but it actually took a lot of experimentation, and several other PRs to get to the cleanliness of that one.

Note that TiKV is on nightlies, which complicates this upgrade.

Here’s try 1. That was an attempt to “just do it”, all at once. It failed, but showed where most of the problems were, and indicated the path for a sequence of smaller, easier-to-land PRs.

First it needed two PRs (one, two) to deal with changes in the global allocator API during stabilization, and with the removal of jemalloc as the default allocator. There were a lot of questions about benchmarking and regressions during the upgrade, and making sure we were comparing the exact same allocator on all the benchmarks, before and after, was key to getting reasonable bench results.
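For anyone who hasn't touched the stabilized allocator API: a minimal sketch of pinning the allocator explicitly, so that before/after benchmarks can't silently pick up different defaults. This is not TiKV's actual code; it uses the standard library's `System` allocator as a stand-in, and `build_buffer` is a hypothetical helper that exists only to exercise it.

```rust
use std::alloc::System;

// Pin one allocator for the whole binary instead of relying on the
// toolchain's default, which changed when jemalloc stopped being the default.
// `System` is a stand-in here; a project would name whichever allocator
// it actually wants to benchmark.
#[global_allocator]
static GLOBAL: System = System;

// Hypothetical helper: every heap allocation it makes goes through GLOBAL.
fn build_buffer(len: usize) -> Vec<u8> {
    vec![0u8; len]
}

fn main() {
    let buf = build_buffer(4096);
    println!("allocated {} bytes via the pinned allocator", buf.len());
}
```

With the allocator named in the source like this, the same choice applies on both sides of a toolchain upgrade.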

Then another PR to upgrade to a toolchain that supported 2018. That involved fixing various incompatibilities between toolchains (mostly due to nightly->stable API changes) and updating the source for new clippy and rustfmt behavior. The fuzzers were problematic, and very difficult to debug and fix, since they take a very long time to build and test. Ultimately I only applied a single patch to fix libfuzzer, with honggfuzz not needing any changes, but just re-validating the fuzzers took many hours.

The toolchain we upgraded to contained a bug in rustc: it failed to gather enough entropy from the rng while creating tempfiles, and so panicked during the build fairly consistently, but only on our CI (the thread_rng in the current rand crate is basically not supposed to run out of entropy ever - it uses cpu jitter as a fallback). We had to upgrade the kernel on the CI machines to one that supported getrandom to fix that. Because of that bug I attempted to skip forward to the next nightly series, 1.33, to check whether it had fixed the entropy problem (we had started the upgrade on 1.32, but by this time it had been so long that 1.33 had branched to beta). That toolchain, though, had a bug causing clippy to panic (that bug is fixed now).

Then we were ready to turn on Rust 2018 in the PR you linked, which ideally is a process of rustfix -> clippy -> rustfmt. rustfix’s output, though, resulted in a broken codebase, due to confusion about paths that became ambiguous in 2018. There were also a number of new warnings, mostly due to the borrow checker being more precise in 2018, presumably because of NLL.
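To illustrate the path issue, here is a small sketch of how paths can be written so that they cannot be mistaken for something else. It is an assumption about the general kind of ambiguity involved, not a reproduction of what rustfix emitted for TiKV; the module names `config` and `server` and the port value are hypothetical.

```rust
mod config {
    // Hypothetical default, for illustration only.
    pub fn default_port() -> u16 {
        8080
    }
}

mod server {
    // Anchored at the crate root: this can only mean the local `config`
    // module, never an external crate that happens to share the name.
    use crate::config::default_port;
    // A leading `::` in 2018 names an external crate (here `std`),
    // never a local module or item called `std`.
    use ::std::cmp::max;

    // Use the requested port unless it is lower than the default.
    pub fn effective_port(requested: u16) -> u16 {
        max(requested, default_port())
    }
}

fn main() {
    println!("listening on port {}", server::effective_port(0));
}
```

Spelling out `crate::` and `::` everywhere is noisier than strictly necessary, but it leaves nothing for a name clash between a local module and an external crate to trip over.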

Altogether, still not hard, but there were a lot of experiments and false starts on the way there. The final PR is from a branch tellingly named rust-2018-try3. It took me (with help from breeswish and others) over a month, though not full-time.

Admittedly, some of the challenges were due to running nightly toolchains.

Links to any upstream bugs can presumably be found on the TiKV PRs.

A side note about picking the “correct” nightly: I decided that the last nightly of a given release is probably the least risky, since newly-stable features have had as much time to bake as possible. Beta branches, though, do tend to receive a number of fixes (including the most critical ones) that the corresponding nightly never gets, which makes me uneasy about having a product that is widely depended on by financial institutions, among others, on any nightly. I am considering building TiKV with stable compilers (or the final beta) while still using unstable features, if we can’t remove all of TiKV’s usage of unstable features.

Cheers!

@llogiq @Manishearth can you check the linked Reddit thread and see why my post doesn’t show?


#4

It says “removed” without listing a username – this means you deleted it (maybe accidentally?)

undeleted


#5

Appears for me now!