Soft question: scaling codebase 50k loc -> 500k loc

That's a tricky to answer question: I would say that large refactors of internals are possible, but painful, but that the overall architecture is pretty-much set in stone. But if you ask me where's a boundary between a large refactor and re-architectureing, i would say "well, exactly there where you can't do changes anymore".

So let me give you some principles which could no longer be changed in the current code, but which could have been coded differently from the beginning:

  1. single-version principle --- rust-analyzer always has a single snapshot of code at a time, time is modeled by changing this snapshot wholesale. That's not the only way to do it. RLS treated code as mutable, Roslyn allows holding several different snapshots at the same time as everything is immutable.

  2. lazy-analysis principle --- rust-analyzer is secretly rust-avoid-analysis-at-any cost, it intentionally knows only subset of things about codebase. So, eg, when you do "find usages" in rust-analyzer, what happens is not that rust-analyzer looks into use-def chains it got while "compiling" the code, but rather it runs a heuristic text-based search (Find Usages), and then uses lazy analysis to prune out false positives (that's why searching for new is way slower than searching for frobnicate).

    An alternative here is for a language server to maintain a fully complete view of the code base (something which might be desired to push all the way towards incremental binary patching and live reload)

  3. As-if-analysis is complete principle --- the laziness is abstracted away. All IDE features are building on top of a model which looks as if there's a completely compiled version of a snapshot of the source code is available. An alternative would be more explicit phasing in the IDE parts, where you don't just get the info, but schedule specific computations to run.

In contrast, here are some tactical things which are feasible to change:

  • migrate typecheckper to a library shared with rustc
  • upgrade salsa from "sea of Arcs" to "array with indexes" version
  • maybe change cancellation from unwinding to explicit results or async, but not removing support for cancelation altogether

While we are at it, a related story about an org chart:

This was a huge architectural bug in rust-analyzer. It was fixed recetly through heroic work of @Veykril, but, as you can see, it took us years to do something about the thing which is very clearly wrong, and wrong in a viral way (everything building on top of this wrong abstraction is also wrong).

But what's most curious here is the social aspect. The first order technical story here is that @matklad just didn't get how macros in Rust actually work back when the infra for macro expansion was coded for rust-analyzer. I implemented what I imagined to be the way macros work, but that was incorrect, and it took me some years to recognize that. Which is OK --- compilers are hard, I am of limited smartness, mistakes are being made all the time, 64k should be enough for everybody.

What is really curious is that I identified "I don't know how macro expansion works" as a core risk from the very beginning. You can read about that in the very first paragraph that announced the thing that was to become rust-analyzer: RFC: libsyntax2.0 by matklad · Pull Request #2256 · rust-lang/rfcs · GitHub. I also recall specifically trying to get at this question of macro expansion at the second rust all-hands in Berlin (really, Rust was able to fit in a single (big) room in those days!). But, like, it is literally impossible to transfer the knowledge between the two code-bases (rusct and rust-analyzer), unless there's someone who works to a large capacity in both. Both sides might be very much willing to talk shop and share all the knowledge they have, but the knowledge doesn't actually register until you go and start solving the problems yourself.

EDIT: to clarify, yes, all those aspects (and many other similar) were decided within the first 10k lines. I would even say before the first real line was written --- rust-analyzer is pretty much an execution of a design I arrived at somewhere in 2016 I guess? The macros again make an interesting study --- the actual code for macro expansion was written relatively late, I think past the 10k lines mark, certainly after basic type inference. But those 10k lines were determined, in a significant part, by the macro expansion code that was yet to be written!

8 Likes