Brainstorming ECS for Compilers

(Yes, this is very vauge, because I've just had a vague idea brewing and haven't had time to spend on trying implementation yet.)

If you're not exactly certain what an ECS is, check out specs' overview and the RustConf 2018 closing keynote.

ECS seems to be a very common pattern for modern design work, especially in Rust. It's not hard to see why; it cleanly separates state from behavior, promotes useful composition, (and works around the borrow checker).

ECS has most actively been applied in game development (isn't that where it first got somewhat big and trendy), but I've also seen it used for web server backends and UI frameworks (with varying levels of success).


Parsing itself is, unfortunately, mostly a serial problem, but it's a fast one, and one that you can easily parallelize over multiple input files (so long as you know the set of files to parse ahead of time (a feature Rust doesn't have due to how mod discovery works)).

I know rustc is (or will be) architected around a core query system. Very simply, you give it the input files and then ask questions like "what is the type of std::option::Option::Some", "is the trait crate::traits::Foo object safe", and "what is the rlib for the library crate"? (Very simplified.)


I'm wondering how/if we could apply an ECS-inspired architecture to a compiler setup. I have a vague idea that each symbol would be an Entity, and various specifics like types could be Components, and it would solve the cyclical nature of compiler logic (symbols can refer to each other, so one needs to be encoded pointing to a tbd at the very least) in a somewhat structured manner.

I really don't have any concrete ideas beyond "ECS backed compiler framework". It just feels like it could work, but I'm just not sure exactly how. My draft of how my toy compiler's references between symbols feels like it'd be a pseudo-ECS store addressed by fully-qualified-name even if I don't use specs.

1 Like

I recall this has been discussed on discord, but the only mention I can find right now is https://discordapp.com/channels/273534239310479360/274215136414400513/483300666475806750

I thought it was cool as a potential alternative to lots of different IRs. Like if you could just attach the MIR "component" to something once it was compiled that far. Ditto for name lookups, for spans, etc

The direct Discord link isn't working for me. I just get dumped into a "No text channels" screen with Activity highlighted. (I am in the rust-lang discord server.)

I don't think it would work. Compilers tend to be created with focus on speed, but ECS, even if used commonly in gamedev (where speed is crucial), are not addressing compilers case from this POV. ECS is very good solution, when you are running many small independent calculations. In such case good ECS may plan those calculations and run them in parallel, if they are writing to different components (reading from same components in the same time is not a problem). This is what Specs, Rust ECS does. However in compilers you typically run one transformation on entire code representation to create another representation, for example first you are parsing to some token vector, then you are creating AST from it, next you are transforming AST to form, on which you can perform checks and optimizations (which are performed after that), then the IM code, another optimizations, and finally binary code. Most of those tasks are very difficult to parallelize (eg. parsing is almost impossible to do in parallel, but what you can do is to parse multiply files in one time), and also its rather chain of computation, so you will not benefit at all from ECS architecture.

I am not a game developer, but it seems to me that Rust community slightly misuses the term ECS.

There are several ideas on which ECS builds, one is simple and not ECS specific, the other is what makes a ECS ECS.

The first idea is this: if you store Stuff inside a Vec<Stuff> and use struct StufId(usize) as a "handle" to the actual Stuff, you solve a lot of memory management problems (for example, you can have "cycles" of Stuff) and you gain a lot of flexibility (you can use StuffId as a key in a hashmap). I'd argue that this pattern is not what is called ECS: for example, in the Rust conf talk, we have a PlayerId since the very beginning, long before we started to refactor the code to use ECS, just because that is how you don't get use after free in C++.

The second idea (which I think is more specific to ECS than pervasive indexes) is replacing inheritance with composition: if you have Physics, Graphics, and Input aspects of you objects, you either have to define 8 different classes to represent all combinations. Instead, once can define a free-form composition with optional components:

struct GameObject {
    physics: Option<Physics>,
    graphics: Option<Graphics>,
    input: Option<Input>,
}

The third idea is AOS/SOA optimization (it is purely an optimization: it doesn't affect programming model if you are using indices).

In Rust we don't have inheritance to start with, so the core second idea is not readily applicable, unless you are trying to do something inheritance-like.

I don't think compilers have a lot of open-world inheritance, on the contrary, they use closed-world enums instead, and replacing that with composition would feel like a net negative.

8 Likes

For model of the compiler, I think adapting GitHub - salsa-rs/salsa: A generic framework for on-demand, incrementalized computation. Inspired by adapton, glimmer, and rustc's query system. as a data store is a smart move.

This data base/data store approach means that most your domain objects are flat, have an identity, refer to over objects via id and queries.

This gives you memoization and on-demand for free, which are vital if you want to use compiler inside the IDE

1 Like