`rustc` memory usage

2e71828 · February 12, 2021, 5:59pm

rustc is using >15GB of memory to compile my test program, and sometimes being killed by Linux's OOM killer. Are there any settings I can tweak to reduce this at the expense of compile time?

I'm doing a lot of type-level programming and the trait solver/borrow checking appears to finish relatively quickly (based on when errors show up). I suspect that the problem is somewhere in the code generation or optimization phases, but I don't know how to verify that.

(Sidenote: I expect the answer is no, but I thought I'd ask before I rent a high-memory cloud instance to compile on)

RustyYato · February 12, 2021, 6:12pm

Wow, I don't know how you're doing that. I've done quite a bit of type-level programming, and never ran into that issue (even on my 8 GB machine). For example, I've computed the layouts and certain niches of some types at the type level. I can help you optimize the type-level code if you are willing to share the repository.

To answer your questions, while I was writing that layout computer, I didn't find any way to profile the type checking in particular, or to profile the memory usage of rustc.

mbrubeck · February 12, 2021, 6:18pm

Running with cargo build -j1 to reduce the number of parallel threads/processes Cargo spawns might help, depending on where the problem occurs. Or you can set codegen-units = 1 in your Cargo profiles to reduce parallelism during codegen specifically, though I'm not sure if combining all the code into a single large CGU will actually reduce memory compared to multiple small ones.

2e71828 · February 12, 2021, 6:24pm

The library code is in the 3 repositories listed here. You'll need to manually set up local paths in each of the Cargo.tomls. It's all kind of a mess and poorly documented, unfortunately. (The repositories only work with git's dumb HTTP protocol, which libgit2-based tools don't support.)

The program that's failing is my criterion benchmarking setup that uses them, and it's not in a repository at the moment. Let me see what it'll take to get that available.

EDIT: The test program is now available here:

git clone https://2-71828.com/git/codd

RustyYato · February 12, 2021, 6:36pm

Is typenum_uuid private? I can't clone it

$ git clone https://2-71828.com/git/typenum_uuid
Cloning into 'typenum_uuid'...
fatal: repository 'https://2-71828.com/git/typenum_uuid/' not found

2e71828 · February 12, 2021, 6:38pm

Oops. It's git clone https://2-71828.com/git/typenum-uuid

H2CO3 · February 12, 2021, 7:32pm

I didn't look into this in any detail, but most of Rust compilation time is usually due to LLVM. I would at least suspect that this might be the case here as well. Does your computer also OOM if you omit codegen by only running cargo check, for instance?

2e71828 · February 12, 2021, 7:35pm

I suspect you’re right, given the behavior I’m seeing. I’ve left the office for the day, but I’ll try that first thing tomorrow.

2e71828 · February 13, 2021, 10:00am

cargo check is a bit better: it peaks at 14GB of resident memory, but finishes successfully. @RustyYato did a brief audit of my code and found some easy improvements I could make, as well as some problematic design decisions. I haven't yet tried @mbrubeck's suggestion of reducing the thread / codegen count, but everything looks effectively single-threaded right now: CPU usage is pegged at 100% on one core, with the others unused.

I think my next step is to break each of my test cases out into its own crate with a non-generic interface. Hopefully, that will let the compiler partition the work better as it will be dealing with smaller working sets.
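As a rough illustration of a "non-generic interface": the crate keeps its generic machinery private and exposes only functions with concrete types, so the generic code is instantiated once inside that crate instead of in every caller. The function names below are hypothetical, and whether this lowers peak memory for a given project is not guaranteed; it mainly bounds how much monomorphization each crate has to do.

```rust
/// Generic implementation: every distinct `I`/`F` pair instantiates a fresh copy.
fn count_matching<I, F>(items: I, pred: F) -> usize
where
    I: IntoIterator,
    F: Fn(&I::Item) -> bool,
{
    items.into_iter().filter(|x| pred(x)).count()
}

/// Hypothetical non-generic entry point: the generic helper is instantiated
/// once, here, and downstream crates only see concrete argument types.
pub fn count_even(values: &[usize]) -> usize {
    count_matching(values.iter().copied(), |v: &usize| *v % 2 == 0)
}
```

A downstream test crate would then call `count_even(&data)` without needing any of the generic bounds in scope.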

H2CO3 · February 13, 2021, 10:09am

Alright, LLVM is not the culprit, then, but I'm glad it helped a little. I'm curious what it will turn out to be. Is there a chance you can / are willing to profile the compiler? That would likely help pinpoint the root cause, although I'm pretty sure it requires a nontrivial amount of effort.

2e71828 · February 13, 2021, 10:24am

My thesis deadlines are coming up soon, so I don't think I have time to dive into that particular rabbit hole. Right now, my priority is to apply whatever duct tape is necessary to limp across the finish line, and maybe I can come back to a proper investigation once the pressure's off.

2e71828 · February 19, 2021, 6:14pm

I'm still stuck on this, so any advice would be helpful. As best as I can tell, there's some kind of combinatorial explosion going on in the compiler. A single one of my test cases, minimized as much as possible, is sitting at 160GB memory usage and 2+ hours of compile time at the moment (it hasn't finished yet).

I've tried a few things over the course of the past week:

- I streamlined the type-level algorithms that are used most, writing them by hand instead of going through the type-system Lisp interpreter.
- I reduced the number of columns that need to be considered to at most 5, which is an upper bound on the length of my HLists (except for those that represent Lisp code).
- I tried to implement the set algebra routines with const generics instead of the type system; it mostly worked, except for an ICE when I tried to integrate it into the rest of the system (part of const_evaluatable_checked threw an "Unimplemented" error).
- I rented a huge-memory cloud server to try to brute-force the compilation. (As part of this, I cleaned up the build process. Everything's in the codd repo above now.)
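For readers unfamiliar with the technique, a "hand-written" type-level algorithm is just a trait with one impl per HList constructor, resolved by direct structural recursion rather than by evaluating an interpreted program. The sketch below is not taken from the tylisp or codd crates; `HNil`, `HCons`, `Len`, and the column markers are hypothetical names for illustration. It also checks the 5-column upper bound at compile time.

```rust
use std::marker::PhantomData;

/// Minimal heterogeneous list: HNil is empty, HCons<H, T> prepends H to T.
pub struct HNil;
pub struct HCons<H, T>(PhantomData<(H, T)>);

/// Hand-written type-level length: one impl per constructor, so the trait
/// solver only follows the list structure once per element.
pub trait Len {
    const LEN: usize;
}
impl Len for HNil {
    const LEN: usize = 0;
}
impl<H, T: Len> Len for HCons<H, T> {
    const LEN: usize = 1 + T::LEN;
}

/// Hypothetical column markers standing in for the `col!` types in the test.
pub struct A;
pub struct B;
pub struct C;

pub type Abc = HCons<A, HCons<B, HCons<C, HNil>>>;

// Compile-time guard mirroring the "at most 5 columns" bound.
const _: () = assert!(<Abc as Len>::LEN <= 5);
```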

I've also come up with a minimal change that triggers the problem. The unit test suite containing the code below compiles in less than 2 minutes on my laptop, but uncommenting the commented-out stanza runs into the OOM killer after 7+ minutes. The same thing happens if I use BTrees instead of vectors for the underlying data; BTrees have a more complex query planner.

I suspect the trait solver is going on a wild goose chase that it shouldn't need to. There are presumably ways to prune its search tree by strategically adding explicit bounds. Without knowing how it really works, though, I'm just making blind changes and hoping for the best.

#[test]
fn test_peer_join() {
    use crate::relation::{OpaqueRel,Insert};
    use tylisp::sexpr_val;
    
    col!{A: usize}
    col!{B: usize}
    col!{C: usize}
    col!{D: usize}
    col!{E: usize}
    
    let mut left: OpaqueRel<Vec<sexpr!{A,B}>> = Default::default();
    let mut right: OpaqueRel<Vec<sexpr!{C,B,D}>> = Default::default();
    let mut lr: OpaqueRel<Vec<sexpr!{A,B,C,D}>> = Default::default();
    let mut third: OpaqueRel<Vec<sexpr!{C,E}>> = Default::default();
    
    left.insert(sexpr_val!{A(1),B(1),C(3)}).unwrap();
    left.insert(sexpr_val!{A(2),B(2),C(7)}).unwrap();
    left.insert(sexpr_val!{A(3),B(5),C(7)}).unwrap();
    
    right.insert(sexpr_val!{D(1),B(1),C(3)}).unwrap();
    right.insert(sexpr_val!{D(2),B(2),C(7)}).unwrap();
    right.insert(sexpr_val!{D(3),B(5),C(7)}).unwrap();
    
    third.insert(sexpr_val!{C(7),E(2)}).unwrap();
    third.insert(sexpr_val!{C(3),E(3)}).unwrap();

    let join = PeerJoin::<_,_,B>{
        left: left.as_ref(),
        right: right.as_ref(),
        phantom: PhantomData
    };

/*
    let join = PeerJoin::<_,_,C>{
        left: join.as_ref(),
        right: third.as_ref(),
        phantom: PhantomData
    };
*/    
    assert_eq!(join.iter_all().count(), 3);
}

jessa0 · February 19, 2021, 6:30pm

Have you tried compiling with RUSTC_LOG=info? The output will be very noisy, but looking at just the tail of the output after it's been running for a minute might tell you what it's working on.

mbrubeck · February 19, 2021, 6:33pm

You could try setting #![recursion_limit = "32"] (or some other value that is smaller than the default of 128). This is a very blunt instrument, since it can only be set at the crate level, and might cause unrelated failures in areas of your code that legitimately need deep recursion. But you might at least get an error that provides some more information.
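A minimal sketch of where the attribute goes and why type-level code consumes the limit: `#![recursion_limit]` is an inner attribute that must sit at the top of the crate root (main.rs or lib.rs). In the Peano-number example below (hypothetical names `Z`, `S`, `ToUsize`), proving `S<S<S<Z>>>: ToUsize` requires proving the same bound for each nested layer, so deeper types need deeper trait resolution. A sufficiently deep type will eventually fail with an "overflow evaluating the requirement" error; the exact depth at which that happens is not something this sketch claims.

```rust
// Inner attribute: must appear at the very top of the crate root.
// 32 is the illustrative value from the reply above; the default is 128.
#![recursion_limit = "32"]

use std::marker::PhantomData;

/// Type-level Peano naturals: Z is zero, S<N> is N + 1.
pub struct Z;
pub struct S<N>(PhantomData<N>);

/// Resolving `ToUsize` for S<N> requires resolving it for N first,
/// so resolution depth grows with the nesting depth of the type.
pub trait ToUsize {
    const VALUE: usize;
}
impl ToUsize for Z {
    const VALUE: usize = 0;
}
impl<N: ToUsize> ToUsize for S<N> {
    const VALUE: usize = N::VALUE + 1;
}

/// Three layers deep: well within a limit of 32.
pub type Three = S<S<S<Z>>>;
```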

Beyond that, it sounds like it's time to start debugging/profiling the compiler itself.

2e71828 · February 19, 2021, 6:39pm

I had to raise the recursion limit to accommodate my fake-specialization system, which uses 128-bit typenum type ids. Replacing those with const generics worked but didn't improve anything, so I don't think they're the source of the problem.

Unfortunately, I think you're right. Can you direct me to any resources for getting started with that?

2e71828 · February 19, 2021, 6:43pm

Looks like it's primarily a bunch of rustc_trait_selection::traits::query::normalize::fold_ty entries with various type names I recognize from my code. Some of them go by really quickly, but others sit for many seconds as the most current entry.

EDIT: Now it seems to be interspersed with rustc_trait_selection::traits::codegen Cache miss: Binder(...

2e71828 · February 20, 2021, 8:28am

After thinking about the problem overnight, the next thing I’ll try is to write a type-erasing barrier that can sit between multiple nested query planners. This at least has a predictably working final result: If I keep adding more type erasure, I’ll eventually end up with a fully runtime-dispatched system.
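One common way to build such a barrier is to box the inner planner's output as a trait object, so the outer planner composes over a fixed type instead of the inner planner's full adapter chain. This is a generic runtime-dispatch sketch, not the barrier from the codd repository; `Erased`, `erase`, `inner_plan`, and `outer_plan` are hypothetical names, and rows are simplified to `(key, value)` tuples.

```rust
/// Type-erasing barrier: whatever concrete (possibly deeply nested) iterator
/// type a planner produces, downstream code only sees this boxed trait object.
pub type Erased<'a, T> = Box<dyn Iterator<Item = T> + 'a>;

pub fn erase<'a, I>(iter: I) -> Erased<'a, I::Item>
where
    I: Iterator + 'a,
{
    Box::new(iter)
}

/// Hypothetical inner planner: filters rows whose key matches.
pub fn inner_plan<'a>(rows: &'a [(usize, usize)], key: usize) -> Erased<'a, (usize, usize)> {
    erase(rows.iter().copied().filter(move |&(k, _)| k == key))
}

/// Hypothetical outer planner: builds on the erased type, so its own type
/// does not grow with the inner planner's adapter chain.
pub fn outer_plan<'a>(rows: &'a [(usize, usize)], key: usize) -> Erased<'a, usize> {
    erase(inner_plan(rows, key).map(|(_, v)| v))
}
```

The trade-off is a heap allocation per erased stage and a dynamic call per `next()`, in exchange for types that stop growing at each barrier.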

2e71828 · February 24, 2021, 5:25pm

It looks like I'm out of the woods on this one. The main problem seems to be that I was doing too much intermediate type manipulation. In case someone faces similar problems in the future, here's a summary of the highest-impact changes that I made:

- Every stage of every query had its own result type that had to be calculated, and that work couldn't be shared between queries. I replaced this with a single associated type defined by each relation. This may require fetching fields that I don't need, but I don't currently have any optimizations that could benefit from skipping them.
- Filter clauses had no tolerance for an impedance mismatch. Many of my relations are composed of smaller ones, and I had to restructure the filter types every time I dispatched work to a component relation. I relaxed the filter definition so that filters can be applied to any record, even one that doesn't contain the columns the filter wants to inspect. This let me reuse the top-level filter without rebuilding it.
- I needed multiple query plan types for each relation because they had different trait requirements. The above changes let me replace these with an enum instead. The variant generated by any given query is determined by associated constants, which effectively moves some of the workload from the type system to the optimizer.
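The three changes above can be sketched in ordinary Rust as follows. This is an illustration under stated assumptions, not the code from codd: `Relation`, `Plan`, `ORDERED`, `MaybeB`, and the concrete types are hypothetical, and the filter is a runtime analogue of the type-level relaxation described. Each relation declares one `Record` type, plan selection is an `if` on an associated constant (which the optimizer can fold after monomorphization), and a filter lets through records that lack the column it inspects.

```rust
/// Which access path a query uses (hypothetical variants for illustration).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Plan {
    FullScan,
    IndexSeek,
}

/// One record type per relation, shared by every query on it, plus an
/// associated constant describing the relation's capabilities.
pub trait Relation {
    type Record;
    /// Hypothetical capability flag: true if storage is ordered by key.
    const ORDERED: bool;

    /// A plain branch on a constant instead of a separate plan type.
    fn plan() -> Plan {
        if Self::ORDERED { Plan::IndexSeek } else { Plan::FullScan }
    }
}

/// Hypothetical relations mirroring the Vec- and BTree-backed storage.
pub struct VecRel;
pub struct BTreeRel;

impl Relation for VecRel {
    type Record = (usize, usize);
    const ORDERED: bool = false;
}
impl Relation for BTreeRel {
    type Record = (usize, usize);
    const ORDERED: bool = true;
}

/// Relaxed filter: a record without column B reports `None` through the
/// default method and passes, so one filter works on every component.
pub trait MaybeB {
    fn b(&self) -> Option<usize> {
        None
    }
}

pub struct WithB {
    pub b: usize,
}
pub struct WithoutB;

impl MaybeB for WithB {
    fn b(&self) -> Option<usize> {
        Some(self.b)
    }
}
impl MaybeB for WithoutB {}

pub fn b_equals<R: MaybeB>(record: &R, want: usize) -> bool {
    record.b().map_or(true, |b| b == want)
}
```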

These changes produced a small but acceptable runtime performance penalty, but I'm now consistently compiling everything in under 5 minutes on my laptop. The compiler's memory usage for each compilation unit is also under 2 GB now.

system · May 25, 2021, 5:25pm

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.
