Long compile times for a vector with 50K u64 values

Hi guys, I'm trying to write a program where a large vector of u64 values, with about 53,000 predefined elements, needs to be stored and used. However, the program takes a long time to compile: about 24 seconds on my Ryzen 2700X machine. It was previously taking 47 seconds when I tried assigning the values after creating the matrices Vec.

I feel these compile times are rather long for such a program, so I wanted to ask for your opinion. I've created a semi-barebones repository here; just trying to compile it should take a while.

So I wanted to ask:

  1. Are these compile times expected for this scenario?
  2. If not, is there any way to speed up the compilation? For example, the same code in C++ takes much less time to compile.
  3. Should this be brought up with the Rust language team?

Thanks in advance!

You may also want to post this on the internals forum. A lot more people working on rustc itself tend to hang out there.

I'm guessing this is because rustc needs to parse the long array literal and keep track of a non-trivial amount of extra information (e.g. spans). This isn't great for memory usage or processing, and it's a problem I encountered a lot when making include_dir!(). One possible solution would be to save all the numbers into a text file, then use include_str!() to embed the contents of that text file into your library at compile time.

You'd need to parse the text at runtime, but I'm guessing you can use something like lazy_static!() to make sure that only needs to be done once.

use lazy_static::lazy_static;
use std::str::FromStr;

lazy_static! {
    static ref MY_NUMBERS: Vec<u64> = {
        let text = include_str!("my_numbers.txt");
        text.lines()
            .map(|line| u64::from_str(line).expect("Parsing should never fail"))
            .collect()
    };
}
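
A hypothetical use site would look something like this; the parse happens on the first access and the result is cached:

fn main() {
    // The first access triggers the parse; later accesses reuse the cached Vec.
    println!("loaded {} values", MY_NUMBERS.len());
}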

It's not an overly satisfying answer, but this workaround should help with compile times.

I have no idea, but I have a suspicion that the macro building the array is taking a long time to generate all the initialization code.

vec![
    0x8000000000000,
    0x4000000000000,
...
]

Is that matrix immutable in your finished code?

I have a program that uses a table of 40,000 prime numbers. I generate it as a global static array at build time. It takes no time at all for my build.rs to generate those primes. The resulting code gets compiled in no time.

Rust build scripts: Build Scripts - The Cargo Book

My example: https://github.com/ZiCog/tatami-rust

It might help, presuming you have code to generate all those values.

Thanks for the reply!

Yes, that matrix is immutable in my code. Those are precomputed values generally used when generating Sobol low-discrepancy sequences, so I don't even need to generate them; they're just there. So I wonder what's wrong...

Thanks! I see. Maybe I'll go with storing the values in a file and reading them from there if all else fails.

If you can read them from a file, you can have the build.rs script do that at compile time and create a static array of values, which then gets compiled into your code. I reckon that will be pretty fast, and then your executable does not need to carry that data file around.

You can also use include_str or include_bytes. You then either parse them or, if you store them in binary form, simply use a transmute from &[u8] to &[u64] (or whatever).
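
For the include_bytes route, here's a minimal sketch; the data.bin file name and its little-endian u64 layout are assumptions on my part:

// Embed the raw bytes at compile time (data.bin is a hypothetical file
// containing consecutive little-endian u64 values).
use std::convert::TryInto;

static RAW: &[u8] = include_bytes!("data.bin");

fn load_values() -> Vec<u64> {
    // chunks_exact + from_le_bytes sidesteps the alignment and endianness
    // concerns that come with transmuting &[u8] to &[u64].
    RAW.chunks_exact(8)
        .map(|chunk| u64::from_le_bytes(chunk.try_into().unwrap()))
        .collect()
}

To the compiler the embedded bytes are just constant data, and the conversion runs once at startup.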

1 Like

That's equivalent to what he's doing at the moment, except that instead of creating the static array of values with a build script, the values are part of a file that already exists. The problem is that rustc still needs to parse and compile that static array (which would be a normal Rust file with a static).

I don't see how that would work... If the array is created at compile time there's no way to use it from your program unless the static data is bundled in your executable somehow. You could read the file from the filesystem entirely at runtime, but that means you need to distribute your data file alongside the binary.

The macro is not at fault. First of all, it doesn't happen in debug mode.

    Finished dev [unoptimized + debuginfo] target(s) in 1.15s

Using cargo rustc --release -- -Z time-passes shows the following:

   Compiling large_vec_allocation v0.1.0 (/home/skade/Code/rust/large_vec_allocation_repro)
  time: 0.026; rss: 64MB	parsing
  time: 0.000; rss: 64MB	attributes injection
  time: 0.000; rss: 64MB	recursion limit
  time: 0.000; rss: 64MB	plugin loading
  time: 0.000; rss: 65MB	plugin registration
  time: 0.003; rss: 65MB	pre-AST-expansion lint checks
  time: 0.000; rss: 65MB	crate injection
    time: 0.181; rss: 135MB	expand crate
    time: 0.000; rss: 135MB	check unused macros
  time: 0.181; rss: 135MB	expansion
  time: 0.000; rss: 135MB	maybe building test harness
  time: 0.000; rss: 135MB	AST validation
  time: 0.000; rss: 135MB	maybe creating a macro crate
  time: 0.001; rss: 138MB	name resolution
  time: 0.000; rss: 138MB	complete gated feature checking
  time: 0.002; rss: 138MB	lowering AST -> HIR
  time: 0.002; rss: 138MB	early lint checks
    time: 0.003; rss: 142MB	validate HIR map
  time: 0.015; rss: 142MB	indexing HIR
  time: 0.000; rss: 142MB	load query result cache
  time: 0.000; rss: 144MB	dep graph tcx init
    time: 0.000; rss: 144MB	looking for entry point
    time: 0.000; rss: 144MB	looking for plugin registrar
    time: 0.000; rss: 144MB	looking for derive registrar
  time: 0.001; rss: 144MB	misc checking 1
  time: 0.001; rss: 147MB	type collecting
  time: 0.000; rss: 147MB	impl wf inference
    time: 0.000; rss: 154MB	unsafety checking
    time: 0.000; rss: 154MB	orphan checking
  time: 0.002; rss: 154MB	coherence checking
  time: 0.003; rss: 154MB	wf checking
  time: 0.001; rss: 154MB	item-types checking
  time: 0.096; rss: 167MB	item-bodies checking
    time: 0.002; rss: 169MB	match checking
warning: unused variable: `x`
 --> src/main.rs:8:9
  |
8 |     let x : Sobol = Sobol::init();
  |         ^ help: consider prefixing with an underscore: `_x`
  |
  = note: `#[warn(unused_variables)]` on by default

    time: 0.002; rss: 169MB	liveness checking + intrinsic checking
  time: 0.004; rss: 169MB	misc checking 2
  time: 0.060; rss: 175MB	MIR borrow checking
  time: 0.000; rss: 175MB	dumping Chalk-like clauses
  time: 0.000; rss: 175MB	MIR effect checking
  time: 0.000; rss: 175MB	layout testing
    time: 0.000; rss: 175MB	privacy access levels
    time: 0.000; rss: 175MB	private in public
    time: 0.001; rss: 175MB	death checking
    time: 0.000; rss: 175MB	unused lib feature checking
      time: 0.001; rss: 175MB	crate lints
      time: 0.007; rss: 179MB	module lints
    time: 0.008; rss: 179MB	lint checking
    time: 0.003; rss: 179MB	privacy checking modules
  time: 0.013; rss: 179MB	misc checking 3
  time: 0.000; rss: 179MB	metadata encoding and writing
      time: 0.000; rss: 183MB	collecting roots
      time: 0.013; rss: 187MB	collecting mono items
    time: 0.014; rss: 187MB	monomorphization collection
    time: 0.000; rss: 187MB	codegen unit partitioning
    time: 0.000; rss: 187MB	write allocator module
    time: 0.001; rss: 193MB	llvm function passes [large_vec_allocation.3xnd7bo9-cgu.5]
    time: 0.001; rss: 193MB	llvm function passes [large_vec_allocation.3xnd7bo9-cgu.4]
    time: 0.001; rss: 195MB	llvm function passes [large_vec_allocation.3xnd7bo9-cgu.2]
    time: 0.003; rss: 195MB	llvm module passes [large_vec_allocation.3xnd7bo9-cgu.2]
    time: 0.006; rss: 196MB	llvm module passes [large_vec_allocation.3xnd7bo9-cgu.4]
    time: 0.009; rss: 197MB	llvm module passes [large_vec_allocation.3xnd7bo9-cgu.5]
    time: 0.057; rss: 210MB	codegen to LLVM IR
    time: 0.000; rss: 210MB	assert dep graph
    time: 0.000; rss: 210MB	llvm function passes [large_vec_allocation.3xnd7bo9-cgu.6]
    time: 0.000; rss: 210MB	serialize dep graph
  time: 0.073; rss: 210MB	codegen
    time: 0.000; rss: 210MB	llvm function passes [large_vec_allocation.3xnd7bo9-cgu.1]
    time: 0.001; rss: 211MB	llvm function passes [large_vec_allocation.3xnd7bo9-cgu.0]
    time: 0.002; rss: 210MB	llvm module passes [large_vec_allocation.3xnd7bo9-cgu.6]
    time: 0.001; rss: 188MB	llvm module passes [large_vec_allocation.3xnd7bo9-cgu.0]
    time: 0.003; rss: 183MB	llvm module passes [large_vec_allocation.3xnd7bo9-cgu.1]
    time: 0.062; rss: 193MB	llvm function passes [large_vec_allocation.3xnd7bo9-cgu.3]
    time: 2.423; rss: 302MB	llvm module passes [large_vec_allocation.3xnd7bo9-cgu.3]
    time: 0.001; rss: 301MB	LTO passes
    time: 0.002; rss: 301MB	LTO passes
    time: 0.002; rss: 302MB	LTO passes
    time: 0.002; rss: 302MB	LTO passes
    time: 0.002; rss: 302MB	LTO passes
    time: 0.002; rss: 303MB	codegen passes [large_vec_allocation.3xnd7bo9-cgu.0]
    time: 0.003; rss: 304MB	codegen passes [large_vec_allocation.3xnd7bo9-cgu.4]
    time: 0.002; rss: 304MB	codegen passes [large_vec_allocation.3xnd7bo9-cgu.5]
    time: 0.002; rss: 305MB	codegen passes [large_vec_allocation.3xnd7bo9-cgu.2]
    time: 0.004; rss: 305MB	LTO passes
    time: 0.003; rss: 305MB	codegen passes [large_vec_allocation.3xnd7bo9-cgu.1]
    time: 0.003; rss: 306MB	codegen passes [large_vec_allocation.3xnd7bo9-cgu.6]
    time: 2.894; rss: 352MB	LTO passes
    time: 22.574; rss: 1066MB	codegen passes [large_vec_allocation.3xnd7bo9-cgu.3]
  time: 28.314; rss: 1066MB	LLVM passes
  time: 0.000; rss: 1066MB	serialize work products
    time: 0.182; rss: 1066MB	running linker
  time: 0.183; rss: 1066MB	linking
time: 28.948; rss: 1045MB		total
    Finished release [optimized] target(s) in 29.10s

Almost all compile time is spent in the final codegen pass in LLVM.

1 Like

I'm not sure how equivalent it is. My huge static array does not need any macro expansion. My example shows that parsing and compiling a huge static array is fast enough that you don't notice it happening.

It works by way of Rust's "include" capability. Include statements in Rust work much like they do in C. In my case the array is generated into a file by build.rs and then included in the right place when the actual compilation takes place.

See "Build Scripts" in the Rust documentation and my example. Links above.

Interesting. Cancel what I said above then. Sounds like something buggy in the compiler.

Indeed, it seems that debug mode does not have this problem. In that case, does it make sense to raise an issue on GitHub?

This is a bug: Extremely slow optimizer performance when including large array of strings · Issue #39352 · rust-lang/rust · GitHub

5 Likes

For completeness' sake, I think this should work:

const DEFAULT_MATRICES: &[u64] = &[
    ...
];
impl Default for Matrices {
    fn default() -> Self {
        Matrices {
            num_dimensions: 1024,
            size: 52,
            matrices: DEFAULT_MATRICES.to_vec(),
        }
    }
}

Note that this assumes you need to mutate the matrices field afterwards; if it's just a constant, then you shouldn't even have a field for it in the Matrices struct.

Also, matrices.get(i).unwrap() can be replaced with matrices[i] (the latter gives more information in the panic message if it does go out of bounds, and it's also simpler).

1 Like

Note that in this case, you are required to ensure that the array is u64-aligned and also handle the little-endian/big-endian cases.

1 Like

Thanks a lot for the replies, everyone!

Actually, I had totally forgotten about the const modifier, and implementing @eddyb's code solved the issue.

While my problem is solved, the LLVM issue still remains, so I think it makes more sense not to mark this as the solution just yet.

1 Like

I'd say it's only not a solution if you really have runtime data in that vec![...].

The only reason this has terrible compile times is that, semantically, it's similar to writing thousands of vec.push(...) calls by hand, just faster to execute at runtime.

While we could promote large constants in the same way we do for borrows, we don't always know sizes in MIR, so we'd need some heuristics instead for the general case.

But maybe @oli_obk and @spastorino / @ecstatic-morse can take a look at this (come to think of it, const-prop could technically evaluate this ahead of time and keep it in a ty::Const, because the type is fully known, in this very specific case).

1 Like

It's not a focus right now, but sufficiently advanced const prop will probably figure this out at some point. We have a few intermediate steps that we need to resolve first, though.

1 Like

Thanks for your input! Looking forward to the day this lands. :slight_smile:
