I have a large-ish (230K lines) auto-generated file containing mostly data. (Anonymized source.)
Attached is the -Z time-passes output.
Except for Derefer, which I've figured out I can almost eliminate by doing s/&LINKS/LINKS/, are there any other low-hanging fruit?
(I realize that macro expansion is probably not something I can reduce since PHF performs some heavy calculation)
Is there a way to generate this code so that it reduces, for example, MIR_borrow_checking (~3s)?
Is there a way to tell rustc to skip linting completely for this file? (module_lints + lint_checking = ~4s)
misc_checking_3 sounds vague, but from what I can understand from the code it's just lint_checking in a trenchcoat.
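For context, a purely illustrative sketch of what such a generated file looks like, assuming the phf crate with its macros feature; LINKS is the only identifier taken from above, and the keys and values here are made up:

```rust
use phf::phf_map;

// Purely illustrative: the real file is ~230K lines of entries like these.
pub static LINKS: phf::Map<&'static str, u32> = phf_map! {
    "alpha" => 1,
    "beta" => 2,
    "gamma" => 3,
};
```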
I think there's value in representing such metadata as code, because it is very easy to consume, and I can ensure it is not duplicated in memory by compiling it into a shared object. (Or at least I could in C, still not sure if I can ensure this in Rust)
Representing this in BSON means that I have to pay O(N) memory (N being the number of processes) for this, which is something I am hesitant to do, because that's one of the problems I'm trying to solve.
I do gather from your comment that we might have hit a wall WRT what we can do in Rust code, so it might be easier for me to generate C/C++ and wrap the data in Rust.
If you have N records of static data in your Rust code, that's also going to occupy O(N) space. So I don't understand what you are getting at. The data has to be stored somewhere, after all – if you scatter it between the TEXT and RODATA sections of a binary (for example), then it's still going to consume the same amount of memory, so you don't reduce memory footprint, you are simply putting it elsewhere.
Also, using a real database or a serialization format would allow you to compress the data, and only incrementally decompress it at runtime. That's not possible with literal code.
No, not at all, and C and C++ compilation is slow, too. I answered what I answered because I think it would be a much more fruitful idea to use the right tool for the job instead of trying to micro-optimize something that wasn't meant to be used for this purpose.
FWIW, recently I tried to do something similar, in that I generate a table of data and wanted to compile it to Rust code so I could do lookups efficiently.
A generated file of about 60k SLOC took forever to compile, so I quickly gave up on that idea. Now I just serialize the data using serde and some renaming tricks to compress type and field names, and it's working well enough for my purposes.
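For reference, a minimal sketch of that renaming trick, assuming serde with the derive feature and a self-describing format such as JSON or BSON (with something like bincode the names are never written anyway); Record and its fields are made-up names:

```rust
use serde::{Deserialize, Serialize};

// Short serde renames shrink the serialized output of self-describing
// formats, while the Rust-side names stay readable.
#[derive(Serialize, Deserialize)]
#[serde(rename = "R")]
struct Record {
    #[serde(rename = "i")]
    id: u32,
    #[serde(rename = "n")]
    name: String,
    #[serde(rename = "w")]
    weight: f32,
}
```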
Sorry, I wasn't clear.
My plan is to put it in a dylib, counting on it being placed on pages that will be shared between multiple processes.
If I serialize this to BSON (as an example), I could also assume that the serialized BSON is shared, but I will have to pay per-process for deserializing it.
Both deserialization and DB access will introduce latency and extra allocations I would like to avoid.
Compression is a neat idea, but the data is not singularly big (the generated SO is only a single-digit number of MiB), so compressing it after putting it in a DB might not be that beneficial for me.
My main concerns are making sure latency is close to nothing, and that the cost I'll have to pay, memory-wise, is not going to be affected by the number of processes using this metadata, which is expected to be large.
It's possible that I'm exaggerating; but to me, turning what is today an array-indexing operation into a context switch sounds like a cost I might not want to pay.
I mentioned C/C++ simply because we already have similarly sized metadata SOs that are faster to compile.
I.e. it's already a "proven" concept and something that's very easy for us to consume.
(Not insinuating that C++ is faster to compile in general, btw, just referring to this specific use case)
I do appreciate you trying to get me out of this box :).
It's not impossible that when we have more usage data and benchmarks that we'll implement one of your suggestions, but I'm going to insist that at least for now it's not an XY problem.
When I did mostly C programming, I'd occasionally pull some linker tricks for this sort of thing: I'd write my table in #[repr(C)] format to a file, and then use objcopy to make it a .o file with one exported symbol. My code could then access it like a normal array via extern static.
I don't see any reason why you couldn't pull a similar trick with Rust: You can use include_bytes!() to import the data file and bytemuck to cast it to your real data structure, but you'll need to be careful about memory alignment (and maybe endianness).
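A minimal sketch of that, assuming the bytemuck crate with its derive feature and a data file written with the target's layout and endianness (Record and table.bin are made-up names):

```rust
use bytemuck::{Pod, Zeroable};

// Must match the layout the file was generated with: no padding, fixed
// field order, fixed endianness.
#[repr(C)]
#[derive(Clone, Copy, Pod, Zeroable)]
struct Record {
    id: u32,
    weight: f32,
}

// include_bytes! only guarantees byte alignment, so the cast can fail at
// runtime if the embedded blob happens to be misaligned or truncated;
// try_cast_slice makes that failure explicit.
static RAW: &[u8] = include_bytes!("table.bin");

fn records() -> &'static [Record] {
    bytemuck::try_cast_slice(RAW).expect("misaligned or truncated table")
}
```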
There is already non-negligible latency in loading and dynamically linking a shared library (which is also O(size of library)). You can't achieve near zero-latency dynamic loading.
You can't guarantee that with dynamic libraries, either. They may or may not actually be shared by the OS; and data is especially problematic, because code is always assumed to be immutable, but data isn't.
Also, I sense some incoherence in your argument. First you assert that the data is not big, but then you argue that it's too big so you don't want to copy it. You'll have to decide which one it is.
Dynamically loading happens once on process startup. I don't mind paying for it.
Accessing the data happens all the time during the lifetime of the process.
It's not that I don't want to pay anything, I am aware that bytes do not appear out of thin air.
I simply want to pay for them in a specific place and time.
The data set itself is not huge, my problem is its duplication.
I want to deduplicate using an SO (which I gather is not guaranteed).
You suggest to deduplicate using a DB.
Either way the entire set of data appears only once in a system, so in this specific scenario, compressing it is unnecessary.
I don't see it as an incoherency; maybe just a miscommunication on my part.
That's a pretty cool idea, I'll admit.
And if the target arch was a given, I'd seriously consider it.
But given the need for flexibility in deployment, the downside you mention would be a real headache to deal with.
In addition, when I say the current solution is good enough, I mean that instantiation of the data set is a matter of milliseconds.
But in your previous post, you said that it worries you because there will be many processes.
I apologize, I just don't see the contradiction.
I have many long-running processes.
Assuming page-sharing works (again, I now understand I cannot count on it), I do not mind startup taking a bit more time.
You can do the same thing with deserialization: just put the deserialization code in main(), or in a static Lazy<YourDataStructure>.
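For example (a sketch assuming once_cell::sync::Lazy; load_table() stands in for whatever deserialization you'd actually use):

```rust
use once_cell::sync::Lazy;
use std::collections::HashMap;

// Deserialized once, on first access; later lookups are plain HashMap reads.
static TABLE: Lazy<HashMap<u32, String>> = Lazy::new(load_table);

fn load_table() -> HashMap<u32, String> {
    // Placeholder for the real deserialization (serde, bincode, BSON, ...).
    HashMap::new()
}
```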
Which is definitely an option that is easy to implement, but doesn't satisfy any of the concerns I am trying to address.
If anything, I'm interested in what ways "You can't guarantee that with dynamic libraries, either".
Is it that my data won't end up in RODATA, or do I simply not have a guarantee that RODATA is shared?