Advice on how to embed data in a library/executable

I need ~100k rows by ~10 columns of (unchanging) data to correct/classify a stream of incoming (changing) records. What is the best way to do this in Rust? Do I make a SQLite db (maybe as part of my build.rs) and then embed that file into the library? Or do I `include_str!` a CSV and load it into an in-memory SQLite db at the start of execution? Or something else?

What’s the size of a row of data? Is it mostly numbers and/or short strings?

What queries do you need to make? Are you looking up a row by a single column or are you doing some sort of search for best match?

It sounds like it’s probably not that much data. If the queries are just lookups by an index or something similar, my inclination would be to store it in a `Vec` in memory and build `HashMap`s for any lookups.
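A minimal sketch of the "`Vec` plus `HashMap` index" idea, assuming a hypothetical `Row` type with three placeholder columns (`code`, `name`, `number`) standing in for the real ~10. The rows are stored once, and the index maps a column value to the positions of every row that has it, so duplicate keys are handled.

```rust
use std::collections::HashMap;

// Hypothetical row type: column names are placeholders for the real ~10 columns.
struct Row {
    code: String,
    name: String,
    number: u32,
}

/// All rows in a Vec, plus one HashMap index from `code` to row positions.
struct Table {
    rows: Vec<Row>,
    by_code: HashMap<String, Vec<usize>>,
}

impl Table {
    fn new(rows: Vec<Row>) -> Self {
        let mut by_code: HashMap<String, Vec<usize>> = HashMap::new();
        for (i, row) in rows.iter().enumerate() {
            // Several rows may share a code, so each key maps to a list of indices.
            by_code.entry(row.code.clone()).or_default().push(i);
        }
        Table { rows, by_code }
    }

    /// Rows whose `code` equals `code`, in insertion order (empty if none).
    fn lookup_code(&self, code: &str) -> Vec<&Row> {
        self.by_code
            .get(code)
            .map(|idxs| idxs.iter().map(|&i| &self.rows[i]).collect())
            .unwrap_or_default()
    }
}

fn main() {
    let row = |code: &str, name: &str, number: u32| Row { code: code.into(), name: name.into(), number };
    let table = Table::new(vec![row("KX", "Klaxxon", 93322), row("KX", "Other", 1), row("AB", "Alpha", 2)]);
    for r in table.lookup_code("KX") {
        println!("{} {}", r.name, r.number);
    }
}
```

Each extra lookup column would get its own `HashMap<_, Vec<usize>>` built the same way; storing indices rather than clones keeps the memory cost of several indexes small.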

Mostly short strings, like | KX | Klaxxon | 93322 | K | ... |
Frequently I'll need to match three of the columns to get all the possible rows that meet the criteria, and then do a little judging of which one is the best option.
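For "match three columns, then judge the candidates", one option is a single index keyed on a tuple of the three columns. This is a sketch under assumptions: `Record` and its columns `a`..`d` are hypothetical, and `best` is a placeholder scoring rule (longest `d` value), not the asker's actual judging logic.

```rust
use std::collections::HashMap;

// Hypothetical record: four short string columns standing in for the real ~10.
#[derive(Debug)]
struct Record {
    a: String,
    b: String,
    c: String,
    d: String,
}

// Composite key built from the three columns that are matched together.
type Key = (String, String, String);

fn build_index(rows: &[Record]) -> HashMap<Key, Vec<usize>> {
    let mut index: HashMap<Key, Vec<usize>> = HashMap::new();
    for (i, r) in rows.iter().enumerate() {
        index
            .entry((r.a.clone(), r.b.clone(), r.c.clone()))
            .or_default()
            .push(i);
    }
    index
}

/// Every row whose three key columns all match (empty if none).
fn candidates<'a>(
    rows: &'a [Record],
    index: &HashMap<Key, Vec<usize>>,
    a: &str,
    b: &str,
    c: &str,
) -> Vec<&'a Record> {
    let key = (a.to_string(), b.to_string(), c.to_string());
    index
        .get(&key)
        .map(|idxs| idxs.iter().map(|&i| &rows[i]).collect())
        .unwrap_or_default()
}

/// Placeholder "judging" step: picks the candidate with the longest `d` value.
fn best<'a>(cands: &[&'a Record]) -> Option<&'a Record> {
    cands.iter().copied().max_by_key(|r| r.d.len())
}

fn main() {
    let rec = |a: &str, b: &str, c: &str, d: &str| Record { a: a.into(), b: b.into(), c: c.into(), d: d.into() };
    let rows = vec![
        rec("KX", "Klaxxon", "93322", "K"),
        rec("KX", "Klaxxon", "93322", "KLX"),
        rec("AB", "Alpha", "1", "A"),
    ];
    let index = build_index(&rows);
    let cands = candidates(&rows, &index, "KX", "Klaxxon", "93322");
    println!("{} candidates, best = {:?}", cands.len(), best(&cands));
}
```

A tuple of `String`s means each lookup allocates three small strings; with ~100k rows that is usually fine, and it keeps the index to one map instead of intersecting three.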

Ok, I'll go back to looking at building ~5-10 hashmaps to index it. Do I just use `include_str!` then, to pull the CSV into the final lib/exe?
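A sketch of the `include_str!` route: the CSV text is baked into the binary as a `&'static str` and parsed once at startup. The file path in the comment is hypothetical, and an inline literal stands in for it so the example runs on its own. The split on commas is deliberately naive and assumes no quoted fields; for real CSV the `csv` crate is the usual choice.

```rust
// In a real crate this would be something like:
//     const DATA: &str = include_str!("../data/table.csv");
// ("../data/table.csv" is a hypothetical path.) An inline literal is used
// here so the sketch runs as-is.
const DATA: &str = "KX,Klaxxon,93322,K\nAB,Alpha,1,A\n";

/// One parsed row; fields borrow from the embedded `&'static str`, so no copies.
#[derive(Debug)]
struct Row {
    fields: Vec<&'static str>,
}

/// Naive split on commas: assumes no quoted fields or embedded commas.
fn parse(data: &'static str) -> Vec<Row> {
    data.lines()
        .filter(|line| !line.is_empty())
        .map(|line| Row { fields: line.split(',').collect() })
        .collect()
}

fn main() {
    let rows = parse(DATA);
    println!("{} rows, first = {:?}", rows.len(), rows[0].fields);
}
```

Because the fields borrow from the embedded text, the hashmaps built on top of these rows can use `&'static str` keys without allocating.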

If you want to avoid the cost of parsing the data on every run, you could use build.rs to parse it at build time into `#[repr(C)]` structs, write those out as raw bytes, and then embed the result with `include_bytes!`.
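One way the build-time `#[repr(C)]` approach might look, as a sketch under stated assumptions: `Record` and its fixed field widths are hypothetical, fields are zero-padded ASCII, and `encode` plays the role of what build.rs would write for each CSV row. Every field is a `u8` array, so the struct has alignment 1; that matters because `include_bytes!` gives no alignment guarantee, and it makes the zero-copy cast in `as_records` sound for any byte slice of the right length.

```rust
use std::mem::{align_of, size_of};

// Hypothetical fixed-width layout: each column is a zero-padded byte array.
// Only u8 arrays are used, so Record has alignment 1 and no padding, and can
// be viewed directly over the unaligned bytes that include_bytes! yields.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
struct Record {
    code: [u8; 4],
    name: [u8; 16],
    number: [u8; 8],
}

// Compile-time check of the alignment assumption the cast below relies on.
const _: () = assert!(align_of::<Record>() == 1);

/// What build.rs would do per CSV row: pad each ASCII field to a fixed width.
fn encode(code: &str, name: &str, number: &str, out: &mut Vec<u8>) {
    fn put<const N: usize>(s: &str, out: &mut Vec<u8>) {
        let mut buf = [0u8; N];
        let n = s.len().min(N); // over-long fields are silently truncated
        buf[..n].copy_from_slice(&s.as_bytes()[..n]);
        out.extend_from_slice(&buf);
    }
    put::<4>(code, out);
    put::<16>(name, out);
    put::<8>(number, out);
}

impl Record {
    /// Field bytes up to the first zero pad, as &str ("" if not valid UTF-8).
    fn text(field: &[u8]) -> &str {
        let end = field.iter().position(|&b| b == 0).unwrap_or(field.len());
        std::str::from_utf8(&field[..end]).unwrap_or("")
    }
}

/// Runtime side: reinterpret the embedded bytes as a slice of records (no parsing).
fn as_records(bytes: &[u8]) -> &[Record] {
    assert_eq!(bytes.len() % size_of::<Record>(), 0, "truncated table");
    let len = bytes.len() / size_of::<Record>();
    // SAFETY: Record is repr(C) and made only of u8 arrays (align 1, no padding,
    // every bit pattern valid), and the length was checked above.
    unsafe { std::slice::from_raw_parts(bytes.as_ptr().cast::<Record>(), len) }
}

fn main() {
    // In a real crate, build.rs would write these bytes into OUT_DIR and the
    // library would embed them with something like:
    //     static TABLE: &[u8] = include_bytes!(concat!(env!("OUT_DIR"), "/table.bin"));
    // ("table.bin" is a hypothetical file name.)
    let mut bytes = Vec::new();
    encode("KX", "Klaxxon", "93322", &mut bytes);
    encode("AB", "Alpha", "1", &mut bytes);
    for r in as_records(&bytes) {
        println!("{} {} {}", Record::text(&r.code), Record::text(&r.name), Record::text(&r.number));
    }
}
```

Numeric columns are kept as text here to preserve alignment 1; storing them as `u32` would require aligning the embedded bytes first, which `include_bytes!` alone does not do.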


Just to follow up on what @2e71828 said: if you did use build.rs with `include_bytes!`, you could use it in the following way. It still builds the maps at run time, but I don't think you can avoid that with the built-in HashMap. If you want everything done at build time, it looks like phf supports compile-time hash maps as well.

Playground Link
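If the maps are built at run time, one way to make sure that happens only once per process, on first use, is `std::sync::OnceLock` (stable since Rust 1.70). This is a swapped-in technique, not necessarily what the linked playground does, and the embedded `DATA` string and its two columns are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

// Hypothetical embedded data; in practice it could come from include_str!.
const DATA: &str = "KX,Klaxxon\nAB,Alpha\nKX,Klax2\n";

/// Returns the code -> names index, building it on the first call only.
fn index() -> &'static HashMap<&'static str, Vec<&'static str>> {
    static INDEX: OnceLock<HashMap<&'static str, Vec<&'static str>>> = OnceLock::new();
    INDEX.get_or_init(|| {
        let mut map: HashMap<&'static str, Vec<&'static str>> = HashMap::new();
        for line in DATA.lines() {
            if let Some((code, name)) = line.split_once(',') {
                map.entry(code).or_default().push(name);
            }
        }
        map
    })
}

fn main() {
    println!("{:?}", index().get("KX"));
}
```

Later calls return the same map without rebuilding it, and `OnceLock` makes the first initialization thread-safe; a phf map would move that work to compile time instead.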

