Generate code, a Vec, an array, or a hash map from a file at compile time


#1

Hi, I have a file that contains triplets like these:
name, id, country_id
Actually, the ordering, the separator, and the whole structure can be anything I want: CSV, JSON, whatever.

It’s going to be a rather big file with tens of thousands of records. What I would like to do is convert it at compile time into some data structure (an array, a Vec, a HashMap, or anything else) that lets me look up id and country_id by name at run time.

I know I can simply create a .rs file with the proper structure, like:

use std::collections::HashMap;

fn my_data() -> HashMap<&'static str, (u32, u32)> {
    let mut data = HashMap::new();
    data.insert("Some_string", (1, 1));
    // ...
    // ...
    data
}

But I wonder: is there a better way? Something more idiomatic, or smarter?

I don’t want to keep this data in a database or in an external file that has to be parsed at run time, because it changes very, very rarely, and what really matters to me is lookup speed. I simply want to convert some other data using the data above and make sure that, for example, the country_id for a given input is correct.

Thank you for any insight you may give me :slight_smile:


#2

You can use a build script (build.rs) to generate a file with content such as:

static ARRAY: [u32; X] = [...];

and then use the include! macro to include the generated code in your main code.
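A minimal sketch of the generating side, assuming the input is plain `name, id, country_id` lines with no commas inside fields (a naive split, not a real CSV parser). In an actual build.rs you would read the input file (say, a hypothetical `data.csv`), write the returned string to `Path::new(&env::var("OUT_DIR").unwrap()).join("data.rs")`, print `cargo:rerun-if-changed=data.csv`, and pull the result into your crate with `include!(concat!(env!("OUT_DIR"), "/data.rs"));`. The names `generate_array_source`, `lookup`, `DATA`, and the file names are hypothetical. Sorting by name is an extra step added here so the runtime side can use binary search on the static array.

```rust
use std::fmt::Write;

/// Hypothetical build.rs helper: turns `name, id, country_id` lines into
/// Rust source for a static array sorted by name.
/// Assumes no field contains a comma (naive split, not a full CSV parser).
fn generate_array_source(csv: &str) -> String {
    let mut rows: Vec<(&str, u32, u32)> = csv
        .lines()
        .filter(|line| !line.trim().is_empty())
        .map(|line| {
            let mut fields = line.split(',').map(str::trim);
            let name = fields.next().expect("missing name");
            let id: u32 = fields.next().expect("missing id").parse().expect("bad id");
            let country_id: u32 = fields
                .next()
                .expect("missing country_id")
                .parse()
                .expect("bad country_id");
            (name, id, country_id)
        })
        .collect();
    // Sort by name so the runtime side can use binary search.
    rows.sort_unstable_by_key(|&(name, _, _)| name);

    let mut out = String::new();
    writeln!(out, "static DATA: [(&str, u32, u32); {}] = [", rows.len()).unwrap();
    for (name, id, country_id) in &rows {
        // `{:?}` on a string produces a quoted, escaped Rust string literal.
        writeln!(out, "    ({:?}, {}, {}),", name, id, country_id).unwrap();
    }
    out.push_str("];\n");
    out
}

/// Runtime lookup against a name-sorted array like the one generated above.
fn lookup(data: &[(&str, u32, u32)], name: &str) -> Option<(u32, u32)> {
    data.binary_search_by_key(&name, |&(n, _, _)| n)
        .ok()
        .map(|i| (data[i].1, data[i].2))
}

fn main() {
    print!("{}", generate_array_source("Bob, 2, 48\nAlice, 1, 44\n"));
    // Stand-in for what the include!d file would define for that input.
    static DATA: [(&str, u32, u32); 2] = [("Alice", 1, 44), ("Bob", 2, 48)];
    println!("{:?}", lookup(&DATA, "Bob"));
}
```

Binary search on a sorted static array needs no allocation at startup; the trade-off against a hash map is O(log n) comparisons per lookup instead of an expected O(1) hash.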


#3

What is your access pattern for the data? Do you iterate over it and batch-process it, or do you need key lookups? If so, what are the keys? Would you need access to the text representation of the keys?


#4

Nope. If I get integers as input, all’s fine and dandy… well, almost. If I get a string as input, I have to look it up and get the corresponding id. The twist comes with country_id: for a given id or name, I must also check that the corresponding country_id matches some other data supplied alongside it, because, you know, you can’t trust data that a client uploads :wink:
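The check described above can be sketched with two indexes over the same records: one keyed by name, one keyed by id. This assumes the records are available as a static slice (hand-written here; it could just as well come from a generated file), and uses std `HashMap` only to keep the example dependency-free. `RECORDS`, `Lookup`, `id_for_name`, and `check_id` are hypothetical names.

```rust
use std::collections::HashMap;

// Hypothetical sample records: (name, id, country_id).
static RECORDS: &[(&str, u32, u32)] = &[("Alice", 1, 44), ("Bob", 2, 48)];

struct Lookup {
    by_name: HashMap<&'static str, (u32, u32)>,
    country_by_id: HashMap<u32, u32>,
}

impl Lookup {
    fn new(records: &'static [(&'static str, u32, u32)]) -> Self {
        let mut by_name = HashMap::with_capacity(records.len());
        let mut country_by_id = HashMap::with_capacity(records.len());
        for &(name, id, country_id) in records {
            by_name.insert(name, (id, country_id));
            country_by_id.insert(id, country_id);
        }
        Lookup { by_name, country_by_id }
    }

    /// String input: resolve the name to its id, but only if the
    /// client-supplied country_id matches the stored one.
    fn id_for_name(&self, name: &str, claimed_country: u32) -> Option<u32> {
        match self.by_name.get(name) {
            Some(&(id, country_id)) if country_id == claimed_country => Some(id),
            _ => None,
        }
    }

    /// Integer input: the id must exist and belong to the claimed country.
    fn check_id(&self, id: u32, claimed_country: u32) -> bool {
        self.country_by_id.get(&id) == Some(&claimed_country)
    }
}

fn main() {
    let lookup = Lookup::new(RECORDS);
    println!("{:?}", lookup.id_for_name("Alice", 44));
    println!("{}", lookup.check_id(2, 44));
}
```

Returning `None`/`false` for both "unknown key" and "wrong country" keeps the sketch short; a real validator would probably want to distinguish the two in an error type.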

Hmm… I hadn’t even read about build.rs before :slight_smile: Thanks for pointing me in that direction; I was thinking about writing something in Python to transform my “table” into Rust code. I’ll check what I can do with build.rs tomorrow :slight_smile:


#5

Another option (if you only care about a map from &str to a tuple) is either phf or static_map. Which one is faster depends on a few factors, but in my testing phf starts beating static_map at around 2000 entries; below that, static_map tends to come out on top.


#6

Thank you very much! phf looks very promising, as there are going to be tens of thousands of entries, maybe even more :slight_smile:


#7

Yeah, I would definitely go with phf there.


#8

One problem with the .rs extension: the forum appears to interpret build.rs as a URL, which then sends you to a Serbian website. :slight_smile:


#9

I used phf as @cbreeden suggested, and it works wonderfully :slight_smile: I didn’t benchmark it, because from what I’ve observed it’s fast enough for me :wink: