Based on my other post about Rust-based ETL (CSV import into Postgres), I am now wondering whether I can spawn a few threads and insert the data concurrently. I don't know how to do that yet. Here is my idea in pseudocode; I would appreciate some feedback. There may already be data structures that do this.
I have read about using r2d2 as a connection pool, so that the pool can be shared across threads and each thread checks out its own connection.
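To illustrate what sharing a pool across threads looks like, here is a std-only sketch. It does not use r2d2: `FakeConn`, `Pool`, `get`, `put`, `total_rows` and `run_workers` are hypothetical stand-ins. With r2d2 itself, the `Pool` is cloned into each thread and each thread calls `pool.get()`; the connection goes back to the pool when the returned guard is dropped, and `get()` waits (up to a configurable timeout) while every connection is in use. The sketch mimics that waiting with a `Condvar`.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Hypothetical stand-in for a Postgres connection (not a real crate type).
struct FakeConn {
    rows_written: usize,
}

// Hypothetical minimal pool: idle connections behind a Mutex, plus a
// Condvar so a thread waits when every connection is checked out.
struct Pool {
    idle: Mutex<Vec<FakeConn>>,
    available: Condvar,
}

impl Pool {
    fn new(size: usize) -> Self {
        let conns: Vec<FakeConn> = (0..size).map(|_| FakeConn { rows_written: 0 }).collect();
        Pool { idle: Mutex::new(conns), available: Condvar::new() }
    }

    // Block until a connection is idle, then take it out of the pool.
    fn get(&self) -> FakeConn {
        let mut idle = self.idle.lock().unwrap();
        loop {
            if let Some(conn) = idle.pop() {
                return conn;
            }
            idle = self.available.wait(idle).unwrap();
        }
    }

    // Hand a connection back and wake one waiting thread.
    fn put(&self, conn: FakeConn) {
        self.idle.lock().unwrap().push(conn);
        self.available.notify_one();
    }

    // Rows written across all connections currently in the pool.
    fn total_rows(&self) -> usize {
        self.idle.lock().unwrap().iter().map(|c| c.rows_written).sum()
    }
}

// Spawn `threads` workers sharing one pool of `pool_size` connections.
// Each worker checks out a connection, records `rows_each` placeholder
// "inserts" on it, and returns it. Returns the total once all have joined.
fn run_workers(threads: usize, pool_size: usize, rows_each: usize) -> usize {
    assert!(pool_size > 0, "a pool with no connections would block forever");
    let pool = Arc::new(Pool::new(pool_size));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let pool = Arc::clone(&pool);
            thread::spawn(move || {
                let mut conn = pool.get();
                conn.rows_written += rows_each; // real code would run INSERTs here
                pool.put(conn);
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    pool.total_rows()
}
```

With more threads than connections (for example `run_workers(8, 2, 100)`), six threads simply wait in `get()` until a connection is returned, which is the behaviour a pool is meant to provide.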
1. Create a `Vec` of `Record`.
2. Load all the data into that `Vec`. This is the tough part: the file is 700 MB.
3. Split the `Vec` into x slices and send each slice to a different thread.

Does that work?
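The slice idea can work in Rust. A minimal sketch, assuming a hypothetical two-column `Record` struct and a placeholder `insert_batch` that only counts rows instead of talking to Postgres: `std::thread::scope` (Rust 1.63+) lets each thread borrow its slice of the `Vec` directly, so nothing has to be cloned or wrapped in `Arc`.

```rust
use std::thread;

// Hypothetical record type; the real one would mirror the CSV columns.
struct Record {
    id: u32,
    name: String,
}

// Hypothetical placeholder for the database call. A real version would
// check out a pooled connection and INSERT the batch; this only counts rows.
fn insert_batch(batch: &[Record]) -> usize {
    batch.len()
}

// Split `records` into at most `num_threads` contiguous slices and insert
// each slice on its own scoped thread. Returns the number of rows handled.
fn insert_in_slices(records: &[Record], num_threads: usize) -> usize {
    if records.is_empty() {
        return 0;
    }
    let num_threads = num_threads.max(1);
    // Round up so the last partial chunk is not dropped.
    let chunk_size = (records.len() + num_threads - 1) / num_threads;
    // Scoped threads may borrow `records` because the scope joins them all
    // before returning; no cloning or Arc is needed.
    thread::scope(|s| {
        let handles: Vec<_> = records
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || insert_batch(chunk)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<usize>()
    })
}
```

In a real loader each thread would check out its own connection from the pool inside the closure. The memory cost of loading everything first remains: the whole 700 MB file, plus per-field `String` overhead, has to fit in memory before the first insert starts.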
Another option would be to pass the records from the CSV reader to 'inserter' threads. Since the CSV reader is faster than the inserts, this seems like a reasonable option as well, and it would not require the entire file to be held in memory.
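The streaming variant can also be sketched with only the standard library, assuming the same hypothetical `Record` and placeholder `insert_batch`, plus a naive `id,name` line parser standing in for the csv crate's reader (it does not handle quoting). A bounded `std::sync::mpsc::sync_channel` provides backpressure: when the inserter threads fall behind, `send` blocks, so only a few batches sit in memory at once. Because a std `Receiver` has a single consumer, the workers share it behind `Arc<Mutex<...>>`.

```rust
use std::mem;
use std::sync::mpsc::{self, Receiver};
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical record type; the real one would mirror the CSV columns.
struct Record {
    id: u32,
    name: String,
}

// Placeholder: a real version would INSERT the batch via a pooled connection.
fn insert_batch(batch: &[Record]) -> usize {
    batch.len()
}

// Parse `csv_text` on the calling thread and stream batches of records to
// `workers` inserter threads through a bounded channel. Returns rows inserted.
fn stream_insert(csv_text: &str, workers: usize, batch_size: usize) -> usize {
    let workers = workers.max(1);
    let batch_size = batch_size.max(1);

    // Bounded queue: `send` blocks once `workers * 2` batches are waiting,
    // so memory use stays small even though parsing outpaces inserting.
    let (tx, rx) = mpsc::sync_channel::<Vec<Record>>(workers * 2);
    // std's Receiver has a single consumer, so the workers share it via a Mutex.
    let rx: Arc<Mutex<Receiver<Vec<Record>>>> = Arc::new(Mutex::new(rx));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || {
                let mut inserted: usize = 0;
                loop {
                    // The guard is a temporary dropped at the end of this
                    // statement, so the lock is not held while inserting.
                    let msg = rx.lock().unwrap().recv();
                    match msg {
                        Ok(batch) => inserted += insert_batch(&batch),
                        Err(_) => break, // sender dropped and queue drained
                    }
                }
                inserted
            })
        })
        .collect();

    // Producer: a naive "id,name" parser standing in for the csv crate
    // (no quoting support). The first line is treated as a header.
    let mut batch = Vec::with_capacity(batch_size);
    for line in csv_text.lines().skip(1) {
        let mut fields = line.splitn(2, ',');
        let id = fields.next().unwrap_or("").trim().parse::<u32>().unwrap_or(0);
        let name = fields.next().unwrap_or("").to_string();
        batch.push(Record { id, name });
        if batch.len() == batch_size {
            tx.send(mem::take(&mut batch)).unwrap();
        }
    }
    if !batch.is_empty() {
        tx.send(batch).unwrap();
    }
    // Dropping the sender closes the channel, so each worker's recv() returns Err.
    drop(tx);

    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

Sending batches rather than single records reduces channel overhead and lets each worker insert several rows per round trip. A multi-consumer channel, such as the one in the crossbeam-channel crate, would remove the need for the `Mutex` around the receiver.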