What is the current Rust tooling for data analysis?

I plan to write software that will do data analysis on scraped/public data, but I definitely don't want to go "big data" - the data will easily fit on a single server.

I don't want to go with a SQL database, because the schema will change relatively frequently and the data will be relatively heterogeneous and/or semi-structured (though I still want it strictly typed). Also, if possible, I don't want to have to define and maintain the schema in every language, which is hard with SQL. And I have no requirement to change live data (it's perfectly acceptable to deploy the data in read-only mode with the service).

What format would you use to store tabular data in files if you need to balance type safety, cross-language interoperability (Rust still being primary), and ease of schema migration?

The modus operandi will be to "somehow" put the data in that format (via Python for example, but the idea is to use whatever is most convenient at the time), then have an analytics engine in Rust that allows the user to generate ad-hoc statistics.

I'm thinking something like Apache Avro, but I wonder if something has gained more traction in the rust community.
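For context on why Avro appeals to me: the schema travels with the data as a single JSON definition, so a Python writer and a Rust reader can share it without per-language schema code. A minimal sketch of what such a schema might look like (the record and field names here are hypothetical, just for illustration):

```json
{
  "type": "record",
  "name": "ScrapedItem",
  "fields": [
    {"name": "url", "type": "string"},
    {"name": "fetched_at", "type": {"type": "long", "logicalType": "timestamp-millis"}},
    {"name": "score", "type": ["null", "double"], "default": null}
  ]
}
```

The union with `null` plus a `default` is also how Avro handles the migration concern: new optional fields can be added later, and readers using the new schema can still resolve files written with the old one.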

Will be happy to read even the wildest suggestions, as this is also a learning project :slight_smile:

How about Redis?

Hmm, this might be my misunderstanding of what Redis can do, but here are my thoughts "against" Redis in this case:

  1. I will be able to fit the data in memory initially, but later that might become a problem.
  2. Redis doesn't satisfy the requirement for typed data, and especially the requirement that I not have to maintain the schema in every language.
  3. It isn't suited to "tabular" data or to queries that scan the whole dataset; I would have to fetch everything and scan it myself, and at that point I don't see why I'd use Redis at all.
  4. I don't really need to edit the data at runtime, so an extra server to maintain is just an added drawback with no benefit.