How to define and instantiate struct with HashMap fields

Hello,
I'm trying to copy values from a polars DataFrame into a struct in which each field is a HashMap keyed by the DataFrame's "index" column. I'm doing this for fast access: does it actually make reading or updating a single value by row and column name faster, or am I better off just keeping the DataFrame?

This is what I came up with:

use std::collections::HashMap;

#[derive(Debug)]
struct MyStruct<'a> {
    col1: HashMap<&'a str, u32>,
    col2: HashMap<&'a str, String>,
    other_fill_later: HashMap<&'a str, u32>,
}

Question: does it make sense to use &'a str as the key type?
I don't plan to change the keys, but it isn't clear to me how this works. I've read that "&str is a reference to some data, and it is a read-only and cannot be modified. It is assigned at compile-time, and can be used to refer to a string literal or a portion of a string." How can I then assign keys that are unknown at compile time?
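For comparison, here is a minimal sketch of the same struct with owned String keys (the name MyStructOwned and the build helper are hypothetical). Owned keys cost one allocation per id, but they remove the lifetime parameter, so the struct no longer has to stay tied to the DataFrame's string data. Lookups can still use a plain &str because String implements Borrow<str>.

```rust
use std::collections::HashMap;

// Same fields as MyStruct, but the maps own their keys, so no lifetime
// parameter is needed and the struct can outlive the data it was built from.
#[derive(Debug, Default)]
struct MyStructOwned {
    col1: HashMap<String, u32>,
    col2: HashMap<String, String>,
    other_fill_later: HashMap<String, u32>,
}

// Fills the maps from ids that only exist at runtime (dummy values here).
fn build(ids: &[String]) -> MyStructOwned {
    let mut s = MyStructOwned::default();
    for (i, id) in ids.iter().enumerate() {
        s.col1.insert(id.clone(), i as u32);
        s.col2.insert(id.clone(), format!("value {i}"));
        s.other_fill_later.insert(id.clone(), 0);
    }
    s
}

fn main() {
    // Keys created at runtime: format! allocates a new String each time.
    let ids: Vec<String> = (0..3).map(|i| format!("id_{i}")).collect();
    let mut s = build(&ids);

    // get() accepts a &str even though the keys are Strings.
    println!("{:?}", s.col1.get("id_2"));

    // Update a single value in place.
    if let Some(v) = s.other_fill_later.get_mut("id_1") {
        *v += 10;
    }
    println!("{:?}", s.other_fill_later.get("id_1"));
}
```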

In main this is what I came up with (where df is a rust-polars lazy dataframe):

let ids = df.column("ID")?.utf8()?;
let mut my_struc = MyStruct {
    col1: HashMap::from(ids.zip(df.column("col1")?.u32()?).collect()),
    col2: HashMap::from(ids.zip(df.column("col2")?.utf8()?).collect()),
    other_fill_later: HashMap::from(ids.zip(vec![0; num_pools]).collect()),
};
  • This doesn't compile: I get error[E0599]: &ChunkedArray is not an iterator. That is true, but after reading these pages
    polars::prelude::zip - Rust
    Zip in polars::export::rayon::iter - Rust
    I thought I could zip two types that implement the ParallelIterator trait. Can anyone help me fix the error?

  • Even if the above worked, this way of initializing the struct feels cumbersome and hard to maintain: I have to keep track of every struct field and fill each one by hand, while ideally I'd have a function that fills the fields by matching them to the DataFrame's column names. Is there a better alternative?
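One way to avoid writing out every field by hand is to key the outer container by column name instead of using one struct field per column. The sketch below is only an illustration under stated assumptions: the Value enum, the Columns alias, and fill_by_name are hypothetical, and plain Vecs stand in for DataFrame columns so it runs without polars. The trade-off is that the compiler no longer knows which type each column holds, and the data ends up even more scattered in memory than with the struct.

```rust
use std::collections::HashMap;

// Hypothetical cell type covering the column types used in the question.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    U32(u32),
    Str(String),
}

// Hypothetical stand-in for DataFrame columns: (column name, values per row).
type Columns = Vec<(String, Vec<Value>)>;

// Builds one id -> value map per column; the outer map is keyed by column
// name, so adding a column needs no new struct field.
fn fill_by_name(ids: &[&str], columns: &Columns) -> HashMap<String, HashMap<String, Value>> {
    columns
        .iter()
        .map(|(name, values)| {
            let by_id: HashMap<String, Value> = ids
                .iter()
                .zip(values)
                .map(|(id, v)| (id.to_string(), v.clone()))
                .collect();
            (name.clone(), by_id)
        })
        .collect()
}

fn main() {
    let ids = ["a", "b"];
    let columns: Columns = vec![
        ("col1".to_string(), vec![Value::U32(1), Value::U32(2)]),
        ("col2".to_string(), vec![Value::Str("x".into()), Value::Str("y".into())]),
    ];
    let mut table = fill_by_name(&ids, &columns);

    // Update one cell by column name and id.
    if let Some(v) = table.get_mut("col1").and_then(|c| c.get_mut("b")) {
        *v = Value::U32(20);
    }
    println!("{:?}", table["col1"]["b"]);
}
```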

Your data structure is probably much slower for sequential access than the polars DataFrame, because it scatters your data across memory instead of keeping each column contiguous. My advice is to use polars unless you are limited by it and understand why.

" &str is a reference to some data, and it is a read-only and cannot be modified. It is assigned at compile-time, and can be used to refer to a string literal or a portion of a string."

Where did you read that? As a general statement it is wrong, i.e. it's only true under additional conditions. The only things that are actually assigned at compile time in Rust are statics and, arguably, consts.
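To illustrate that a &str does not have to be known at compile time, here is a small sketch (first_word is a hypothetical helper). The &str it returns points into a String that only exists at runtime, and its lifetime is tied to that String. The same applies to HashMap<&'a str, _> keys that borrow from the DataFrame: they are valid only as long as the strings they point into.

```rust
// Returns the first whitespace-separated word of `line`.
// The result points into `line`'s bytes, so it borrows runtime data
// and cannot outlive `line`.
fn first_word(line: &str) -> &str {
    line.split_whitespace().next().unwrap_or("")
}

fn main() {
    // A String assembled at runtime (it could just as well come from
    // stdin, a file, or a DataFrame cell).
    let line: String = format!("{} {}", "pool", std::process::id());
    let word: &str = first_word(&line); // borrows from `line`
    println!("{word}");

    // A string literal is also a &str, but with the 'static lifetime,
    // because its bytes are stored in the compiled binary.
    let literal: &'static str = "pool";
    assert_eq!(word, literal);

    // Calling drop(line) before the println! above would not compile,
    // because `word` still borrows from `line` at that point.
}
```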

This doesn't compile: I get error[E0599]: &ChunkedArray is not an iterator

But it implements IntoIterator. ids.into_iter().zip(...) or for parallel processing ids.into_par_iter().zip(...) should work.
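A minimal sequential sketch of the zip, assuming (as in the polars versions I know of) that iterating a column yields Option values, with None for nulls; check this against your polars version. Plain Vecs of Option stand in for the columns so the example runs without polars, and zip_into_map is a hypothetical name. Note that collect() can build the HashMap directly, since HashMap implements FromIterator for (key, value) pairs, so wrapping it in HashMap::from is not needed.

```rust
use std::collections::HashMap;

// Zips an id column with a value column and collects straight into a HashMap.
// Rows where either the id or the value is null (None) are skipped.
fn zip_into_map<'a>(
    ids: impl IntoIterator<Item = Option<&'a str>>,
    values: impl IntoIterator<Item = Option<u32>>,
) -> HashMap<&'a str, u32> {
    ids.into_iter()
        .zip(values)
        .filter_map(|(id, v)| Some((id?, v?)))
        .collect()
}

fn main() {
    // Stand-ins for a string id column and a u32 column containing nulls.
    let ids = vec![Some("a"), Some("b"), None, Some("d")];
    let col1 = vec![Some(1u32), None, Some(3), Some(4)];
    let map = zip_into_map(ids, col1);
    println!("{map:?}");
}
```

Whether silently dropping null rows is right depends on your data; mapping None to a default value is another option.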

Thanks for the feedback!

We strongly recommend selecting data with expressions for almost all use cases. Square bracket indexing is perhaps useful when doing exploratory data analysis in a terminal or notebook when you just want a quick look at a subset of data.

It's not clear whether this applies to Rust as well or only to the Python API, but I thought a HashMap would be faster for my use case: reading or modifying a single scalar value indexed by "id" (a string in the "col_id" column of the DataFrame) and "colname" (the column name). I don't need sequential access, and extracting a scalar value from a polars DataFrame seemed cumbersome.
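A possible middle ground between the two views above, offered as a sketch rather than a recommendation: keep the values in contiguous per-column Vecs (closer to how a DataFrame stores them) and use a single HashMap only to find the row number for an id. The Table type and its methods are hypothetical. Scalar access costs one hash lookup plus a direct index, while whole-column work still walks contiguous memory.

```rust
use std::collections::HashMap;

// Hypothetical columnar layout: each column is a contiguous Vec,
// plus one HashMap from id to row number.
struct Table {
    row_of: HashMap<String, usize>,
    col1: Vec<u32>,
    col2: Vec<String>,
    other_fill_later: Vec<u32>,
}

impl Table {
    fn new(ids: Vec<String>, col1: Vec<u32>, col2: Vec<String>) -> Self {
        assert_eq!(ids.len(), col1.len(), "column length mismatch");
        assert_eq!(ids.len(), col2.len(), "column length mismatch");
        let n = ids.len();
        // Map each id to its row; if an id repeats, the later row wins.
        let row_of: HashMap<String, usize> =
            ids.into_iter().enumerate().map(|(i, id)| (id, i)).collect();
        Table { row_of, col1, col2, other_fill_later: vec![0; n] }
    }

    // One hash lookup for the row, then a direct index into the column.
    fn get_col1(&self, id: &str) -> Option<u32> {
        self.row_of.get(id).map(|&i| self.col1[i])
    }

    fn get_col2(&self, id: &str) -> Option<&str> {
        self.row_of.get(id).map(|&i| self.col2[i].as_str())
    }

    // Mutable access for in-place updates.
    fn other_mut(&mut self, id: &str) -> Option<&mut u32> {
        let i = *self.row_of.get(id)?;
        self.other_fill_later.get_mut(i)
    }
}

fn main() {
    let ids = vec!["a".to_string(), "b".to_string()];
    let mut t = Table::new(ids, vec![10, 20], vec!["x".to_string(), "y".to_string()]);
    if let Some(v) = t.other_mut("b") {
        *v += 1;
    }
    println!("{:?} {:?} {}", t.get_col1("b"), t.get_col2("a"), t.other_fill_later[1]);

    // Sequential work over a column still walks contiguous memory.
    let total: u32 = t.col1.iter().sum();
    println!("{total}");
}
```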

  • I tried into_par_iter as follows:
let ids = df.column("ID")?.utf8()?;
let col1 = HashMap::from(ids.into_par_iter().zip(df.column("column1")?.utf8()?.par_iter()).collect());

but I get the following error:

error[E0599]: no method named `into_par_iter` found for struct `ChunkedArray` in the current scope
  --> src/main.rs:97:31
   |
97 |     let col1 = HashMap::from(ids.into_par_iter().zip(df.column("column1")?.utf8()?.par_iter()).collect());
   |                               ^^^^^^^^^^^^^ help: there is a method with a similar name: `par_iter`
  • Could you suggest a thorough explanation of &str vs str vs String? I've read the Rust book and googled around but I found all explanations a bit confusing
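A compact illustration of the three types, written as a sketch (shout is a hypothetical helper). String owns a growable heap buffer; &str is a borrowed pointer-plus-length view into UTF-8 bytes owned by something else (a String, a Box<str>, or the binary itself for literals); str is the unsized string type that &str points to, which is why it only ever appears behind a pointer. A common idiom is to take &str in function parameters, since &String and &Box<str> coerce to it automatically.

```rust
// Taking &str lets one function accept String, literals, and slices alike.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    // String: owns a growable, heap-allocated UTF-8 buffer.
    let mut owned: String = String::from("hello");
    owned.push_str(" world"); // allowed: `owned` owns and can grow its buffer

    // &str: a borrowed view into bytes owned elsewhere.
    let view: &str = &owned[0..5]; // borrows the first 5 bytes of `owned`

    // String literals are &'static str: the bytes live in the binary.
    let literal: &'static str = "hello";
    assert_eq!(view, literal);

    // str is unsized; it is only usable behind a pointer such as
    // &str, Box<str>, or Rc<str>.
    let boxed: Box<str> = String::from("boxed").into_boxed_str();

    // &String and &Box<str> both deref-coerce to &str at the call site.
    println!("{} {} {}", shout(&owned), shout(literal), shout(&boxed));
}
```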
