[Learner] - Reading a CSV to a variable and passing it on

First and foremost - I apologize. I'm a Python guy and this is the first time I've used a systems-level language (and Jesus Christ it's hard - respect to all of you). Hope I'm in the right place; if not, please kindly point me in the direction of the newbie forums :slight_smile:

I'm currently playing around with a Terminal UI library - I figure that tinkering around is the best way to learn.

The examples in that library all have hard coded data in the format of:
(Also my desired format)

const LOGS: [(&str, &str); 2] = [
    ("Event1", "INFO"),
    ("Event2", "INFO"),
];

My first goal/aim is to implement a function to read a CSV, assign it to a variable and then feed it into "LOGS" instead of the hardcoded values.

Now due to my years of python, I'm stuck in a way of happy go lucky thinking... where I can easily read a CSV file in and then pass it to a variable X that something else can use - Not that simple..

What I have so far (what I found on stackoverflow and modified):

extern crate csv;

use std::error::Error;
use csv::StringRecord;

#[derive(Debug)]
struct DataFrame {
   name: Vec<String>,
 }


impl DataFrame {

    fn new() -> DataFrame {
        DataFrame {
            name: Vec::new(),
        }
     }

     fn read_csv(filepath: &str, has_headers: bool) -> DataFrame {
         // Open file
         let file = std::fs::File::open(filepath).unwrap();
         let mut rdr = csv::ReaderBuilder::new()
            .has_headers(has_headers)
            .from_reader(file);

         let mut data_frame = DataFrame::new();

         // push all the records
         for result in rdr.records().into_iter() {
            let record = result.unwrap();
            data_frame.push(&record);
         }
         return data_frame;
      }

      fn push(&mut self, row: &csv::StringRecord) {
          // get name
          self.name.push(row[0].to_string());
      }
}


fn main() {
   let data = DataFrame::read_csv("./data.csv", true);

    println!("{:?}", data)
}

With the data csv like so:

Name,
Please,
Work,

I get an output:

>DataFrame { name: ["Please", "Work"] }

So I'm kinda half way there...

This is where I'm stuck - Questions:
1.) How do I manipulate the data I read in to match my desired format? (I feel like it's something to do with the struct.)
2.) Is it possible to then pass it to a variable like I would in Python, e.g. x = someFunction(./filename)?
3.) Let's assume I have read the CSV in, it's stored in a variable, and now I can assign it to 'LOGS'. The 'LOGS' variable (constant) is hardcoded for 2 values. How do I make it dynamic so it accepts X records from my CSV?

Because you're reading these dynamically, you'll want a Vec<(String, String)> or maybe a Vec<(String, &'static str)>. There's actually a lot to cover in just this change, given where you are at in learning Rust.


Vec<T> vs arrays ([T; N]); slices

A vector Vec<T> is like a Python list in that you can dynamically change it, add things to it, remove things from it, and so on. Everything it contains has to be the same type. T is the type that it contains. You're already using one to read in your names.

An array [T; N] is similar, except it has a fixed size (some compile-time constant, N). You don't know the size at compile time (and even if you did, you don't know the data).

It's possible to borrow a slice ([T]) from either a Vec or an array, and a lot of the functionality of the two is actually built on slices. So even though Vec<T> and [T; N] look pretty different, they're really pretty similar. When you use slices in variables, they have to be behind references (i.e. borrowed), like &[T] or &mut [T]. The size is not part of the type, unlike arrays, but you still cannot push in new elements or pop out existing ones; you can only replace elements in place (through a &mut [T]). (You can take a sub-slice to get a shorter slice.)
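To make the Vec / array / slice relationship concrete, here is a minimal sketch. The count_info function is a made-up helper for illustration (it is not part of the TUI library); the only point is that one function taking a slice accepts both an array and a Vec.

```rust
// Hypothetical helper: works on any borrowed slice, no matter
// whether the elements live in an array or in a Vec.
fn count_info(logs: &[(&str, &str)]) -> usize {
    logs.iter().filter(|(_, level)| *level == "INFO").count()
}

fn main() {
    // Fixed-size array: the length (2) is part of the type.
    let fixed: [(&str, &str); 2] = [("Event1", "INFO"), ("Event2", "ERROR")];

    // Growable vector: the length is only known at run time.
    let mut growable: Vec<(&str, &str)> = Vec::new();
    growable.push(("Event1", "INFO"));
    growable.push(("Event2", "INFO"));
    growable.push(("Event3", "WARNING"));

    // Both can be borrowed as a slice and passed to the same function.
    println!("{}", count_info(&fixed));
    println!("{}", count_info(&growable));

    // A sub-slice is just a shorter view into the same data.
    println!("{}", count_info(&growable[1..]));
}
```

Taking &[T] as the parameter type is usually the more flexible choice, since callers are then free to store their data in either container.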

String vs str

A String is analogous to a Vec<T> in that it's an owned, dynamic data type. And a &str is analogous to a slice in that it's used behind references, you can't push onto the end of it, etc. (In fact, a String is a Vec<u8> underneath and a str is a [u8] slice, with the added guarantee that the bytes are valid UTF-8.)
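A small sketch of the difference, in case it helps. The shout function is a hypothetical name used only for this illustration: it borrows its input as a &str and hands back a new owned String.

```rust
// Hypothetical helper: borrows any string data and returns a new owned String.
fn shout(text: &str) -> String {
    text.to_uppercase()
}

fn main() {
    // Borrowed string slice: its contents cannot grow.
    let literal: &str = "info";

    // Owned String: it can grow at run time.
    let mut owned: String = String::from("warn");
    owned.push_str("ing");

    println!("{}", shout(literal));
    // A &String coerces to &str automatically.
    println!("{}", shout(&owned));
    // Borrowing part of a String gives a &str sub-slice.
    println!("{}", &owned[..4]);
}
```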

&'static str

Static references (&'static) are special in that, unlike most references, they can last "forever" (until your program exits). This is typically because they're referencing data that is baked into the program. The most common place you'll see this is with literal string values that are part of your program: "whatever" is a &'static str. This is what's going on with the LOGS constant -- all the values are baked into the program as &strs. Your name values are not baked in, so you need String. (You could use &'static str for the log levels since you know those at compile time -- "INFO", "ERROR", etc.)

You can have static slices too, for example:

// Note that `LOGS` no longer has a length as part of the type
// (The unnamed array does, but LOGS is now a static reference to
// a slice)
const LOGS: &[(&str, &str)] = &[("Event1", "INFO"), ("Event2", "INFO")];

Making your Vec

OK, now to get back to your question.

You'll need to take your Vec<String> in DataFrame and make a new Vec of tuples instead. Let's say you wanted everything to be log level INFO, you could do

let mut logs = Vec::new();
for name in &data.name {
    // You could look at `name` and choose different log levels
    // based on the name, etc, here.
    logs.push((name.clone(), "INFO"));
}

You could also do something similar when reading the data in instead -- that is, instead of saving the Vec<String>, store a Vec<(String, &'static str)> from the start. The best approach depends on what your program does, naturally.
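As a rough sketch of picking the level per name while building the Vec: level_for and to_logs are made-up names, and the rule inside level_for (names containing "fail" become errors) is purely an assumption for the example.

```rust
// Hypothetical rule for choosing a level. The returned strings are
// literals baked into the program, hence &'static str.
fn level_for(name: &str) -> &'static str {
    if name.contains("fail") {
        "ERROR"
    } else {
        "INFO"
    }
}

// Builds the Vec of tuples from names that were read at run time.
fn to_logs(names: &[String]) -> Vec<(String, &'static str)> {
    names
        .iter()
        .map(|name| (name.clone(), level_for(name)))
        .collect()
}

fn main() {
    let names = vec![String::from("startup"), String::from("disk fail")];
    let logs = to_logs(&names);
    println!("{:?}", logs);
}
```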

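On your second question: sure, you can assign the result to a variable, just as you already do with let data = DataFrame::read_csv(...). Or store it in your struct, etc.

On your third question: this is mostly covered in the details above, but basically you want a Vec instead of an array. You're creating it at run-time and you don't know how big it is. Just like you already have a Vec<String>, you'll want a Vec of tuples.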

It won't be a constant since you're doing this at run time, it will just be a variable. So conventionally, it won't be called LOGS either, you should call it logs. (I'm not covering globals in Rust here and recommend not doing it for now, except for static or const values.)
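If some other code still wants (&str, &str) items like the hardcoded LOGS, one option is to keep the owned Strings in a variable and borrow from them. This is only a sketch: render is a made-up stand-in for whatever consumes the data, and borrow_logs is a hypothetical helper name.

```rust
// Hypothetical stand-in for code that wants borrowed (&str, &str) items.
fn render(logs: &[(&str, &str)]) -> Vec<String> {
    logs.iter()
        .map(|(evt, level)| format!("{:<9}{}", level, evt))
        .collect()
}

// Borrows each owned String as a &str. The result cannot outlive `owned`.
fn borrow_logs(owned: &[(String, String)]) -> Vec<(&str, &str)> {
    owned
        .iter()
        .map(|(evt, level)| (evt.as_str(), level.as_str()))
        .collect()
}

fn main() {
    // A plain run-time variable, so lowercase `logs` rather than `LOGS`.
    let logs: Vec<(String, String)> = vec![
        ("Event1".to_string(), "INFO".to_string()),
        ("Event2".to_string(), "ERROR".to_string()),
    ];

    let borrowed = borrow_logs(&logs);
    for line in render(&borrowed) {
        println!("{}", line);
    }
}
```

The owned Vec has to stay alive for as long as the borrowed one is in use, which is what the lifetime parameter on a struct holding (&str, &str) items expresses.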


Hi Quinedot, really appreciate the detailed response!

I tried adapting the script to include both entries into a tuple e.g. (first name, last name). However the closest I could get is the following format:

Data:

firstname,lastname
first1,last1
first2,last2

Output:

[("first1", ["last1", "last2"]), ("first2", ["last1", "last2"])]

Adaptation:

extern crate csv;

use std::error::Error;
use csv::StringRecord;

#[derive(Debug)]
struct DataFrame {
   name: Vec<String>,
   lastname: Vec<String>,
 }


impl DataFrame {

    fn new() -> DataFrame {
        DataFrame {
            name: Vec::new(),
            lastname: Vec::new(),
        }
     }

     fn read_csv(filepath: &str, has_headers: bool) -> DataFrame {
         // Open file
         let file = std::fs::File::open(filepath).unwrap();
         let mut rdr = csv::ReaderBuilder::new()
            .has_headers(has_headers)
            .from_reader(file);

         let mut data_frame = DataFrame::new();

         // push all the records
         for result in rdr.records().into_iter() {
            let record = result.unwrap();
            data_frame.push(&record);
         }
         return data_frame;
      }

      fn push(&mut self, row: &csv::StringRecord) {
          // get name
          self.name.push(row[0].to_string());
          self.lastname.push(row[1].to_string());
      }
}


fn main() {
   let data = DataFrame::read_csv("./data.csv", true);
   let mut logs = Vec::new();
   for name in &data.name {
        logs.push((name.clone(), data.lastname.clone()));
   }

    println!("{:?}", logs)
}

I also went a different route to try to simplify the approach:

use std::io::BufReader;
use std::io::BufRead;
use std::io;
use std::fs;


// Constants are conventionally written in upper case.
const FILENAME: &str = "./data.csv";

fn main() -> io::Result<()> {
    let lines = file_to_vec()?;
    println!("{:?}", lines);
    Ok(())
}

// Reads the file line by line; lines that fail to read are silently skipped.
fn file_to_vec() -> io::Result<Vec<String>> {
    let file_in = fs::File::open(FILENAME)?;
    let file_reader = BufReader::new(file_in);
    Ok(file_reader.lines().filter_map(io::Result::ok).collect())
}

While this works it just reads each line as a whole entry - I imagine if you want to be able to do more complex manipulations, defining a data frame like the original code did is mandatory?

Thank you again

Your DataFrame now is storing two vectors, so that data_frame.name[7] and data_frame.lastname[7] both came from the same line in the CSV file, for example. Then when you're making your tuples, you do

for name in &data.name {
    logs.push((name.clone(), data.lastname.clone()));
}

Your loop is over the name Vec, and inside the loop you clone the individual name entry from the Vec, but also clone the entire lastname Vec (and not an individual entry from the lastname Vec).

You could change this loop to iterate over both the name and lastname Vecs:

for (name, lastname) in data.name.iter().zip(data.lastname.iter()) {
    logs.push((name.clone(), lastname.clone()));
}

(Read more about zip in the Iterator::zip documentation.)

However, when I see name and lastname, I think "these should be kept together in their own data structure, not split up between two Vecs." So I would read each data row into its own data structure directly -- which could be the tuples like above, or it could be a struct with named fields to carry more meaning.

Here's a refactor that does that in the playground.
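For readers without the link, here is a rough sketch of the "one struct per row" idea. It is not the playground code. Person and parse_people are made-up names, and the parsing is a simplified stand-in that splits each line on the first comma (no quoting or escaping), so real code should keep using the csv crate's reader.

```rust
// One value per CSV row, so first and last name always stay together.
#[derive(Debug, PartialEq)]
struct Person {
    firstname: String,
    lastname: String,
}

// Simplified stand-in for the csv reader: NOT a real CSV parser.
fn parse_people(text: &str) -> Vec<Person> {
    text.lines()
        .skip(1) // skip the header row
        .filter_map(|line| line.split_once(',')) // ignore lines without a comma
        .map(|(first, last)| Person {
            firstname: first.trim().to_string(),
            lastname: last.trim().to_string(),
        })
        .collect()
}

fn main() {
    let text = "firstname,lastname\nfirst1,last1\nfirst2,last2\n";
    let people = parse_people(text);
    println!("{:?}", people);
}
```

With this layout there is no way for the two columns to get out of step, and no zip is needed later.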


Also check the serde support of the csv crate. Serde is the de facto standard SERialization/DEserialization framework in Rust.

use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct DataRow {
    firstname: String,
    lastname: String,
}

#[derive(Debug, Default)]
pub struct DataFrame {
    pub rows: Vec<DataRow>,
}

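pub fn read_csv(filepath: &str, has_headers: bool) -> DataFrame {
    let file = std::fs::File::open(filepath).unwrap();
    // Use ReaderBuilder so the has_headers argument is actually honored.
    let mut rdr = csv::ReaderBuilder::new()
        .has_headers(has_headers)
        .from_reader(file);
    // Deserialize every row into a DataRow, stopping at the first error.
    let rows: Result<Vec<_>, _> = rdr.deserialize().collect();
    let rows = rows.unwrap();
    DataFrame { rows }
}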

Thanks for that, Quinedot (that playground really helped), and Hyeonu for the Serde approach.

To keep things simple I've gone with the Serde solution and adapted that to work:

use serde::Deserialize;

#[derive(Debug, Deserialize)]
pub struct DataRow {
    firstname: String,
    lastname: String,
}

#[derive(Debug, Default)]
pub struct DataFrame {
    pub rows: Vec<DataRow>,
}

pub fn read_csv(filepath: &str, has_headers: bool) -> DataFrame {
    let file = std::fs::File::open(filepath).unwrap();
    // Use ReaderBuilder so the has_headers argument is actually honored.
    let mut rdr = csv::ReaderBuilder::new()
        .has_headers(has_headers)
        .from_reader(file);
    let rows: Result<Vec<_>, _> = rdr.deserialize().collect();
    let rows = rows.unwrap();
    DataFrame { rows }
}

fn main() {
   let data = read_csv("./data.csv", true);

    println!("{:?}", data)
}

Points to note:

  • I had to define the DataRow struct as public
  • I had to remove the 'DataFrame::' prefix (what I would call a 'reference' - I don't know the correct terminology) from let data = DataFrame::read_csv("./data.csv", true);
    Result:
    let data = read_csv("./data.csv", true);
    Due to:
    'read_csv' function or associated item not found in 'DataFrame'

To summarize so far.
1.) I can read a CSV :white_check_mark:
2.) I can configure parsed data into an organized data format :white_check_mark:
3.) Pass/Query the data from the data frame.

Now this is where my brute-force tinkering approach will almost certainly fall short and I'll have to go back to my Udemy course.

The widget that reads the aforementioned LOGS data:

pub struct App<'a> {
    pub logs: StatefulList<(&'a str, &'a str)>,
    // ... (rest of the struct omitted)
}

const LOGS: [(&str, &str); 2] = [
    ("Event1", "INFO"),
    ("Event2", "INFO"),
];

sits in a separate module file called app.rs and is read in by the following code block in ui.rs:

let info_style = Style::default().fg(Color::Blue);
       ...
            let logs: Vec<ListItem> = app
                .logs
                .items
                .iter()
                .map(|&(evt, level)| {
                    let s = match level {
                        "ERROR" => error_style,
                        "CRITICAL" => critical_style,
                        "WARNING" => warning_style,
                        _ => info_style,
                    };
                    let content = vec![Spans::from(vec![
                        Span::styled(format!("{:<9}", level), s),
                        Span::raw(evt),
                    ])];
                    ListItem::new(content)
                })
                .collect();
            let logs = List::new(logs).block(Block::default().borders(Borders::ALL).title("List"));
            f.render_stateful_widget(logs, chunks[1], &mut app.logs.state);

Directory structure:

src
|__demo
|      |__app.rs
|      |__mod.rs
|      |__ui.rs
|
|__util
|      |__event.rs
|      |__mod.rs
|
|__termion_demo.rs

My understanding on how to achieve number 3:
To be able to query the dataframe, I will need to run the above solution in a main function.
Problems:
1.) Both App and UI are referenced by termion_demo which is the only file to have a 'main' function.

  • Is it possible to create an additional 'main' function inside UI.rs to run the CSV read code?

2.) If I can create the main in UI.rs. Would I then be able to simply execute:

 let logs = List::new(*returned_DataFrame*)...

OR Do I run it in the termion_demo.rs (main) file and make the vector available for reference to UI.rs similar to how the const LOGS is in app.rs?

I assume the latter is more sensible - Ideally though I'd like to do it all in one file as I'm not yet confident in piping data around.

Presumably you've gotten far enough to set up a Cargo project with a Cargo.toml and a main.rs. I suggest you just copy the source code of the example into your own project (and maintain credit). It is possible to have multiple binaries in one project.

To make sure this is viable, I copied the following files from their example directory into the src subdirectory of a new Cargo project:

  • util/ and all files therein
  • demo/ and all files therein
  • termion_demo.rs

Then added the following to Cargo.toml:

[dependencies]
argh = "0.1.4"
rand = "0.8.4"
termion = "1.5.6"
tui = "0.15.0"

[[bin]]
name = "termion_demo"
path = "src/termion_demo.rs"

I also removed one line from src/util/mod.rs:

-#[cfg(feature = "termion")]

After that it built and ran fine. Hopefully that gives you something to work with.


Really appreciate that effort Quinedot, that cfg was difficult to track down as the root cause for not compiling.

I managed to pipe data in with varying degrees of success.

For anyone interested and/or struggling as I was (quality of code is questionable, but it works) - Piping data into LOGS:

Passing data to Logs Window
#[derive(Serialize, Deserialize, Clone)]
struct Test{
    id: usize,
    name: String,
    category: String,
    age: usize,
    created_at: String,
}

 fn read_db(path :&str) -> Result<Vec<Test>, Error> {
    let db_content = fs::read_to_string(path)?;
    let parsed: Vec<Test> = serde_json::from_str(&db_content)?;
    Ok(parsed)
}

....

let test_list= read_db(DB_PATH).expect("can fetch pet list");
            let items: Vec<_> = test_list
                .iter()
                .map(|test_item| {
                    ListItem::new(Spans::from(vec![Span::styled(
                        test_item.name.clone(),
                        Style::default(),
                    )]))
                })
                .collect();

let logs = List::new(items).block(Block::default().borders(Borders::ALL).title("List"));
f.render_stateful_widget(logs, chunks[1], &mut app.logs.state);



I have however hit a brick wall with trying to pipe data into a chart due to data structure requirements.

What I have cobbled together so far (I read the data in from a CSV, as it is easier to deserialize into a tuple than JSON):

type Record = (f64, f64);
pub fn read_csv_tuple(path :&str) ->  Result<(), std::io::Error> {
    let mut rdr = csv::Reader::from_path(path)?;
    for result in rdr.deserialize() {
        // We must tell Serde what type we want to deserialize into.
        let record: Record = result?;
        println!("{:?}", record);
    }
    Ok(())
}
....

let testing = read_csv_tuple(DB_PATH_tup).expect("can fetch pet list");
let datasets = vec![
      Dataset::default()
             .name("data2")
             .marker(symbols::Marker::Dot)
             .style(Style::default().fg(Color::Cyan))
             .data(testing)]; // <-- ERROR originating from 'testing'

Issues with the above approach:
1.) While this prints an output that looks like what I need (x0,y0)(x1,y1), I feel that it is not 'collecting' the rows into a single (apologies for butchering terminology) object/dataframe/instance.
2.) I get the following error with the above code:

mismatched types
expected `&[(f64, f64)]`, found `()`
note: expected reference `&[(f64, f64)]`
         found unit type `()`rustc(E0308)
ui.rs(321, 23): expected `&[(f64, f64)]`, found `()`

The error says expected &[(f64, f64)]. How does it know to expect such a format? Is this something dictated by the library for this specific widget, or is it user defined and I simply can't find the reference?

The data method expects a &[(f64, f64)], but you're feeding it a (). It expects that type because that's its function signature, and Rust is statically typed.

You're passing it a () because read_csv_tuple returns a Result<(), _>. So your testing variable is a ().

Try making read_csv_tuple return a Vec<(f64, f64)> (or equivalently a Vec<Record>). You may have to pass the data method &testing instead of testing as well.
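A minimal sketch of that shape, under some assumptions: read_points is a hand-rolled stand-in for your csv + serde loop (it takes the file contents as a string and expects a header row followed by "x,y" lines), and max_y is a made-up function that, like the data method, takes a &[(f64, f64)]. With the csv crate you would push each deserialized record into the Vec (or collect the iterator) instead of only printing it.

```rust
use std::error::Error;

type Record = (f64, f64);

// Hypothetical stand-in for the csv reader loop: collects the rows
// into a Vec and returns it, instead of printing and returning ().
fn read_points(text: &str) -> Result<Vec<Record>, Box<dyn Error>> {
    let mut points = Vec::new();
    for line in text.lines().skip(1) {
        let (x, y) = line.split_once(',').ok_or("expected two columns")?;
        points.push((x.trim().parse::<f64>()?, y.trim().parse::<f64>()?));
    }
    Ok(points)
}

// Hypothetical consumer with the same parameter type as the data method.
fn max_y(data: &[(f64, f64)]) -> f64 {
    data.iter().map(|&(_, y)| y).fold(f64::NEG_INFINITY, f64::max)
}

fn main() -> Result<(), Box<dyn Error>> {
    let testing = read_points("x,y\n0.0,1.5\n1.0,3.0\n")?;
    println!("{:?}", testing);
    // &testing is a &Vec<(f64, f64)>, which coerces to &[(f64, f64)].
    println!("{}", max_y(&testing));
    Ok(())
}
```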
