Help With Serde....Lost

I can't find a solution to this problem anywhere on the web: deserializing a CSV column into a Vec<&str> using serde. I'm new to applying traits with lifetimes to structs, and I have read the lifetimes section of the serde documentation. This should be a simple thing, and I don't want to overdo it. Please help.

First of all, this is possible only if you keep the original CSV around as a String or something similar. Also, do I understand correctly that you already have some code with the desired logic, but it doesn't compile?

This is the closest I have:

struct DataFrameRaw<T> {
    dates: Vec<T>,
    closes: Vec<T>,
}

#[derive(serde::Deserialize)]
struct Data<T> {
    value: T,
}

impl<'de, T> serde::Deserialize<'de> for DataFrameRaw<T>
where
    T: serde::Deserialize<'de>,
{
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: serde::Deserializer<'de>,
    {
        let v: Vec<Data<T>> = serde::Deserialize::deserialize(deserializer)?;
        let values = v.into_iter().map(|f| f.value).collect();

        Ok(DataFrameRaw { values })
    }
}

So what you are saying is that it is not possible, even with serde's 'de lifetime, to deserialize into a Vec<&str> without keeping the String around, which means that from a memory and performance standpoint I might as well just go with String?

Where are the actual string characters stored? Rust references are pointers with provenance; they aren't storage for the pointed-to things. Rust's lifetimes provide a way to ensure that references don't live longer than the storage to which they refer.
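To make that concrete, here is a minimal sketch using only the standard library, not serde or the csv crate. The helper name first_column is made up for illustration, and it splits naively on commas, so it ignores quoting and headers. The point is only that the returned slices borrow from the input buffer, and the 'a lifetime is how the compiler checks they don't outlive it.

```rust
/// Hypothetical helper: collects the first comma-separated field of every line.
/// The returned slices borrow from `csv`; the `'a` lifetime tells the compiler
/// that they must not outlive that buffer.
fn first_column<'a>(csv: &'a str) -> Vec<&'a str> {
    csv.lines()
        .filter_map(|line| line.split(',').next())
        .collect()
}

fn main() {
    // The String owns the characters and must stay alive while `dates` is in use.
    let csv = String::from(
        "2020-11-16T00:00:00.029158276-05,101.5\n2020-11-17T00:00:00.000000000-05,102.0\n",
    );
    let dates = first_column(&csv);
    println!("{:?}", dates);

    // Dropping `csv` here and then using `dates` again would be rejected
    // at compile time, because `dates` still borrows from `csv`.
}
```

Nothing is copied here: each &str is just a pointer and a length into csv.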

Value format in the .csv: "2020-11-16T00:00:00.029158276-05"

I would like the 500,000 or so of those entries to be stored in a Vec<&str> rather than a Vec<String>, but if that is either not possible or not advantageous memory-wise, I suppose I shouldn't.

I understand the concept of a reference pointing to the memory of the owned version. I was just confused about what lifetimes could provide here, because I don't understand them very well yet.

The compiler (rustc) uses lifetimes in its attempt to prove that reference chains do not lead to memory-use errors such as data races, use-after-free, double-free, etc. Lifetimes exist only at compile time; they have no impact on run-time code. Rust does not have garbage collection.

A Vec<&str> is basically a resizable array of fat pointers. Where in the program's memory are the character strings to which those pointers point? What is keeping those strings in memory? It's not Rust's garbage collector, because Rust does not use garbage collection. You are responsible for keeping those strings in memory, either by keeping the csv input file in memory and pointing to the starting byte locations of those strings in that file (together with an associated byte length for each string), or by allocating heap storage for the Strings as you process them, retaining those Strings until the rest of your code no longer needs them.
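Here is a rough sketch of the difference between the two options, again using only the standard library. The helper points_into is a made-up name for illustration; it just checks whether a string slice lives inside a given buffer.

```rust
use std::mem::size_of;

/// Hypothetical helper: returns true if `slice` lies entirely inside the bytes of `buffer`.
fn points_into(buffer: &str, slice: &str) -> bool {
    let start = buffer.as_ptr() as usize;
    let end = start + buffer.len();
    let s = slice.as_ptr() as usize;
    s >= start && s + slice.len() <= end
}

fn main() {
    // A &str is a fat pointer: an address plus a length.
    println!("&str:   {} bytes", size_of::<&str>());
    // A String is a pointer, capacity, and length, plus its own heap allocation.
    println!("String: {} bytes", size_of::<String>());

    // Option 1: keep the input in memory and point into it.
    let buffer = String::from("2020-11-16T00:00:00.029158276-05,101.5\n");
    let borrowed: &str = buffer.split(',').next().unwrap();

    // Option 2: allocate separate heap storage for each value.
    let owned: String = borrowed.to_string();

    println!("borrowed points into buffer: {}", points_into(&buffer, borrowed));
    println!("owned points into buffer:    {}", points_into(&buffer, &owned));
}
```

On a typical 64-bit target a &str should take 16 bytes and a String 24 bytes, so for roughly 500,000 entries that is about 8 MB of pointers plus the retained input buffer, versus about 12 MB of String headers plus one small heap allocation per entry. Which one wins overall depends on how much of the file you would otherwise be free to discard.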

Awesome. Not being able to do it solves a huge problem. Thanks, man! Much appreciated.

It is technically possible to borrow from the deserializer with serde, if the deserializer supports it, like this:

#[derive(serde::Deserialize)]
struct MyStruct<'a> {
    col_1: &'a str,
    col_2: &'a str,
}

However, csv::Reader::into_deserialize requires your record type to implement serde::de::DeserializeOwned, which means that this API cannot hand out records that borrow from the input.

You should be able to just use owned strings, and it'll be a lot simpler.

#[derive(serde::Deserialize)]
struct MyStruct {
    col_1: String,
    col_2: String,
}

Then to capture a vector of column-1 values, you might do something along the lines of:

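// `col_1_values` is just an illustrative name; any `std::io::Read` source works.
fn col_1_values<R: std::io::Read>(reader: R) -> Result<Vec<String>, csv::Error> {
    csv::Reader::from_reader(reader)
        .into_deserialize::<MyStruct>()
        .map(|result| result.map(|record| record.col_1))
        .collect()
}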
