In polars, how to apply a custom function to a column of strings?

Hi,

After looking at the polars documentation, I'm having a really hard time applying a custom function (one that takes a string, does operations not available in polars, and returns a new string) and getting the result as a Vec.
There is the option of writing a UDF, but it looks quite heavy to code.

My best attempt so far:


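    // Read the CSV, inferring column types from the data.
    let df = CsvReader::from_path("data.csv")
        .unwrap()
        .infer_schema(None)
        .has_header(true)
        .finish()
        .unwrap();
    // Select only the column of interest via the lazy API.
    let out = df
        .clone()
        .lazy()
        .select([col("MyColumn")])
        .collect()
        .unwrap();
    // Take the first (and only) column and view it as strings.
    let clinical = &out.get_columns()[0];
    let res: Vec<_> = clinical
        .str()
        .unwrap()
        .into_iter()
        // Each element `e` is an Option<&str> (None for nulls).
        .map(|e| myfun(e))
        .collect();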

As a beginner, I find this super-heavy code. Is there a better way to do this? Thanks!

Disclaimer: I have never used polars.

There's a column method on the DataFrame struct that returns a Result<&Series, _> (see the DataFrame page in the polars::frame API docs).

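Series has methods such as str() that return typed chunked arrays (for strings, a StringChunked), and a reference to a chunked array implements the IntoIterator trait. For a string column, the iterator yields Option<&str> items, with None marking a null.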
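To make the element type concrete, here is a minimal sketch that runs without polars: a Vec<Option<&str>> stands in for the string column (an assumption, not the real StringChunked), and myfun is a hypothetical function that reverses and upper-cases its input. Mapping with e.map(myfun) applies the function only to non-null values and keeps nulls as None.

```rust
/// Hypothetical custom function: any String-to-String logic polars lacks.
fn myfun(s: &str) -> String {
    s.chars().rev().collect::<String>().to_uppercase()
}

/// Applies `myfun` to every non-null value, keeping nulls as `None`.
/// Accepts anything that iterates over Option<&str>, like a string column does.
fn apply_to_column<'a, I>(values: I) -> Vec<Option<String>>
where
    I: IntoIterator<Item = Option<&'a str>>,
{
    values.into_iter().map(|e| e.map(myfun)).collect()
}

fn main() {
    // Stand-in for a string column with one null entry.
    let column = vec![Some("abc"), None, Some("polars")];
    let res = apply_to_column(column);
    println!("{:?}", res);
}
```

In the real code you would pass the iterator from .str().unwrap().into_iter() instead of the Vec; I have not run that substitution against polars itself.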

All this means that you could do something like this (not tested):

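    let df = CsvReader::from_path("data.csv")
        .unwrap()
        .infer_schema(None)
        .has_header(true)
        .finish()
        .unwrap();

    // Borrow the column directly (no lazy select or clone), view it as
    // strings, and map each Option<&str> element through myfun.
    let res: Vec<_> = df
        .column("MyColumn").unwrap()
        .str().unwrap()
        .into_iter()
        .map(|e| myfun(e))
        .collect();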
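If you want a plain Vec<String> rather than passing Option<&str> into myfun, you have to decide what happens to nulls. A minimal sketch of two choices, again with a Vec<Option<&str>> standing in for the column and a hypothetical myfun (both assumptions, not polars APIs):

```rust
/// Hypothetical stand-in for the real custom function.
fn myfun(s: &str) -> String {
    format!("<{}>", s.trim())
}

/// Drops null entries; the result may be shorter than the column.
fn apply_skip_nulls<'a>(values: impl IntoIterator<Item = Option<&'a str>>) -> Vec<String> {
    values.into_iter().flatten().map(myfun).collect()
}

/// Replaces nulls with "" before calling myfun; lengths stay equal.
fn apply_default_nulls<'a>(values: impl IntoIterator<Item = Option<&'a str>>) -> Vec<String> {
    values
        .into_iter()
        .map(|e| myfun(e.unwrap_or_default()))
        .collect()
}

fn main() {
    // Stand-in for the column: one padded value and one null.
    let column = vec![Some(" x "), None];
    println!("{:?}", apply_skip_nulls(column.clone()));
    println!("{:?}", apply_default_nulls(column));
}
```

Skipping nulls is convenient, but the output no longer lines up row by row with the DataFrame; the default-value version keeps the lengths equal.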

Hope that helps.

