Avoid cloning to work around returning reference to data owned by current function

I understand that when a function owns data you can't return references to it, because once exited the owned data is freed.

But what about when, to reduce typing, you just want a convenient function/macro to produce an owned value of some kind (that isn't needed for anything else) and return a reference to it? Is there no way to accomplish this reduction in code without adding the expense of a clone?

Here is a specific example that illustrates the desired outcome well, but I feel like this is a general Rust programming issue. In polars, I want a convenience function to get column names:

pub fn columns(lf: &mut LazyFrame) -> Result<Vec<PlSmallStr>, PolarsError> {
    let schema = lf.collect_schema()?;
    Ok(schema.iter_names().cloned().collect::<Vec<PlSmallStr>>())
}

The function of course works, but because schema is owned I must clone. Maybe not a big deal - I don't have millions of columns. But again, this is a general developer efficiency issue; I just want some kind of way to tell the compiler to write these lines of code for me.

For instance, if I skip this function and instead copy/paste the code everywhere, it's easy to scope schema such that it lives long enough. So I thought, maybe I write a macro so that schema appears in the caller's scope:

macro_rules! columns {
     ($lf:ident, $names:ident) => {{
         let schema = $lf.collect_schema()?;
         $names.extend(schema.iter_names());
     }};
 }
pub(crate) use columns;

But because I need a let statement, my macro must also wrap it in a code block, which drops schema at the end of that block (same problem).

Is there no way to write a purely code-reducing convenience callable to get the schema and return the names, so that I don't have to copy/paste these two lines everywhere?

fn caller(lf: &mut LazyFrame) {
  // 1: the caller owns the schema
  let schema = lf.collect_schema().unwrap();
  // 2: notice no clone, since schema lives long enough to "do stuff" below;
  //    iter_names() yields references, so the Vec holds &PlSmallStr
  let names = schema.iter_names().collect::<Vec<&PlSmallStr>>();

  // would much rather just call columns(lf)... but not at the expense of cloning
  if names.iter().any(|name| name.as_str() == "stuff") {
    // do stuff...
  }
}

I would love a way to inline or macro those 2 lines as-is. Does nothing exist to serve this minimal code aesthetic?

You don't have to wrap it in a block with that macro.

macro_rules! columns {
     ($lf:ident, $names:ident) => {
         let schema = $lf.collect_schema()?;
         $names.extend(schema.iter_names());
     };
}

But maybe you really meant this approach?

macro_rules! columns {
     ($lf:ident) => {{
         let schema = $lf.collect_schema()?;
         schema.iter_names()
     }};
}

// ...
let names = columns!(lf);

That needs super let. (Or someone better at macros than me.)

Here's a tracking issue. Looks like there are some outstanding design issues (which I didn't dig into).
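
While super let is not available on stable, one pattern that avoids both the clone and the copy/paste is to invert control: the helper owns the schema and lends the names to a closure. This is only a sketch with std types. `collect_schema` and `with_columns` below are hypothetical stand-ins, not polars APIs, and `String` stands in for PlSmallStr.

```rust
// Hypothetical stand-in for a fallible schema lookup; not a polars API.
fn collect_schema() -> Result<Vec<String>, String> {
    Ok(vec!["stuff".to_string(), "other".to_string()])
}

// The helper owns the schema for the duration of the call and lends
// the borrowed names to the closure, so nothing is cloned.
fn with_columns<R>(f: impl FnOnce(&[&str]) -> R) -> Result<R, String> {
    let schema = collect_schema()?;
    let names: Vec<&str> = schema.iter().map(String::as_str).collect();
    Ok(f(&names))
}

// Caller side: the "do stuff" logic moves into the closure.
fn schema_has_stuff() -> Result<bool, String> {
    with_columns(|names| names.contains(&"stuff"))
}
```

The trade-off is that the closure's return value cannot borrow from the names, since the schema is dropped when `with_columns` returns.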

thank you!

So weird that compiles... In my (larger) project unless I add the code block rust-analyzer says:

error: expected expression, found `let` statement
  --> src/col.rs:12:9
   |
12 |         let schema = $lf.collect_schema()?;
   |         ^^^
   |
  ::: src/col.rs:31:5
   |
31 |     columns!(lf, cols)?;
   |     -------------------- in this macro invocation
   |
   = note: only supported directly in conditions of `if` and `while` expressions
   = note: this error originates in the macro `columns` (in Nightly builds, run with -Z macro-backtrace for more info)

How can the compiler make this work without a code block?

Macros are allowed to expand to multiple statements, when used in a statement position. But if you have a ? after the macro call, then it has to be parsed as an expression.
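
A minimal sketch of that rule using only std types (`fetch_names` is a hypothetical stand-in for the schema lookup, not a polars API): invoked as a statement, the macro expands to two statements, so the hidden `schema` binding lives until the end of the caller's block and the borrowed names stay valid.

```rust
// Hypothetical stand-in for a fallible schema lookup; not a polars API.
fn fetch_names() -> Result<Vec<String>, String> {
    Ok(vec!["a".to_string(), "b".to_string()])
}

// Expands to two statements with no enclosing block, so `schema` is
// declared in whatever block the macro is invoked from.
macro_rules! columns {
    ($names:ident) => {
        let schema = fetch_names()?;
        $names.extend(schema.iter());
    };
}

fn count_columns() -> Result<usize, String> {
    let mut names: Vec<&String> = Vec::new();
    // Statement position: the expansion may contain `let`.
    columns!(names);
    // `names` borrows from the macro's `schema`, which is still alive here.
    Ok(names.len())
}
```

Writing `columns!(names)?;` instead would force the invocation to be parsed as an expression, which is what produces the "expected expression, found `let` statement" error quoted above.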

Thank you!

I guess it's worth mentioning that the error didn't make it clear enough (to the untrained eye) that the ? was the problem.

Totally makes sense though, as a single expression is needed for ?, not multiple.

You can also hide iter_names in a conversion to an iterator:

#![feature(impl_trait_in_assoc_type)]

use std::sync::Arc;
use polars_core::datatypes::DataType;
use polars_utils::pl_str::PlSmallStr;
use polars_error::PolarsError;
use polars_lazy::frame::LazyFrame;
use polars_schema::schema::Schema;

struct Columns(Arc<Schema<DataType>>);

impl<'a> IntoIterator for &'a Columns {
    type IntoIter = impl 'a + ExactSizeIterator<Item = Self::Item>;
    type Item = &'a PlSmallStr;
    fn into_iter(self) -> Self::IntoIter {
        self.0.iter_names()
    }
}

fn columns(lf: &mut LazyFrame) -> Result<Columns, PolarsError> {
    lf.collect_schema().map(Columns)
}

fn dummy_lazy_frame() -> LazyFrame {
    unimplemented!()
}

fn main() {
    let mut lazy_frame = dummy_lazy_frame();
    for _ in &columns(&mut lazy_frame).unwrap() {}
}

I can't see how this doesn't violate lifetimes... It's just beyond me but looks amazing!

Ah, I see it's lazy in the sense that you're returning the declaration for how to (eventually) consume into a vec, rather than materializing a vec.

Very cool, but still I need to collect that vec - so just kicking the can down the road a bit.
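
If a Vec is what you want in the end, the same owner-wrapper idea can hand out a Vec of borrowed names, so only references are collected and no string is cloned. A sketch with std types only: `Columns` here wraps a plain `Arc<Vec<String>>` as a hypothetical stand-in for the schema, and `names` is an illustrative helper, not a polars method.

```rust
use std::sync::Arc;

// Hypothetical stand-in for the Arc'd schema: a shared list of names.
struct Columns(Arc<Vec<String>>);

impl Columns {
    // Borrowed names, valid for as long as `self` is alive.
    fn names(&self) -> Vec<&str> {
        self.0.iter().map(String::as_str).collect()
    }
}

// Hypothetical stand-in for `lf.collect_schema().map(Columns)`.
fn columns() -> Result<Columns, String> {
    Ok(Columns(Arc::new(vec!["stuff".to_string(), "other".to_string()])))
}

fn has_stuff() -> Result<bool, String> {
    let cols = columns()?;    // the owner lives in the caller
    let names = cols.names(); // Vec<&str> borrowing from `cols`
    Ok(names.contains(&"stuff"))
}
```

This still takes two lines at the call site (one for the owner, one for the names), because the owner has to be bound somewhere that outlives the references.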