Closures and references

I have a generic function from_tokens and two wrappers (from_tokens1 and from_tokens2) that call it with specific closures.

Now I have two questions regarding the code below:

  1. How can the closure in from_tokens1 be implemented?
  2. How can the pattern match in from_tokens2 be avoided?

use itertools::Itertools;
use std::collections::HashMap;

enum Token {
    Tpe1(String),
    Tpe2(String),
}

impl Token {
    pub fn as_str(&self) -> &str {
        match self {
            Token::Tpe1(s) => s.as_str(),
            Token::Tpe2(s) => s.as_str(),
        }
    }
}

struct Doc<'a> {
    tokens1: Vec<Token>,
    tokens2: Option<Vec<&'a str>>,
}

#[derive(Debug)]
pub struct DfTable<'a> {
    num_documents: u32,
    inner: HashMap<&'a str, u32>,
}

impl DfTable<'_> {
    fn from_tokens<'a>(docs: &[&'a Doc], get_tokens: fn(&'a Doc) -> &'a [&'a str]) -> DfTable<'a> {
        let mut df: HashMap<&str, u32> = HashMap::new();

        for doc in docs {
            for token in get_tokens(doc).iter().unique() {
                df.entry(&token)
                    .and_modify(|x| *x += 1)
                    .or_insert(1);
            }
        }

        DfTable {
            num_documents: docs.len() as u32,
            inner: df,
        }
    }

    fn from_tokens1<'a>(docs: &[&'a Doc]) -> DfTable<'a> {
        Self::from_tokens(docs, |doc| {
            // TODO something like this
            // &doc.tokens1.iter().map(|x| x.as_str()).collect_vec()
            todo!()
        })
    }

    fn from_tokens2<'a>(docs: &[&'a Doc]) -> DfTable<'a> {
        Self::from_tokens(docs, |doc| {
            // TODO how to avoid pattern match?
            match &doc.tokens2 {
                Some(xs) => &xs,
                None => &[],
            }
        })
    }
}

fn main() {
    let doc1 = Doc {
        tokens1: vec![Token::Tpe1("foo".to_owned()), Token::Tpe2("bar".to_owned())],
        tokens2: Some(vec!["baz"]),
    };
    let doc2 = Doc {
        tokens1: vec![
            Token::Tpe1("foo".to_owned()),
            Token::Tpe2("quux".to_owned()),
        ],
        tokens2: Some(vec!["fnord"]),
    };

    let docs = vec![&doc1, &doc2];

    let df1 = DfTable::from_tokens1(&docs);
    let df2 = DfTable::from_tokens2(&docs);

    println!("{df1:#?}");
    println!("{df2:#?}");
}


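With the current signature, it cannot: the closure would have to return a reference to a Vec<&str> that it has just created, and that Vec is dropped as soon as the closure returns. You would need to either pass from_tokens1 a mutable reference to a vector that can hold the tokens, or change the signature to return an owned Vec rather than a slice.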
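The owned-Vec option could look roughly like the sketch below. It is a simplified stand-in, not the original code: document_frequencies is a hypothetical free function that returns the map directly, Doc is reduced to tokens1, and a std HashSet replaces itertools' unique so the example needs no external crate.

```rust
use std::collections::{HashMap, HashSet};

enum Token {
    Tpe1(String),
    Tpe2(String),
}

impl Token {
    fn as_str(&self) -> &str {
        match self {
            Token::Tpe1(s) | Token::Tpe2(s) => s.as_str(),
        }
    }
}

// Reduced version of Doc for this sketch (assumption: only tokens1 matters here).
struct Doc {
    tokens1: Vec<Token>,
}

// Hypothetical variant of from_tokens: the callback returns an owned Vec<&str>.
// The Vec itself is owned by the caller, while its &str items borrow from the Doc.
fn document_frequencies<'a>(
    docs: &[&'a Doc],
    get_tokens: fn(&'a Doc) -> Vec<&'a str>,
) -> HashMap<&'a str, u32> {
    let mut df = HashMap::new();
    for &doc in docs {
        // Count each distinct token at most once per document.
        let unique: HashSet<&'a str> = get_tokens(doc).into_iter().collect();
        for token in unique {
            *df.entry(token).or_insert(0) += 1;
        }
    }
    df
}

fn main() {
    let doc1 = Doc {
        tokens1: vec![Token::Tpe1("foo".to_owned()), Token::Tpe2("bar".to_owned())],
    };
    let doc2 = Doc {
        tokens1: vec![Token::Tpe1("foo".to_owned()), Token::Tpe2("quux".to_owned())],
    };
    let docs = vec![&doc1, &doc2];

    // The closure builds a fresh Vec<&str> and hands ownership to the caller.
    let df = document_frequencies(&docs, |doc| {
        doc.tokens1.iter().map(|t| t.as_str()).collect()
    });
    println!("{df:?}");
}
```

The cost of this variant is one Vec allocation per document, which the iterator-based signature avoids.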

doc.tokens2.as_ref().map(|xs| xs.as_slice()).unwrap_or(&[])
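A slightly shorter spelling of the same idea, as a sketch: Option::as_deref turns the Option<Vec<&str>> into an Option<&[&str]>, and unwrap_or_default supplies the empty slice because &[T] implements Default. The helper name tokens_or_empty is made up for illustration.

```rust
// Returns the tokens as a slice, or an empty slice when the Option is None.
// `tokens_or_empty` is a hypothetical helper, not part of the original code.
fn tokens_or_empty<'d, 's>(tokens: &'d Option<Vec<&'s str>>) -> &'d [&'s str] {
    // as_deref: &Option<Vec<T>> -> Option<&[T]>
    // unwrap_or_default: falls back to the empty slice for None
    tokens.as_deref().unwrap_or_default()
}

fn main() {
    let some = Some(vec!["baz", "fnord"]);
    let none: Option<Vec<&str>> = None;

    println!("{:?}", tokens_or_empty(&some));
    println!("{:?}", tokens_or_empty(&none));
}
```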

You can change the signature of from_tokens:

    fn from_tokens<'a, I, F>(docs: &[&'a Doc<'_>], mut get_tokens: F) -> DfTable<'a>
    where
        F: FnMut(&'a Doc<'_>) -> I,
        I: IntoIterator<Item = &'a str>,
    {

And then generate iterators instead of slices:

    fn from_tokens1<'a>(docs: &[&'a Doc<'_>]) -> DfTable<'a> {
        Self::from_tokens(docs, |doc| {
            doc.tokens1.iter().map(|x| x.as_str())
        })
    }

    fn from_tokens2<'a>(docs: &[&'a Doc<'_>]) -> DfTable<'a> {
        Self::from_tokens(docs, |doc| {
            doc.tokens2.as_ref().into_iter().flatten().copied()
        })
    }
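Put together, the iterator-based approach might look like the self-contained sketch below. Two things differ from the snippets above and are my own simplifications: a std HashSet stands in for itertools' unique so that no external crate is needed, and a single lifetime 'a is used for both the document reference and the borrowed strings (Doc<'a> instead of Doc<'_>).

```rust
use std::collections::{HashMap, HashSet};

enum Token {
    Tpe1(String),
    Tpe2(String),
}

impl Token {
    fn as_str(&self) -> &str {
        match self {
            Token::Tpe1(s) | Token::Tpe2(s) => s.as_str(),
        }
    }
}

struct Doc<'s> {
    tokens1: Vec<Token>,
    tokens2: Option<Vec<&'s str>>,
}

#[derive(Debug)]
struct DfTable<'a> {
    num_documents: u32,
    inner: HashMap<&'a str, u32>,
}

impl<'a> DfTable<'a> {
    // The callback returns anything iterable over &'a str, so no slice or
    // temporary Vec has to exist anywhere.
    fn from_tokens<I, F>(docs: &[&'a Doc<'a>], mut get_tokens: F) -> DfTable<'a>
    where
        F: FnMut(&'a Doc<'a>) -> I,
        I: IntoIterator<Item = &'a str>,
    {
        let mut df: HashMap<&'a str, u32> = HashMap::new();
        for &doc in docs {
            // Deduplicate per document (stand-in for itertools' unique).
            let unique: HashSet<&'a str> = get_tokens(doc).into_iter().collect();
            for token in unique {
                *df.entry(token).or_insert(0) += 1;
            }
        }
        DfTable {
            num_documents: docs.len() as u32,
            inner: df,
        }
    }

    fn from_tokens1(docs: &[&'a Doc<'a>]) -> DfTable<'a> {
        Self::from_tokens(docs, |doc| doc.tokens1.iter().map(|x| x.as_str()))
    }

    fn from_tokens2(docs: &[&'a Doc<'a>]) -> DfTable<'a> {
        // A None yields an empty iterator, so no pattern match is needed.
        Self::from_tokens(docs, |doc| {
            doc.tokens2.as_ref().into_iter().flatten().copied()
        })
    }
}

fn main() {
    let doc1 = Doc {
        tokens1: vec![Token::Tpe1("foo".to_owned()), Token::Tpe2("bar".to_owned())],
        tokens2: Some(vec!["baz"]),
    };
    let doc2 = Doc {
        tokens1: vec![Token::Tpe1("foo".to_owned()), Token::Tpe2("quux".to_owned())],
        tokens2: None,
    };
    let docs = vec![&doc1, &doc2];

    println!("{:#?}", DfTable::from_tokens1(&docs));
    println!("{:#?}", DfTable::from_tokens2(&docs));
}
```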

Thanks for showing the iterator-based approach!

I guess there is no way to do this that avoids copying the elements of the vector?

fn from_tokens2<'a>(docs: &[&'a Doc<'_>]) -> DfTable<'a> {
    Self::from_tokens(docs, |doc| {
        doc.tokens2.as_ref().into_iter().flatten().map(|x| *x)
    })
}

Sorry, my bad. If I see correctly, copied on an iterator over &&str just dereferences each &&str to a &str, so no string data is actually copied.

fn main() {

    let xs = vec!["hello", "str"];

    let ys = xs.iter().copied().collect::<Vec<_>>();

    println!("[{x1:p}, {x2:p}]", x1 = xs[0], x2 = xs[1]); // [0x5c1476d41395, 0x5c1476d4137f]
    println!("[{y1:p}, {y2:p}]", y1 = ys[0], y2 = ys[1]); // [0x5c1476d41395, 0x5c1476d4137f]

}

Yup, indeed.

Also note that generating Strings and trying to create &strs to them to store in the DfTable would run into ownership problems -- who would own the Strings if they weren't stored anywhere?

This is also why, in a general sense, you couldn't return a &[&str] from the closure in the OP -- no such slice [&str] existed anywhere, and you had no place to store a Vec<&str> you had just created.
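To illustrate the ownership point: if the tokens really had to be generated on the fly, one way out would be for the table itself to own them. The sketch below is hypothetical and not from the code in the OP; it assumes documents are plain Vec<&str> and uses lowercasing as a stand-in for whatever would produce new Strings.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical variant: the generated Strings are moved into the map,
// so the map owns its keys and nothing is left dangling.
fn document_frequencies_owned(docs: &[Vec<&str>]) -> HashMap<String, u32> {
    let mut df = HashMap::new();
    for doc in docs {
        // to_lowercase allocates a new String for every token.
        let unique: HashSet<String> = doc.iter().map(|t| t.to_lowercase()).collect();
        for token in unique {
            *df.entry(token).or_insert(0) += 1;
        }
    }
    df
}

fn main() {
    let docs = vec![vec!["Foo", "foo", "Bar"], vec!["FOO"]];
    let df = document_frequencies_owned(&docs);
    println!("{df:?}");
}
```

The trade-off is an allocation per generated token, which the borrowed &str keys in DfTable avoid.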
