Passing a generated function to a macro derive attribute from the logos crate

Hello rustaceans!
I've just tried integrating the logos crate in my latest project, and I'm having a minor issue.
Here's a part of my code:

#[derive(PartialEq, Debug, Logos)]
#[logos(error = ParseError)]
pub(crate) enum MaskAtom {
    #[regex(r"@\d{1,2}", octave)]
    Octave(NonZeroU8),
    #[regex(r"\$\d{1,3}", (normal_cmd_callback_generator("length")))]
    Length(u8),
    #[regex(r"!\d{1,3}", (normal_cmd_callback_generator("volume")))]
    Volume(u8),
    #[token(".")]
    #[regex(r"[ \t\n\f\r]+", junk)]
    Rest,
}

fn normal_cmd_callback_generator(
    for_: &str,
) -> impl Fn(&mut Lexer<MaskAtom>) -> Result<u8, ParseError> + '_ {
    move |lex| {
        lex.slice()[1..].parse().map_err(|_| ParseError {
            msg: format!("Expected a {} number", for_),
        })
    }
}

I'm using a function to generate the callback that runs when a length or volume pattern is matched. However, I found that I have to wrap the generator call in parentheses for this code to compile (because of the macro's syntax, I suppose), and this means the callback gets generated every time a pattern is matched. I can't create a scope where I generate the function once, move it into a closure, and return it.

I also can't make a playground out of it because the logos crate isn't supported there, but this snippet compiles fine in a cargo project with logos as the only dependency:

use logos::{Lexer, Logos, Skip};
use std::fmt::Debug;
use std::num::NonZeroU8;

#[derive(Clone, PartialEq)]
pub(crate) struct ParseError {
    msg: String,
}

impl Debug for ParseError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "{} just before {:?}", self.msg, (0, 0))
    }
}

impl Default for ParseError {
    fn default() -> Self {
        ParseError {
            msg: "Dummy error".to_string(),
        }
    }
}

#[derive(PartialEq, Debug, Logos)]
#[logos(error = ParseError)]
pub(crate) enum MaskAtom {
    #[regex(r"@\d{1,2}", octave)]
    Octave(NonZeroU8),
    #[regex(r"\$\d{1,3}", (normal_cmd_callback_generator("length")))]
    Length(u8),
    #[regex(r"!\d{1,3}", (normal_cmd_callback_generator("volume")))]
    Volume(u8),
    #[token(".")]
    #[regex(r"[ \t\n\f\r]+", junk)]
    Rest,
}

fn normal_cmd_callback_generator(
    for_: &str,
) -> impl Fn(&mut Lexer<MaskAtom>) -> Result<u8, ParseError> + '_ {
    println!("generator was called!");
    move |lex| {
        lex.slice()[1..].parse().map_err(|_| ParseError {
            msg: format!("Expected a {} number", for_),
        })
    }
}

fn junk(lex: &mut Lexer<MaskAtom>) -> Skip {
    Skip
}

fn octave(lex: &mut Lexer<MaskAtom>) -> Result<NonZeroU8, ParseError> {
    NonZeroU8::new(lex.slice()[1..].parse().map_err(|_| ParseError {
        msg: "Expected an octave number".to_string(),
    })?)
    .ok_or(ParseError {
        msg: "Octave 0 does not exist".to_string(),
    })
}

fn main() {
    MaskAtom::lexer("$15!45$1!123").for_each(|r| println!("Found: {:?}", r.unwrap()));
}

The entire source file is on GitHub.

So, can I give this logos derive macro a static but generated closure? Is this even a real (performance) issue?

If you remove the println!("generator was called!");, then normal_cmd_callback_generator is essentially a no-op. It turns the &str into a closure that captures the &str. Both will have the same representation in memory. A closure is simply an anonymous struct consisting of all variable captures as fields.

I.e. there should be no performance issue.
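
The "closure is just its captures" point can be checked without logos at all. Below is a minimal sketch under a few assumptions: `make_parser` is a hypothetical stand-in for `normal_cmd_callback_generator`, it takes the matched slice as a plain `&str` instead of a `&mut Lexer`, and it uses `String` as the error type. The size comparison reflects how rustc lays out closures in practice; the language does not formally guarantee that layout.

```rust
use std::mem::size_of_val;

// Hypothetical stand-in for the generator from the question: same shape,
// but the callback receives the matched text directly instead of a Lexer.
fn make_parser(for_: &str) -> impl Fn(&str) -> Result<u8, String> + '_ {
    move |slice: &str| {
        // Skip the one-character prefix ('$' or '!') and parse the digits.
        slice[1..]
            .parse::<u8>()
            .map_err(|_| format!("Expected a {} number", for_))
    }
}

fn main() {
    let label: &str = "length";
    let parse_length = make_parser(label);

    // The closure's only capture is the &str, so in practice it occupies
    // exactly as much memory as the &str itself (pointer + length).
    assert_eq!(size_of_val(&parse_length), size_of_val(&label));

    // It still behaves like the callback in the question.
    assert_eq!(parse_length("$15"), Ok(15));
    assert!(parse_length("$999").is_err());
}
```

Calling `make_parser` therefore only wraps the string slice in an anonymous struct; no allocation or other setup work happens at "generation" time.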


For illustration, using https://crates.io/crates/cargo-show-asm and modifying your snippet so that the println is gone and the function isn't inlined:

#[inline(never)]
fn normal_cmd_callback_generator(
    for_: &str,
) -> impl Fn(&mut Lexer<MaskAtom>) -> Result<u8, ParseError> + '_ {
    // println!("generator was called!");
    move |lex| {
        lex.slice()[1..].parse().map_err(|_| ParseError {
            msg: format!("Expected a {} number", for_),
        })
    }
}

I get

> cargo asm --simplify playground::normal_cmd_callback_generator
warning: unused variable: `lex`
  --> src/main.rs:50:9
   |
50 | fn junk(lex: &mut Lexer<MaskAtom>) -> Skip {
   |         ^^^ help: if this is intentional, prefix it with an underscore: `_lex`
   |
   = note: `#[warn(unused_variables)]` on by default

warning: `playground` (bin "playground") generated 1 warning (run `cargo fix --bin "playground"` to apply 1 suggestion)
    Finished release [optimized] target(s) in 0.00s

playground::normal_cmd_callback_generator:

        mov rax, rdi

        ret

So yeah… as expected, it's literally a no-op that just returns the function argument, presumably passed in via rdi and returned in rax.

(Without the #[inline(never)], of course, it would completely disappear at the call site.)


Thank you very much, I didn't know this was possible!
So Rust really is that smart...
