Issue with crate Syn and `Parse` trait in functional macro

dbsxdbsx · April 5, 2024, 12:09am

I am trying to write a function-like macro, part of whose job is to parse code in this style inside a trait: pub(crate) var: var_type; var2: var2_type; ... There can be any number of such variables, and they don't need to be on the same line.

The core code looks like this:

/// Define the struct to represent the type of a trait variable.
/*↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓*/
struct TraitVarType {
    name: TokenStream, // the whole type name, including generics, like `HashMap<K, V>`, `i32`, `T`, etc.
    generics: Vec<String>, // the generic type elements in the trait type, like `K, V` in `HashMap<K, V>`
}
impl ToTokens for TraitVarType {
    fn to_tokens(&self, tokens: &mut TokenStream) {
        tokens.extend(self.name.clone());
    }
}
impl Parse for TraitVarType {
    fn parse(input: ParseStream) -> syn::Result<Self> {
        let mut name = TokenStream::new();
        let mut generics = Vec::new();

        // 1. Parse until a semicolon is found, indicating the end of a type definition
        while !input.is_empty() {
            if input.peek(Token![;]) {
                println!("the EXIT token is: `{}`", input.parse::<TokenTree>()?);
                break;
            }
            let token = input.parse::<TokenTree>()?;
            println!("the token is:{}", token.to_string());
            name.extend(Some(token));
        }

        println!("finally, the name is:{}", name.to_string());

        // 2. Parse generics
        if let Ok(type_parsed) = syn::parse2::<Type>(name.clone().into()) {
            let mut visitor = GenericTypeVisitor {
                generics: Vec::new(),
            };
            visitor.visit_type(&type_parsed);
            generics.extend(visitor.generics);
        }

        // 3. Return
        Ok(TraitVarType { name, generics })
    }
}

#[test]
fn test_trait_var_type() {
    let raw_code = quote! { Vec<T, HashMap<K, V>>; x}; // `x` is put on purpose
    println!("the raw code is:`{}`", raw_code);
    let parsed = parse2::<TraitVarType>(raw_code.clone()).expect("Failed to parse");
    println!("the raw code is:`{}`", raw_code);

    assert_eq!(
        parsed.name.to_string(),
        "Vec < T , HashMap < K , V >>".to_string()
    );
    assert_eq!(
        parsed.generics,
        vec!["T".to_string(), "K".to_string(), "V".to_string()]
    );
}
/*↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑*/

/// Define the struct to represent a single trait variable field.
struct TraitVarField {
    var_vis: Visibility,
    var_name: Ident,
    type_name: TraitVarType,
}
impl Parse for TraitVarField {
    fn parse(input: ParseStream) -> syn::Result<Self> {
        println!("the orig input is:{}", input);
        let var_vis: Visibility = input.parse().expect("Failed to Parse to `var_vis`");
        println!("the input is:{}", input);
        let var_name: Ident = input.parse().expect("Failed to Parse to `var_name`");
        println!("the input is:{}", input);
        let _: Token![:] = input.parse().expect("Failed to Parse to `:`");
        println!("the input before TraitVarType is:{}", input);
        let type_name: TraitVarType = input.parse().expect("Failed to Parse to `type_name`");
        println!("the input after TraitVarType is:{}", input);
        let _: Token![;] = input.parse().expect("Failed to Parse to `;`");

        Ok(TraitVarField {
            var_vis,
            var_name,
            type_name,
        })
    }
}

The struct `TraitVarField` handles each variable definition, and `TraitVarType` is specifically for parsing the variable's type.

The core issue is with `TraitVarType`: running the `test_trait_var_type` test panics with the message `thread 'test_trait_var_type' panicked at src\lib.rs:111:59: Failed to parse: Error("unexpected token")`, which comes from the line `let parsed = parse2::<TraitVarType>(raw_code.clone()).expect("Failed to parse");`. Simply put, the parser hits the `x` after the `;`, which it does not expect. But in practice there will be tokens that should NOT be consumed by this `parse()` call (just as when `TraitVarType` is parsed inside the `impl Parse for TraitVarField` block).

So how should I fix this with syn 2.0?

(P.S.: For simplicity, I left out the `GenericTypeVisitor` code, since it's not the root cause of the issue. I've tested it and it works without errors.)

vague · April 5, 2024, 1:46am

It's annoying to see code this incomplete:

- missing imports
- broken, with calls to non-existent methods

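As for the OP's problem: syn parsing is cursor-based, and when a parser is run through `parse2`, every token must be consumed, i.e. the cursor must reach the end of the input. Leftover tokens are reported as "unexpected token".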

But the code is very strange.

Why put an ident into the input at all? To make your code work, you would have to consume it, e.g. via `let _ = input.parse::<Ident>().expect("Failed to Parse x");`, or advance the cursor in some other way.

let raw_code = quote! { Vec<T, HashMap<K, V>>; x}; // `x` is put on purpose

The double parsing is suspicious too:

let token = input.parse::<TokenTree>()?;
name.extend(Some(token));

if let Ok(type_parsed) = syn::parse2::<Type>(name.clone().into()) {

dbsxdbsx · April 5, 2024, 4:45am

Thanks for your reply. Maybe I compacted it too much; the full code is here, using syn = { version = "^2.0", features = ["visit", "full", "parsing"] }.

why put an ident as a part of input

If the ident you mean is `x`, I added it to mimic the real case, where the input contains many tokens that must all be parsed, but not within a single parse call. In other words, this code

 let raw_code = quote! { Vec<T, HashMap<K, V>>; x}; // `x` is put on purpose
 let parsed = parse2::<TraitVarType>(raw_code.clone()).expect("Failed to parse");

mimics how let type_name: TraitVarType = input.parse().expect("Failed to Parse to `type_name`"); is used in this block:

impl Parse for TraitVarField {
    fn parse(input: ParseStream) -> syn::Result<Self> {
...
 let type_name: TraitVarType = input.parse().expect("Failed to Parse to `type_name`");
...
    }
}

My only goal is that, however many statements of this pattern there are, like:

var1: Vec<T>;
var2: HashMap<K, Vec<V>>;
...

each one is parsed by `TraitVarField` (each `TraitVarField` instance represents exactly one such statement). For the case above, there should be two `TraitVarField` instances, and the `name` field of the `type_name` field of the second instance should be `HashMap<K, Vec<V>>`. Even if it were an invalid type, I just want to hold it as-is (that is why the `name` field of `TraitVarType` is a `TokenStream` rather than a `syn::Type`).

This issue has had me stuck for several days. Maybe the whole design is wrong? Please don't hesitate to tell me if I'm doing it the wrong way.

vague · April 5, 2024, 6:08am

It's really unusual to hand-write a repetition loop like the one in this snippet:

        // 1. Parse until a semicolon is found, indicating the end of a type definition
        while !input.is_empty() {
            if input.peek(Token![;]) {
                println!("the EXIT token is: {}", input.parse::<TokenTree>()?);
                break;
            }
            let token = input.parse::<TokenTree>()?;
            println!("the token is:{}", token.to_string());
            name.extend(Some(token));
        }

        println!("finally, the name is:{}", name.to_string());

        // 2. Parse generics
        if let Ok(type_parsed) = syn::parse2::<Type>(name.clone().into()) {

You seem to be aware of Punctuated, but intend to delete it.

`Punctuated` is the right choice here (Rust Playground):

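use quote::quote;
use syn::{parse::Parser, punctuated::Punctuated, Token, Type};

fn main() {
    // omit tokens unrelated to your question
    let raw_code = quote! { Vec<T, HashMap<K, V>>; Vec<T>; };
    // get a list of parsed Type, separated by ;
    let fields: Punctuated<Type, Token![;]> =
        Punctuated::<Type, Token![;]>::parse_terminated.parse2(raw_code).unwrap();
    dbg!(&fields); // extract generic params here (Debug output needs syn's "extra-traits" feature)
}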

The pattern is so common that `ParseBuffer` in `syn::parse` even has a method for it, `parse_terminated`:

fn parse(input: ParseStream) -> syn::Result<Punctuated<Type, Token![;]>> {
    input.parse_terminated(Type::parse, Token![;])
}

dbsxdbsx · April 5, 2024, 2:09pm

Thanks for the inspiration. I did solve it in the latest version by making the field `trait_variables` a `Punctuated<Type, Token![;]>`, though without using `parse_terminated` as before. Specifically:

  trait_variables: {
                let mut vars = Punctuated::new();
                while !content.peek(Token![type])
                    && !content.peek(Token![const])
                    && !content.peek(Token![fn])
                    && !content.is_empty()
                {
                    vars.push_value(content.parse()?);
                    if content.peek(Token![;]) {
                        vars.push_punct(content.parse()?);
                    }
                }
                vars
                // TODO: delete
                // content
                //     .parse_terminated(TraitVarField::parse, Token![;])
                //     .unwrap_or_default()
            },

...
#[test]
fn test_trait_input() {
    let raw_code = quote! {
        pub trait MyTrait {
            x: Vec<T, HashMap<K, V>>;
            pub y: bool;

            fn print_x(&self){
                println!("x: `{}`", self.x);
            }
            fn print_y(&self){
                println!("y: `{}`", self.y);
            }
            fn print_all(&self);
        }
    };
    let parsed = parse2::<TraitInput>(raw_code).unwrap();
...
}

If I used `parse_terminated` here instead, the unit test `test_trait_input()` would panic, because the `fn` items (the normal trait items) below the trait variables would wrongly be treated as part of `trait_variables`. That is why I hadn't used the `Punctuated` style before.

(The corresponding integration test for complex trait variable types is here: complex.rs)

vague · April 5, 2024, 4:15pm

If fields and fns can be kept separate, it's easier to parse them (Rust Playground):

use syn::{punctuated::Punctuated, *};
fn main() {
    let parsed: Parsed = parse_quote! {
        pub trait MyTrait {
            x: Vec<T, HashMap<K, V>>;
            pub y: bool;
        }{ // separate them in a way
            fn print_x(&self){
                println!("x: `{}`", self.x);
            }
            fn print_y(&self){
                println!("y: `{}`", self.y);
            }
            fn print_all(&self);
        }
    };
    // dbg!(&parsed);
}

#[derive(Debug)]
struct Parsed {
    vis: Visibility,
    trait_: Token![trait],
    ident: Ident,
    brace: token::Brace,
    fields: Punctuated<Field, Token![;]>,
    brace2: token::Brace,
    fns: Punctuated<TraitItemFn, parse::Nothing>,
}

impl parse::Parse for Parsed {
    fn parse(input: parse::ParseStream) -> Result<Self> {
        let mut content;
        Ok(Self {
            vis: input.parse()?,
            trait_: input.parse()?,
            ident: input.parse()?,
            brace: braced!(content in input),
            fields: Punctuated::parse_terminated_with(&content, Field::parse_named)?,
            brace2: braced!(content in input),
            fns: Punctuated::parse_terminated(&content)?,
        })
    }
}

But if they have to be mixed, you need a stop condition so that the field loop doesn't advance past the tokens it is supposed to handle (Rust Playground):

// Assumes a variant of `Parsed` whose `fields` is a `Vec<Field>` and which has no `brace2`.
impl parse::Parse for Parsed {
    fn parse(input: parse::ParseStream) -> Result<Self> {
        let content;
        Ok(Self {
            vis: input.parse()?,
            trait_: input.parse()?,
            ident: input.parse()?,
            brace: braced!(content in input),
            fields: {
                let mut v = Vec::new();
                // peek (without consuming) so the cursor is never moved onto the `fn` token;
                // also stop at the end of the block in case there are no fns at all
                while !content.is_empty() && !content.peek(Token![fn]) {
                    v.push(content.call(Field::parse_named)?);
                    let _: Token![;] = content.parse()?;
                }
                v
            },
            fns: Punctuated::parse_terminated(&content)?,
        })
    }
}

dbsxdbsx · April 8, 2024, 2:27am

Thanks, nice idea! After testing, this approach also works in my project, and in the end I combined both approaches into this:

struct TraitInput {
    trait_vis: Visibility,
    _trait_token: Token![trait],
    trait_name: Ident,
    trait_bounds: Option<Generics>, // optional generic parameters for the trait
    explicit_parent_traits: Option<Punctuated<TypeParamBound, Token![+]>>, // explicit parent traits
    where_clause: Option<WhereClause>, // optional where clause for the trait
    _brace_token: token::Brace,
    trait_variables: Vec<TraitVarField>,
    trait_items: Vec<TraitItem>,
}

impl Parse for TraitInput {
    fn parse(input: ParseStream) -> syn::Result<Self> {
        let content;

        Ok(TraitInput {
            trait_vis: input.parse()?,
            _trait_token: input.parse()?,
            trait_name: input.parse()?,
            trait_bounds: if input.peek(Token![<]) {
                Some(input.parse()?) // Use the parse method to parse the generics
            } else {
                None
            },
            explicit_parent_traits: if input.peek(Token![:]) {
                input.parse::<Token![:]>()?;
                let mut parent_traits = Punctuated::new();
                while !input.peek(Token![where]) && !input.peek(token::Brace) {
                    parent_traits.push_value(input.parse()?);
                    if input.peek(Token![+]) {
                        parent_traits.push_punct(input.parse()?);
                    } else {
                        break;
                    }
                }
                Some(parent_traits)
            } else {
                None
            },
            where_clause: if input.peek(syn::token::Where) {
                Some(input.parse()?)
            } else {
                None
            },
            _brace_token: braced!(content in input),
            // Parse all variable declarations until an associated `type`, `const`, or `fn` item, or the end of the block, is reached
            trait_variables: {
                let mut v = Vec::new();
                while !content.peek(Token![type])
                    && !content.peek(Token![const])
                    && !content.peek(Token![fn])
                    && !content.is_empty()
                {
                    v.push(content.call(TraitVarField::parse)?);
                    let _: Token![;] = content.parse()?;
                }
                v
            },
            trait_items: {
                let mut items = Vec::new();
                while !content.is_empty() {
                    items.push(content.parse()?);
                }
                items
            },
        })
    }
}

system · July 7, 2024, 2:28am

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.
