Cloning iterators to look ahead in token streams

I'm looking at an approach to looking ahead in a token stream, which boils down to

use proc_macro2::token_stream::IntoIter;

fn foo(rest: &mut IntoIter) {
    let mut peek = rest.clone();
    match (peek.next(), peek.next())  { ... }
}

which, with explicit reborrowing and type annotations, becomes (if I am not mistaken)

use proc_macro2::token_stream::IntoIter;

fn foo(rest: &mut IntoIter) {
    let mut peek: IntoIter = (&*rest).clone();
    match ((&mut peek).next(), (&mut peek).next())  { ... }
}

I trust that this does not clone the whole token stream, but merely something that is effectively a cursor. The details of how this is done are not clear to me, though; I guess I don't understand how lazy traversal and/or in-memory storage of the token stream itself is implemented in the first place.

Can you offer any insight?

proc_macro2::token_stream::IntoIter is basically std::vec::IntoIter<TokenTree>.
Because std::vec::IntoIter is a cursor, i.e. a raw pointer rather than a copy of the real data, cloning it is cheap and does not clone the data on the heap.

Note that proc_macro2::TokenStream is essentially an Rc<Vec<TokenTree>>, so cloning it is cheap, but turning it into a proc_macro2::token_stream::IntoIter is not always cheap:

  • if there is a single owner of the TokenStream, Rc::get_mut succeeds, so obtaining the proc_macro2::token_stream::IntoIter is free
  • if not, Vec::clone is used to build the IntoIter, so all the tokens are cloned

Recap:

  • it's cheap to clone TokenStream (my original claim that cloning its IntoIter is also cheap was wrong; see the correction below)
  • it's not cheap to clone IntoIter
  • it's not cheap to turn TokenStream into IntoIter when multiple TokenStreams exist

That's incorrect: IntoIter is owning, so it must (and does) clone the underlying data, since two clones can't both give away ownership of the same values.


Sorry, I didn't check that out ☹️

So it's not cheap to clone IntoIter.

I was wondering why there isn't an Iter version on proc_macro2::TokenStream.
It's cheap to clone a TokenStream, and real parsing must deal with TokenTrees, but the only way to get TokenTrees out of a TokenStream is the owning IntoIter, which can be expensive to obtain.

As far as I can tell, to get a cursor over a TokenStream, ParseBuffer in syn::parse - Rust should be used.

Good: that's what I was trying to use, but then I got distracted by coming across the approach linked in my OP.

So this approach works in a toy example, but should not be used in production (and my trust was misplaced)?

Cloning unparsed tokens (the &mut IntoIter approach) is fine, because in most cases it's not the bottleneck.

Even GitHub - PoignardAzur/venial: "A very small syn" uses cloning ( Rollback iterator to avoid cloning tokens · Issue #41 · PoignardAzur/venial · GitHub ).

Well, for clarity: I mean it's fine to clone tokens in a seq-proc-like case, where you don't do heavy parsing.

For ergonomic token parsing, implement Parse in syn::parse - Rust for your custom data structure instead.

Or use my library, parsel for deriving an implementation in the non-weird cases.


It occurs to me that the multipeek method in itertools meets your need: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=5aa1a47de5b1e5daeb9206e182cbbb3d


I like the high-level encapsulations over quote and syn in your crate:

  • seamlessly interacts with syn types/macros
  • derives Parse and ToTokens on custom types instead of writing the obvious but verbose input.parse() calls in a manual Parse impl
  • handy types, like:

    type         functionality
    Either       generic dichotomous alternation
    Maybe        optional expression introduced by some lookahead tokens
    Separated    Punctuated, but trailing punctuation is not allowed

I tried using it for my async_closure! decl macro, and it is indeed clean:

#![feature(trivial_bounds)]
fn main() {
    let _: AsyncClosure = parse_quote!(async || -> usize { f().await });
    let _: AsyncClosure = parse_quote!({}; async || -> usize { f().await });
    let _: AsyncClosure = parse_quote!({
        v: &'a [u8] = &v,
    }; async || -> usize { f(v).await });
}

#[derive(PartialEq, Eq, Debug, Parse, ToTokens)]
struct AsyncClosure {
    captures: Maybe<Captures>,
    _async: Async,
    closure: ExprClosure,
}
#[derive(PartialEq, Eq, Debug, Parse, ToTokens)]
struct Capture {
    val: Ident,
    _colon: Colon,
    ty: Type,
    _eq: Eq,
    expr: Expr,
    _comma: Comma,
}
#[derive(PartialEq, Eq, Debug, Parse, ToTokens)]
struct Captures {
    captures: Brace<Punctuated<Capture, Comma>>,
    semi: Semi,
}

use parsel::{
    ast::{Brace, Ident, Maybe, Punctuated},
    parse_quote,
    syn::{
        token::{Async, Colon, Comma, Eq, Semi},
        Expr, ExprClosure, Type,
    },
    Parse, ToTokens,
};

The important insight which everyone forgot to mention is that proc macros are fast. Even if you don't try to optimize them, it's easy to generate enough code in a second to keep the compiler busy for a few minutes.

Thus I wouldn't spend too much time fretting about the efficiency of the code emitter, and would instead concentrate on how efficiently the compiler can process the generated code.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.