Why is there no whitespace between the punctuator `$` and the identifier after transforming by the macro?

I wrote a procedural macro that simply converts the raw tokens into a string. It looks something like this:

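```rust
use proc_macro::{Literal, TokenStream, TokenTree};

#[proc_macro]
pub fn token_to_str(input: TokenStream) -> TokenStream {
    // Render the input tokens using TokenStream's Display impl.
    let s = input.to_string();
    // Emit the text as one string literal; `Literal::string` escapes any
    // quotes or backslashes, which a hand-built `"{s}"` would not.
    TokenTree::Literal(Literal::string(&s)).into()
}
```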

Then I use the macro like this:

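```rust
// Assumes `token_to_str` is in scope, imported from the proc-macro crate above.
fn main() {
    println!("{}", token_to_str! {
        struct #Name {
            i: i32
        }
    });
}
```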

The output is `struct # Name { i : i32 }`: there is a space between `struct`, `#`, and `Name`, and changing `#` to other punctuators gives the same result. The exception is `$`: there is no space between it and `Name`, and the output is `struct $Name { i : i32 }`. Moreover, with `!` the result differs from both of the others: the output is `struct! Name { i : i32 }`, that is, there is no space between `struct` and `!` but there is one between `!` and `Name`.

The outputs for `$`, `!`, and `#` (including the other punctuators that are not the previous two) are not consistent, which seems weird. How are these tokens separated, and what rules determine the separation?

They don't need to be "consistent" because they are for completely different purposes:

  • $ followed by an identifier introduces a metavariable (such as $x in $x:expr) in a declarative macro
  • an identifier followed by ! is a function-like macro invocation
  • # followed by an identifier has no built-in syntactic meaning.

The tokens are simply printed in a way that would be considered idiomatic in their most common usage.
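To make "idiomatic in their most common usage" concrete, here is a minimal sketch of a token printer whose hypothetical `space_between` rule is hard-coded from the outputs reported in this thread. The `Tok` type and the rules are assumptions for illustration only; this is not rustc's actual pretty-printer, whose real rules may differ and change between versions.

```rust
// A toy token type; NOT proc_macro's TokenTree, just enough to illustrate.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok<'a> {
    Ident(&'a str),
    Punct(char),
}

/// Hypothetical spacing rule modelled on the outputs reported in the thread:
/// - no space between `$` and a following identifier (`$name`)
/// - no space between an identifier and a following `!` (`name!`)
/// - no space before `,`
/// - otherwise, separate tokens with a single space
fn space_between(prev: Tok, next: Tok) -> bool {
    match (prev, next) {
        (Tok::Punct('$'), Tok::Ident(_)) => false,
        (Tok::Ident(_), Tok::Punct('!')) => false,
        (_, Tok::Punct(',')) => false,
        _ => true,
    }
}

/// Prints tokens, asking `space_between` whether to insert a space.
fn render(tokens: &[Tok]) -> String {
    let mut out = String::new();
    for (i, tok) in tokens.iter().enumerate() {
        if i > 0 && space_between(tokens[i - 1], *tok) {
            out.push(' ');
        }
        match tok {
            Tok::Ident(s) => out.push_str(s),
            Tok::Punct(c) => out.push(*c),
        }
    }
    out
}

fn main() {
    use Tok::*;
    let hash = [Ident("struct"), Punct('#'), Ident("Name")];
    let dollar = [Ident("struct"), Punct('$'), Ident("Name")];
    let bang = [Ident("struct"), Punct('!'), Ident("Name")];
    println!("{}", render(&hash)); // struct # Name
    println!("{}", render(&dollar)); // struct $Name
    println!("{}", render(&bang)); // struct! Name
}
```

The point of the sketch is that such rules are purely cosmetic heuristics keyed on token kinds, chosen so common idioms like `$x` and `name!` read naturally.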


In what sense do you expect them to be “separated”? What would it even mean for them to be “separated”?

As you can see from the signature of your function, macros receive a sequence of tokens and produce a sequence of tokens.

The fact that you can convert that sequence into a string is just for documentation purposes.

Whitespace has no meaning inside macros, and thus isn't preserved in any way. The TokenStream::to_string method just prints the tokens in whichever way it considers useful, but that shouldn't be considered stable behaviour in any way, and it doesn't affect subsequent parsing of that string as a TokenStream.
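A toy illustration of why the spacing doesn't matter for re-parsing: the hypothetical `tokenize` function below treats whitespace purely as a separator, so differently spaced inputs yield the same token sequence. It is far simpler than rustc's real lexer (no multi-character operators, literals, or comments) and is only meant to show the principle.

```rust
/// Toy tokenizer (hypothetical): identifiers/numbers become one token,
/// each other non-whitespace char becomes its own token, and whitespace
/// only ends the current word.
fn tokenize(src: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut word = String::new();
    for c in src.chars() {
        if c.is_alphanumeric() || c == '_' {
            word.push(c); // extend the current identifier or number
            continue;
        }
        if !word.is_empty() {
            tokens.push(std::mem::take(&mut word));
        }
        if !c.is_whitespace() {
            tokens.push(c.to_string()); // single-char punctuation token
        }
    }
    if !word.is_empty() {
        tokens.push(word);
    }
    tokens
}

fn main() {
    let compact = "struct #Name{i:i32}";
    let spaced = "struct # Name { i : i32 }";
    // Different whitespace, same token sequence.
    assert_eq!(tokenize(compact), tokenize(spaced));
    println!("{:?}", tokenize(spaced));
}
```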


ToString and Display are required to be stable (at least for parseable types).

Really? Where's that documented? The definitions of Display and FromStr certainly don't guarantee that, nor do they guarantee that the two are related in any way.

We have been through this many times. It's a well-established principle. See e.g. here. It's the whole raison d'être of the separation between Debug and Display.

I asked for documentation, and instead you've just linked your old assertion, not based in fact. Linking it several times doesn't make it true.

The relevant part of the standard library docs doesn't support your claim either, and in fact contradicts common established practice: "fmt::Display implementations assert that the type can be faithfully represented as a UTF-8 string at all times." I don't know what "faithfully" is supposed to mean here, but it's certainly not "can be parsed back into the original representation". For example, errors are required to implement Debug as part of the Error trait contract, but they generally cannot be parsed; they just provide a nice user-facing message.
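To illustrate the point about error types, here is a minimal sketch with a hypothetical `ConfigError`: its Display is a user-facing message describing only this error, the cause is exposed through `Error::source`, and there is no FromStr impl, so nothing is expected to parse that text back.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical error type: Display is a log-style message, not a
// serialization, and the type deliberately has no FromStr impl.
#[derive(Debug)]
struct ConfigError {
    path: String,
    source: std::num::ParseIntError,
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Describe this error only; the cause is reachable via `source()`.
        write!(f, "invalid value in config file `{}`", self.path)
    }
}

impl Error for ConfigError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

fn main() {
    let cause = "abc".parse::<i32>().unwrap_err();
    let err = ConfigError { path: "app.toml".to_string(), source: cause };
    println!("{err}");
    if let Some(src) = err.source() {
        println!("  caused by: {src}");
    }
}
```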

Your assertion would mean "Display impls must provide text-based serialization", which is neither true nor desirable.

Did you even read the thread? It's extremely tiring being forced to point out things that you could have just, you know, read for yourself by following the link.

It's not "my assertion", there are several other reasons:

  1. since you asked for documentation: the very existence of Debug and the fact that it explicitly says in the documentation that it's unstable, while there's no such parallel claim for Display

  2. the very sentence you cited as a counterexample:

    But errors aren't required to be FromStr, so they can't be parsed, so this point is moot. I also never claimed that Debug must be stable, so stop putting words in my mouth already. I am talking about Display, not Debug. When a type is FromStr, it is expected that its Display is parseable.

  3. the fact that the stdlib itself implements Display in this manner and literally everyone depends on i32::fmt() not suddenly returning a representation in full Portuguese words spelled with Chinese characters.

Well… about that: does this mean we should raise an issue on this API (admittedly still unstable)? Should it be changed, should it document a warning that this specific Display implementation is, technically, unstable [1], or should it be accepted as-is because it’s semantically an “error message” anyway?

Also, what about format_args!("{:?}", x) and fmt::Arguments? That’s literally a type implementing Display, but implemented using x’s Debug implementation. Is that also a special case? Maybe we’re just lacking a lot of documentation on this… it clearly cannot be “all Display impls are stable”.

I would assume that third-party crates also contain a lot of similar (and similarly undocumented) cases, especially around Display implementations of error values.

:thinking:

Hmm… Maybe I’m thinking of these examples wrong, and I should think of the time of constructing the OccupiedError or the fmt::Arguments as (analogous to) already invoking the Debug implementation, comparable to if I’d used format!("{:?}", …) to get a String.


In any case, I’d agree that it would be surprising if the output of Display for TokenStream ever changes, especially as it seems useful to use string outputs for writing test cases.

This whole reply is not supposed to be a counter-argument to your point at all, I was just wondering for myself “does std really have no ‘problematic’ Display impls here”, found the linked ones[2] that do incorporate the Debug output of a nested value, and went from there :innocent:


  1. I assume you’d agree that Display implementors can “opt out” of the default assumption of stability by documenting it

  2. by means of ctrl+F for “: Debug” on the docs of trait Display


I believe it goes that, barring any further context:

  • Display::fmt gives a reasonably useful but unspecified string, unless further constrained. (explicit)
  • Error + Display should describe this error (but not its cause) in a way suitable for a log message. (semi-implicit)
  • FromStr + Display should be formatted in a way that (mostly) round trips through the string representation. (implicit)

The last point comes from ToString formatting to the type's default string representation, and FromStr parsing from the type's default string representation.
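For that last bullet, a small helper (the name `round_trips` is hypothetical) can check the Display/FromStr round trip on a few std types. This shows the expectation holding for these particular types; it is a convention, not something the compiler enforces.

```rust
use std::fmt::Display;
use std::str::FromStr;

/// Returns true if formatting `value` with Display and parsing the text
/// back with FromStr reproduces the original value.
fn round_trips<T>(value: &T) -> bool
where
    T: Display + FromStr + PartialEq,
{
    value.to_string().parse::<T>().map_or(false, |back| back == *value)
}

fn main() {
    assert!(round_trips(&-42i32));
    assert!(round_trips(&true));
    assert!(round_trips(&'x'));
    assert!(round_trips(&String::from("hello")));
    println!("all round-trips succeeded");
}
```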

Numbers are not formatted stably because of Display; they're stable because the documentation for the format argument syntax describes how numbers get formatted.

format_args!("{:?}", x)'s Display implementation is stable in that it calls x's Debug implementation.
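A quick check of that claim, using a hypothetical `display_args` helper: `fmt::Arguments` implements Display, and displaying the result of `format_args!("{:?}", x)` gives the same text as formatting `x` with Debug directly.

```rust
use std::fmt;

/// Hypothetical helper: renders `fmt::Arguments` through its Display impl.
fn display_args(args: fmt::Arguments<'_>) -> String {
    args.to_string()
}

fn main() {
    let x = "a\"b";
    // The `{:?}` spec means x's Debug impl produces the text,
    // even though `fmt::Arguments` is being displayed.
    let shown = display_args(format_args!("{:?}", x));
    assert_eq!(shown, format!("{:?}", x));
    println!("{shown}");
}
```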


For TokenStream specifically, the documentation describes how its to_string is expected to behave: the presence of non-semantic whitespace is unspecified, and span/hygiene information is lost.

Punctuation tokens do record their spacing (proc_macro::Spacing::Joint or Spacing::Alone), which should be preserved when the next token could combine with it, but there are some bugs where this isn't respected in 100% of cases.
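The joint/alone idea can be modelled with a toy type that borrows the variant names of `proc_macro::Spacing`. The `Punct` struct and `render_puncts` function here are standalone assumptions, not the real proc_macro API: a punct marked Joint is printed glued to the next one, while an Alone punct gets a separating space.

```rust
/// Mirrors the *idea* of `proc_macro::Spacing`; a standalone toy type.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Spacing {
    Joint, // combines with the next punct, e.g. the `=` in `=>`
    Alone, // stands on its own
}

#[derive(Clone, Copy, Debug)]
struct Punct {
    ch: char,
    spacing: Spacing,
}

/// Prints a run of punctuation, gluing a char to the next one only when
/// it is marked Joint, and otherwise inserting a single space.
fn render_puncts(puncts: &[Punct]) -> String {
    let mut out = String::new();
    for (i, p) in puncts.iter().enumerate() {
        out.push(p.ch);
        let is_last = i + 1 == puncts.len();
        if !is_last && p.spacing == Spacing::Alone {
            out.push(' ');
        }
    }
    out
}

fn main() {
    let arrow = [
        Punct { ch: '=', spacing: Spacing::Joint },
        Punct { ch: '>', spacing: Spacing::Alone },
    ];
    let split = [
        Punct { ch: '=', spacing: Spacing::Alone },
        Punct { ch: '>', spacing: Spacing::Alone },
    ];
    println!("{}", render_puncts(&arrow)); // =>
    println!("{}", render_puncts(&split)); // = >
}
```

Printing a Joint punct without a trailing space is what keeps a multi-character operator like `=>` readable as one operator when the string is parsed again.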

So what about `,`? Its output is similar to `!`. Is there documentation describing how TokenStreams containing punctuators are laid out when converted to a string?