Macro to modify string?


#1

Hi,

I am defining some multiline strings like this:

        let mut string = r#"
line2
line3
line4
"#.to_string();
        string = string[1..(string.len() - 1)].to_string();

This produces the same string as the following:

        let mut string = r#"line2
line3
line4"#.to_string();

I prefer the first style of multiline string because it is more comfortable to write. How can I reduce the code in the first example without using a normal function? Maybe with macros? (I have never used them and I don’t know whether they serve this purpose.)

I can imagine the final code like this:

        let mut string = newline_trim!(r#"
line2
line3
line4
"#).to_string();

Side question: Can macros modify the inner text of a string? (For example, changing the first character of every line to upper case.)


#2

Are you looking for the str::trim() method? This would be used as multiline_string.trim().to_string().

Otherwise you can also use extension methods to add functionality to foreign types. The main idea is you create a trait with a method you want (e.g. fn trimmed(self) -> String), then implement that trait on the target type, in this case String.
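A minimal sketch of the extension-trait idea, for illustration only. The trait name `Trimmed` is hypothetical, and the sketch implements the trait for `str` with `&self` rather than for `String` with `self`, so that the method is callable on both `&str` and `String` values through auto-deref:

```rust
/// Hypothetical extension trait; the name is illustrative, not from any crate.
trait Trimmed {
    /// Returns an owned copy with surrounding whitespace removed.
    fn trimmed(&self) -> String;
}

// Implementing for `str` makes the method available on `&str` and,
// through auto-deref, on `String` as well.
impl Trimmed for str {
    fn trimmed(&self) -> String {
        self.trim().to_string()
    }
}

fn main() {
    let literal = "\nline2\nline3\n";
    assert_eq!(literal.trimmed(), "line2\nline3");

    let owned = String::from("  padded  ");
    assert_eq!(owned.trimmed(), "padded");
}
```

Note that `trim` removes all leading and trailing whitespace, not only newlines.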


#3

You might be interested in indoc.

#[macro_use]
extern crate indoc;

let testing = indoc!(b"
    def hello():
        print('Hello, world!')

    hello()
    ");

#4

Thanks, but I need something like trim that leaves spaces and \t alone and removes only the first and last newline.
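One way this could look, as a sketch: the helper name `trim_one_newline` is made up for illustration, and it relies on the standard `str::strip_prefix` and `str::strip_suffix` methods. It removes at most one leading and one trailing `\n` and leaves all other whitespace untouched:

```rust
/// Hypothetical helper: strips at most one leading and one trailing '\n'.
fn trim_one_newline(s: &str) -> &str {
    // `strip_prefix` returns None when the prefix is absent,
    // in which case the input is kept unchanged.
    let s = s.strip_prefix('\n').unwrap_or(s);
    s.strip_suffix('\n').unwrap_or(s)
}

fn main() {
    // Indentation and tabs survive; only the outer newlines go away.
    assert_eq!(trim_one_newline("\n  line2\n\tline3\n"), "  line2\n\tline3");
    // Only one newline is removed from each end.
    assert_eq!(trim_one_newline("\n\nx\n\n"), "\nx\n");
    // Strings without outer newlines are returned as-is.
    assert_eq!(trim_one_newline("plain"), "plain");
}
```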


#5

Interesting, thanks.

Something like this, although not exactly this functionality. The important part is how they do the macro.

One question: do you know whether a macro is executed at compile time? Or are all the operations that the macro does (e.g. calling a function to remove the first and last newlines) performed at runtime? If it is done at compile time, I suppose that when the compiler doesn’t know the string, a macro like indoc! has to do its work at runtime, doesn’t it?

Let me explain myself: it would be great if Rust allowed a macro to modify the string at compile time, as if the programmer had originally written the final string (the one produced by the macro), saving runtime computation. Can Rust do this? (Something like constexpr in C++.)


#6

I have seen that it can be achieved with simply this:

macro_rules! newline_trim {
    ($string: expr) => {{
        $string[1..($string.len() - 1)].to_string()
    }};
}

But I cannot guarantee that it accepts only str. I don’t know how to emit an error if another type is given.
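One possible approach, sketched under the assumption that a compile-time type error is an acceptable way to reject other types: bind the argument to a local with an explicit `&str` annotation. This is a variation on the macro above, not a drop-in equivalent, because it strips the outer newlines only when they are present instead of slicing unconditionally. A `&String` argument is still accepted through deref coercion.

```rust
macro_rules! newline_trim {
    ($string:expr) => {{
        // The explicit type turns a non-string argument (e.g. an integer)
        // into a "mismatched types" compile error, and the argument
        // expression is evaluated only once.
        let s: &str = $string;
        let s = s.strip_prefix('\n').unwrap_or(s);
        s.strip_suffix('\n').unwrap_or(s).to_string()
    }};
}

fn main() {
    let string = newline_trim!(r#"
line2
line3
line4
"#);
    assert_eq!(string, "line2\nline3\nline4");

    // newline_trim!(42); // would fail to compile: expected `&str`, found integer
}
```

The work still happens at runtime; the macro only saves typing.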


#7

Compile-time evaluation is an area of active work in Rust. Rust’s const is the counterpart of C++’s constexpr (but Rust’s const is not nearly as powerful or widely usable yet).

A declarative macro like newline_trim merely expands to the given code, so that will be done at runtime.

A procedural macro like indoc also expands to code… but it can do basically whatever it wants in order to generate that code. It is implemented as a function that gets called directly by the rust compiler. That function parses the input string literal, calls the unindent function on its innards, and reconstructs a new string literal, all at compile time.
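To make the transformation concrete, here is a simplified sketch of an unindent step written as an ordinary function. It is an assumption-laden approximation, not the indoc crate's actual algorithm: the name `unindent_sketch` is hypothetical, only spaces and tabs count as indentation, and it runs at runtime here, whereas a procedural macro would run equivalent logic inside the compiler and emit the result as a new literal.

```rust
/// Hypothetical, simplified unindent; not the real implementation used by indoc.
fn unindent_sketch(input: &str) -> String {
    // Drop the newline that directly follows the opening quote, if any.
    let body = input.strip_prefix('\n').unwrap_or(input);

    // Smallest number of leading spaces/tabs among lines that contain text.
    let indent = body
        .lines()
        .filter(|line| !line.trim().is_empty())
        .map(|line| line.len() - line.trim_start_matches(|c: char| c == ' ' || c == '\t').len())
        .min()
        .unwrap_or(0);

    // Remove that much indentation from every line; whitespace-only lines
    // become empty, so a trailing indented line turns into a final "\n".
    body.split('\n')
        .map(|line| if line.trim().is_empty() { "" } else { &line[indent..] })
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let out = unindent_sketch("\n    howdy, world!\n    ");
    assert_eq!(out, "howdy, world!\n");
}
```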


#8

Perfectly explained.

Last question: which part tells us that it is a procedural macro? Maybe the use of proc_macro_expr_impl! around the function indoc_impl, which itself contains the call to expand?


#9

Okay, so I lied a bit. The compiler doesn’t directly call the expand function. It calls indoc_impl.

But not the indoc_impl you see there. It calls another one. One that’s auto-generated.

…yeah.

…um, you might wanna sit tight. It’s a bit of a long story.

tl;dr: proc-macro-hack is beautiful and terrible.

How are procedural macros made discoverable to rustc?

A procedural macro crate looks like this:

  • under [lib] in Cargo.toml, you must have proc-macro = true so that the compiler knows it needs to load the crate at runtime. (I mean runtime for rustc, which is compile-time for our code)
  • you must have a pub function with a special annotation, and a certain signature.
    use proc_macro::TokenStream;
    
    // here's an example of an item macro or an expression macro
    #[proc_macro]
    pub fn my_fun_macro(input: TokenStream) -> TokenStream {
        /* code */
    }
    
    // here's an example of a custom derive macro.
    // The fn name doesn't matter.
    #[proc_macro_derive(MyCoolTrait)]
    pub fn some_arbitrary_name(input: TokenStream) -> TokenStream {
        /* code */
    }
    

With those two things taken care of, rustc will discover the procedural macros, and somebody who uses your crate will be able to write my_fun_macro!{ } and #[derive(MyCoolTrait)] respectively to use them.

Ahhh, so it’s the #[proc_macro] on line 35 that’s important?

#[cfg(feature = "unstable")]
#[proc_macro]
pub fn indoc(input: proc_macro::TokenStream) -> proc_macro::TokenStream {
    expand(&input)
}

Nope! That one is annotated with a #[cfg] block that normally prevents it from being compiled.

In fact, the important thing here is the #[proc_macro_derive(indoc_impl)] that you don’t see on line 42.

The… uh… wait, what?

You don’t see it because it’s generated by proc_macro_expr_impl.

…uhhhhhhhhhh why?

Okay. Let’s take a step back.

You can’t actually use #[proc_macro] on stable rust yet. Back when proc macros were first introduced (in an update referred to as “macros 1.1”), only #[proc_macro_derive] was stabilized. (and with good reason; even to this day, #[proc_macro] macros are not yet production-ready).

This means that currently it is impossible to write procedural macros that use macro_name!() syntax.

B-b-b-but indoc uses that syntax!!! So clearly it is possible!

Yes. Anything is possible when you work hard enough.

Many people were dying to use procedural macros for items and expressions no matter the cost. And so this is why dtolnay created proc-macro-hack.

proc-macro-hack?

proc-macro-hack does beautiful and terrible things to make procedural macros work on stable rust.

Here’s what really happens when you call indoc!. I can’t make this stuff up.

You write:

indoc!{"
    howdy, world!
    "}

it expands to:

{
    #[derive(indoc_impl)]
    #[allow(unused)]
    enum ProcMacroHack {
        Input = (stringify!("
    howdy, world!
    "), 0).1
    }

    proc_macro_call!()
}

which, I’d like to add, is a completely valid enum.

This causes rustc to directly call the procedural macro annotated with #[proc_macro_derive(indoc_impl)], which was generated by proc_macro_expr_impl (which is part of the hack). The autogenerated code parses away all the useless stuff around the input tokens to find

"
    howdy, world!
    "

at which point the actual function in indoc is finally called. This function outputs

"howdy, world!\n"

which the autogenerated function slips into a new macro for reasons I guess:

{
    macro_rules! proc_macro_call {
        () => {
            "howdy, world!\n"
        }
    }

    proc_macro_call!()
}

which finally reduces to

{
    "howdy, world!\n"
}

So… yep.


#10

which the autogenerated function slips into a new macro for reasons I guess

Derive macros must expand to items, not expressions.

The following is an item. :+1:

macro_rules! proc_macro_call {
    () => {
        "howdy, world!\n"
    }
}

The following is an expression. :boom:

"howdy, world!\n"

If you enjoy macros that generate macros that generate macros that generate macros, it gets worse.


#11

IMPRESSIVE ANSWER. Thank you.