Format using global string constants, please help

behai · August 16, 2023, 2:10pm

Hi,

Please see this short program:

static INVALID_TOKEN_MSG: &str = "'{}' is not a valid token";

fn main() {
    let msg = String::from(format!("'{}' is not a valid token", "behai"));
    println!("1. {}\n", msg);

    let msg = String::from(format!(INVALID_TOKEN_MSG, "behai"));
    println!("2. {}\n", msg);
}

The compiler rejects this line: let msg = String::from(format!(INVALID_TOKEN_MSG, "behai"));.

Is there a way in Rust to format with a constant like this, please? I do understand that &str is a reference to a string on the heap.

This practice is common in languages such as Delphi and Python. Also, this call to the format! macro would get rejected too:

    let fmt_str = "'{}' is not a valid token";
    let msg = String::from(format!(fmt_str, "behai"));

What is the difference between passing the literal string "'{}' is not a valid token" directly to the format! macro and passing in fmt_str (which gets rejected), please?

Thank you and best regards,

...behai.

DanielKeep · August 16, 2023, 2:28pm

format! needs to know the contents of the format string because its whole job is to take that string and break it into Rust code that implements the formatting. Edit: Macros generate code, and thus must be expanded at compile time. Because they can generate new names for things, they have to be expanded pretty early on.

Macros are expanded at a point in compilation when the compiler literally does not know what INVALID_TOKEN_MSG means. They are expanded before name resolution and type resolution, so macros also cannot use type information (not relevant here, but good to know).

To put it another way: the only information macros have is the literal sequence of tokens you pass them. The format! macro can see the token INVALID_TOKEN_MSG, but has absolutely no way of finding out what it means.
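A minimal sketch of the "macros only see tokens" point, using a hypothetical show_tokens! macro (not from the thread) that turns whatever it receives into text via the standard stringify! macro. It receives the name of the constant, never its value:

```rust
static INVALID_TOKEN_MSG: &str = "'{}' is not a valid token";

// Hypothetical helper for illustration: returns the tokens it was given, as text.
macro_rules! show_tokens {
    ($($t:tt)*) => {
        stringify!($($t)*)
    };
}

fn main() {
    // The macro sees only the identifier token, not the string it names.
    println!("macro sees: {}", show_tokens!(INVALID_TOKEN_MSG));
    // Ordinary code, compiled after name resolution, sees the value.
    println!("code sees:  {}", INVALID_TOKEN_MSG);
}
```

format! is in the same position as show_tokens! here: by the time the name could be resolved, the macro has already had to decide what code to generate.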

The only kind of format string that format!, println!, write! et al. can support is a string literal [1].

[1] Unless there are radical overhauls, either to how format! is implemented internally (to give it super-powers no other macro has) or to the whole compiler (to allow different parts of the code to be in different states of compilation). I wouldn't hold your breath on either of those happening any time soon.

steffahn · August 16, 2023, 2:32pm

Besides directly inserting a literal, they can also support macros. It's a bit of a magical ability that they're able to expand other macros and then inspect the result.
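As a sketch of what that looks like in practice: the format string can be kept in one place by wrapping the literal in a small macro instead of a static. The name invalid_token_fmt! is made up for this example; concat! is the standard library macro.

```rust
// Hypothetical macro that expands to the string literal itself.
macro_rules! invalid_token_fmt {
    () => {
        "'{}' is not a valid token"
    };
}

fn main() {
    // format! expands the inner macro first, then parses the resulting literal.
    let msg = format!(invalid_token_fmt!(), "behai");
    println!("1. {}", msg);

    // Built-in macros such as concat! work in the same position.
    let msg = format!(concat!("'{}' is not ", "a valid token"), "behai");
    println!("2. {}", msg);
}
```

The placeholders are still checked at compile time, because the compiler ends up looking at an ordinary string literal.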

DanielKeep · August 16, 2023, 2:33pm

*slaps forehead* Quite right. Forgot about that. This is why I hate magic: it sits there, lurking in the undergrowth, waiting for you to become complacent and then *bam* you're wrong on the internet.

(Still glad it can do that, mind. :P)

steffahn · August 16, 2023, 2:36pm

For context for less experienced Rust users (in terms of macro writing experience): the thing that's "magical" here is that user-written macros (be it macro_rules or procedural macros) don't have this ability at all. At least as of now - perhaps they'll gain it in the future. For now, it's thus impossible to write your own custom println! alternative with fully equivalent functionality (but, say, an alternative format string syntax).

Vorpal · August 16, 2023, 2:48pm

Does this also mean that it is impossible to support gettext style translations for formatted strings? That seems like a big limitation.

steffahn · August 16, 2023, 2:59pm

I’m unfamiliar with the term “gettext style translations”.

Edit: Seems to be about localization. I’ll look into it.

Vorpal · August 16, 2023, 3:15pm

Indeed it is about localisation (or localization depending on which localisation you use ;)).

If the string can be replaced at runtime (depending on the selected language) it is in conflict with needing the format string at compile time.

I started looking into it for Rust; it seems at least some libraries provide their own format macro (e.g. woboq/tr on GitHub, "Translation tools for rust").

What makes gettext really nice is that the source of truth is in the source code, typically in C annotated with an underscore: printf(_("The answer is %d\n"), 42). At runtime _ will resolve the string based on locale.

At development time, a tool extracts the translatable strings from the code into a catalog file. Additional tooling allows for merging changes into translations when the source updates etc. It is actually a very well thought out system for traditional desktop apps, though it is typically limited if you want to serve many different languages at the same time (e.g. on the Web).

steffahn · August 16, 2023, 3:24pm

This seems - potentially - a bit unsafe; the number and/or type of formatting placeholders might not match the actual data provided. Is this problem addressed at all in the C solution?

khimru · August 16, 2023, 3:25pm

Absolutely not, and it's an actual practical problem: when translators break format strings, programs crash at runtime, and that is hard to notice because developers rarely use foreign translations during development.

Vorpal · August 16, 2023, 3:32pm

Well, it is C, what do you expect? That said there are some provisions:

If you change the format string it will be detected and translations will have to be updated (it won't load the old string).
I also seem to remember that the tooling (msgfmt, I think? It has been over a decade since I worked on a project with translations of this type) detected and errored on incompatible C-style format strings between the extracted catalog and the translations when you go to compile the translations into binary distributable files.
GNU printf also supports reordering format arguments using some extension syntax (positional specifiers such as %1$s), i.e. for when the target language needs a different word order.

That said I'm sure you can still mess it up if you try (the compiled binary catalogs are trusted, which makes sense since they traditionally get installed into /usr/share which only root can write). But at least they gave some thought to messing up by mistake.

Vorpal · August 16, 2023, 3:36pm

I also don't think there was any provision to prevent loading a binary catalog file for a different version of the binary. So that is another potential issue.

None of these issues are however fundamental to the overall approach. You just need to have a properly thought out system that does the required validation at the various steps (loading the translations, compiling them, etc).
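A rough sketch of one such validation step, run when a translation is loaded or compiled: reject any translated string whose placeholder count differs from the source string. Everything here is an assumption made for illustration: the function names, and a toy template syntax where the only placeholder is a bare {} with no escapes or format specifiers.

```rust
/// Counts `{}` placeholders (toy syntax: no `{{` escapes, no format specs).
fn placeholder_count(template: &str) -> usize {
    template.matches("{}").count()
}

/// Accepts a translation only if it has as many placeholders as the source string.
fn validate_translation(source: &str, translated: &str) -> Result<(), String> {
    let want = placeholder_count(source);
    let got = placeholder_count(translated);
    if want == got {
        Ok(())
    } else {
        Err(format!("expected {want} placeholder(s), found {got}"))
    }
}

fn main() {
    let source = "'{}' is not a valid token";
    // A translation that keeps the placeholder is accepted.
    println!("{:?}", validate_translation(source, "'{}' ist kein gültiges Token"));
    // A translation that dropped the placeholder is rejected at load time.
    println!("{:?}", validate_translation(source, "kein gültiges Token"));
}
```

A real system would also need to compare argument types and positions, not just counts.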

steffahn · August 16, 2023, 3:37pm

Yes, I’d say that directly using println! (and related) macros in the same style is going to be impossible. That being said, there are likely lots of ways the limitation could be worked around by a dedicated library author – and if the whole thing is supposed to be safe, they need to incorporate a mechanism to verify things like argument counts anyway. I don’t think it’s a big loss if this means you’d have to write custom_println!("foo {} bar {}", x, y) instead of println!(custom_macro!("foo {} bar {}"), x, y) – there’s not much value in having it visually clearly be the original println!.
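To make the workaround idea concrete, here is a minimal sketch of formatting with a template that is only known at runtime, with the argument count verified as part of the call. It is not how any particular library does it: format_runtime is a hypothetical function, and the template syntax is a toy one where a bare {} is the only placeholder (no escapes, no format specifiers).

```rust
use std::fmt::{Display, Write};

/// Fills each `{}` in `template` with the next argument.
/// Returns an error if there are too few or too many arguments.
fn format_runtime(template: &str, args: &[&dyn Display]) -> Result<String, String> {
    let mut pieces = template.split("{}");
    let mut out = String::new();
    // `split` always yields at least one piece: the text before the first `{}`.
    out.push_str(pieces.next().unwrap_or(""));

    let mut used = 0;
    for piece in pieces {
        // Every further piece is preceded by exactly one placeholder.
        let arg = args
            .get(used)
            .ok_or_else(|| format!("missing argument {used}"))?;
        write!(out, "{arg}").map_err(|e| e.to_string())?;
        out.push_str(piece);
        used += 1;
    }

    if used != args.len() {
        return Err(format!("{} argument(s) supplied, {used} used", args.len()));
    }
    Ok(out)
}

static INVALID_TOKEN_MSG: &str = "'{}' is not a valid token";

fn main() {
    match format_runtime(INVALID_TOKEN_MSG, &[&"behai"]) {
        Ok(msg) => println!("{msg}"),
        Err(e) => eprintln!("bad template: {e}"),
    }
}
```

The trade-off is that mismatches become runtime errors instead of compile errors, which is exactly the check a translation-aware library would have to perform somewhere.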

khimru · August 16, 2023, 3:47pm

Actually, it would, unless you use the -N option to prevent that.

It knows C format strings and thus can verify if strings have different placeholders, though.

All in all, gettext is pretty cool tech, but I'm not sure it can be adopted by Rust without a full rewrite.

Also: it doesn't work well with Android/macOS/Windows, which means in today's world it's more of a curiosity than something you may seriously decide to use.

P.S. It's really funny how an attempt to push the “all the software must be free” agenda killed quite a promising and interesting technology, but I would say that at this point gettext is pretty much dead.

H2CO3 · August 16, 2023, 3:51pm

No, that's false. References don't have a concept of "the heap" or "the stack". A &str may point anywhere. In your very own example code, they point inside static memory that's baked into the executable.
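A small sketch of that point: the same &str type can refer to bytes in static memory, in a local array (typically on the stack), or inside a heap-allocated String. The names IN_BINARY and byte_len are made up for this example.

```rust
// String literal: the bytes are baked into the executable.
static IN_BINARY: &str = "static data";

// Accepts any &str, wherever the bytes live.
fn byte_len(s: &str) -> usize {
    s.len()
}

fn main() {
    // Bytes in a local array: this &str typically points into the stack frame.
    let bytes = [b's', b't', b'a', b'c', b'k'];
    let on_stack: &str = std::str::from_utf8(&bytes).expect("valid UTF-8");

    // Bytes owned by a String: this &str points into a heap allocation.
    let owned = String::from("heap");
    let on_heap: &str = owned.as_str();

    println!(
        "{} {} {}",
        byte_len(IN_BINARY),
        byte_len(on_stack),
        byte_len(on_heap)
    );
}
```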

kpreid · August 16, 2023, 4:35pm

The way I like to think about it, Rust format strings are code in a special sub-language. If you want to separate code, regardless of what it is, you do it with a function. That is, your original example can be:

use std::fmt;

// The format string lives in one place: the body of this function.
fn invalid_token_msg(token: impl fmt::Display) -> String {
    format!("'{token}' is not a valid token")
}

fn main() {
    let msg = invalid_token_msg("behai");
    println!("2. {}\n", msg);
}

behai · August 17, 2023, 12:04am

Thank you very much, Daniel, for the information. I have read about macro expansion, but I could not make the connection in this case.

I appreciate your help very much.

Thank you and best regards,

...behai.

behai · August 17, 2023, 12:06am

Hi H2CO3,

Thank you for the correction. I appreciate that.

Best regards,

...behai.

behai · August 17, 2023, 12:09am

Hi kpreid,

Thank you for your help. I am going with the example code you provided.

Thank you again and best regards,

...behai.

system · November 15, 2023, 12:09am

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.
