static INVALID_TOKEN_MSG: &str = "'{}' is not a valid token";
fn main() {
    let msg = String::from(format!("'{}' is not a valid token", "behai"));
    println!("1. {}\n", msg);
    let msg = String::from(format!(INVALID_TOKEN_MSG, "behai"));
    println!("2. {}\n", msg);
}
The compiler rejects this line: let msg = String::from(format!(INVALID_TOKEN_MSG, "behai"));.
Is there a way to do this kind of formatting in Rust, please? I do understand that &str is a reference to a string on the heap.
This practice is common in languages such as Delphi and Python. Also, this call to the format! macro would get rejected too:
let fmt_str = "'{}' is not a valid token";
let msg = String::from(format!(fmt_str, "behai"));
What is the difference between passing the literal string "'{}' is not a valid token" directly to the format! macro and passing in fmt_str (which gets rejected), please?
format! needs to know the contents of the format string because its whole job is to take that string and break it into Rust code that implements the formatting. Edit: Macros generate code, and thus must be expanded at compile time. Because they can generate new names for things, they have to be expanded pretty early on.
Macros are expanded at a point in compilation when the compiler literally does not know what INVALID_TOKEN_MSG means. They are expanded before name resolution and type resolution, so macros also cannot use type information (not relevant here, but good to know).
To put it another way: the only information macros have is the literal sequence of tokens you pass them. The format! macro can see the token INVALID_TOKEN_MSG, but has absolutely no way of finding out what it means.
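A small sketch of the "macros only see tokens" point, using the built-in stringify! macro as a stand-in (it is not how format! is implemented, only an illustration). The constant mirrors the one in the question; the helper name seen_by_macro is made up for this example.

```rust
// Mirrors the constant from the question.
static INVALID_TOKEN_MSG: &str = "'{}' is not a valid token";

// Hypothetical helper: returns what a macro receives when handed the
// identifier. stringify! turns its input tokens into a string without
// evaluating them, so the result is the name, not the contents.
fn seen_by_macro() -> &'static str {
    stringify!(INVALID_TOKEN_MSG)
}

fn main() {
    // The macro only ever sees the identifier token.
    println!("macro sees:    {}", seen_by_macro());
    // Ordinary code, compiled after name resolution, sees the value.
    println!("compiler sees: {}", INVALID_TOKEN_MSG);
}
```

The first line printed contains the identifier itself, which is all that format! would have to work with if it were handed INVALID_TOKEN_MSG.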
The only kind of format strings that format!, println!, write! et al. can support are string literals[1].
Besides directly inserting a literal, they can also accept a macro invocation that expands to a string literal (concat!, for example). It's a bit of a magical ability that they're able to expand other macros and then inspect the result.
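As a rough sketch of that ability: if the shared message is defined as a macro that expands to a string literal, rather than as a static or const, format! can expand it and then parse the result. The macro name invalid_token_msg! and the two helper functions are made up for this example.

```rust
// Hypothetical helper macro: it expands to a string literal, so format!
// can expand it first and then parse the resulting literal.
macro_rules! invalid_token_msg {
    () => {
        "'{}' is not a valid token"
    };
}

// Uses the user-defined macro as the format string.
fn describe(token: &str) -> String {
    format!(invalid_token_msg!(), token)
}

// Uses the built-in concat! macro, which joins literals at compile time.
fn describe_concat(token: &str) -> String {
    format!(concat!("'{}'", " is not a valid token"), token)
}

fn main() {
    println!("{}", describe("behai"));
    println!("{}", describe_concat("behai"));
}
```

The format string is still fixed at compile time here; this only moves the literal to one shared place, it does not make it a runtime value.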
*slaps forehead* Quite right. Forgot about that. This is why I hate magic: it sits there, lurking in the undergrowth, waiting for you to become complacent and then *bam* you're wrong on the internet.
For context for less experienced Rust users (in terms of macro writing experience): the thing that's "magical" here is that user-written macros (be it macro_rules! or procedural macros) don't have this ability at all. At least as of now; perhaps they'll gain it in the future. For now, it's thus impossible to write your own custom println! alternative with fully equivalent functionality (but with, say, an alternative format string syntax).
What makes gettext really nice is that the source of truth is in the source code, typically in C annotated with an underscore: printf(_("The answer is %d\n"), 42). At runtime, _ will resolve the string based on the locale.
At development time, a tool extracts the translatable strings from the code into a catalog file. Additional tooling allows for merging changes into translations when the source updates etc. It is actually a very well thought out system for traditional desktop apps, though it is typically limited if you want to serve many different languages at the same time (e.g. on the Web).
This seems - potentially - a bit unsafe; the number and/or type of formatting placeholders might not match the actual data provided. Is this problem addressed at all in the C solution?
Absolutely not, and it's an actual practical problem: when translators break format strings, programs crash at runtime, and it's hard to notice because developers rarely use foreign translations during development.
Well, it is C, what do you expect? That said there are some provisions:
If you change the format string it will be detected and translations will have to be updated (it won't load the old string).
I also seem to remember that the tooling (msgfmt, I think? It has been over a decade since I worked on a project with translations of this type) detected and errored on incompatible C-style format strings between the extracted catalog and the translations when you go to compile the translations into binary distributable files. GNU printf also supports reordering format arguments using some extension syntax (i.e. when the target language needs a different word order).
That said I'm sure you can still mess it up if you try (the compiled binary catalogs are trusted, which makes sense since they traditionally get installed into /usr/share which only root can write). But at least they gave some thought to messing up by mistake.
I also don't think there was any provision to prevent loading a binary catalog file for a different version of the binary. So that is another potential issue.
None of these issues are however fundamental to the overall approach. You just need to have a properly thought out system that does the required validation at the various steps (loading the translations, compiling them, etc).
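To make the validation idea concrete in Rust terms, here is a minimal sketch of a runtime formatter that checks the placeholder count before substituting. Everything in it (render, TemplateError, the bare `{}` placeholder syntax) is hypothetical and made up for illustration; it does not handle escaping, positional arguments, or format specifiers, and it is not how gettext or any existing crate works.

```rust
use std::fmt::Display;

// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum TemplateError {
    ArgCountMismatch { expected: usize, found: usize },
}

// Hypothetical runtime formatter: supports only bare `{}` placeholders.
// The template is an ordinary runtime value, so the check that format!
// performs at compile time has to happen here, at load or call time.
fn render(template: &str, args: &[&dyn Display]) -> Result<String, TemplateError> {
    // Splitting on "{}" yields one more piece than there are placeholders.
    let parts: Vec<&str> = template.split("{}").collect();
    let expected = parts.len() - 1;
    if expected != args.len() {
        return Err(TemplateError::ArgCountMismatch {
            expected,
            found: args.len(),
        });
    }
    let mut out = String::new();
    for (i, part) in parts.iter().enumerate() {
        out.push_str(part);
        // Every piece except the last is followed by an argument.
        if let Some(arg) = args.get(i) {
            out.push_str(&arg.to_string());
        }
    }
    Ok(out)
}

fn main() {
    // Stands in for a string loaded from a translation catalog.
    let template = String::from("'{}' is not a valid token");
    match render(&template, &[&"behai"]) {
        Ok(msg) => println!("{}", msg),
        Err(err) => eprintln!("bad template: {:?}", err),
    }
}
```

A broken translation then surfaces as an Err value the caller can handle, instead of a crash or garbage output.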
Yes, I’d say that directly using println! (and related) macros in the same style is going to be impossible. That being said, there are likely lots of ways the limitation could be worked around by a dedicated library author, and if the whole thing is supposed to be safe, they need to incorporate a mechanism to verify things like argument counts anyway. I don’t think it’s a big loss if this means you’d have to write custom_println!("foo {} bar {}", x, y) instead of println!(custom_macro!("foo {} bar {}"), x, y); there’s not much value in having it visually clearly be the original println!.
It knows C format strings and thus can verify if strings have different placeholders, though.
All in all, gettext is pretty cool tech, but I'm not sure it can be adopted by Rust without a full rewrite.
Also: it doesn't work well with Android/macOS/Windows which means in today's world it's more of a curiosity rather than something you may seriously decide to use.
P.S. It's really funny how an attempt to push the “all the software must be free” agenda killed a quite promising and interesting technology, but I would say that at this point gettext is pretty much dead.
No, it's false that a &str is a reference to a string on the heap. References don't have a concept of "the heap" or "the stack". A &str may point anywhere. In your very own example code, the &str values point into static memory that's baked into the executable.
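A short sketch of "a &str may point anywhere": the same function accepts string slices backed by static memory, by a heap allocation, and by a local byte array. The helper first_word is made up for this example, and where a local array physically lives is ultimately up to the compiler; the point is only that the &str type does not record it.

```rust
// Hypothetical helper: works on any &str, whatever memory backs it.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    // Points into static memory baked into the executable.
    let from_static: &str = "static text";

    // Points into a heap buffer owned by a String.
    let owned = String::from("heap text");
    let from_heap: &str = &owned;

    // Points into a local byte array (typically stored on the stack).
    let bytes: [u8; 10] = *b"stack text";
    let from_local: &str = std::str::from_utf8(&bytes).expect("valid UTF-8");

    for s in [from_static, from_heap, from_local] {
        println!("{}", first_word(s));
    }
}
```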
The way I like to think about it, Rust format strings are code in a special sub-language. If you want to separate code, regardless of what it is, you do it with a function. That is, your original example can be:
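use std::fmt;

// The format string lives inside the function, so format! still sees a literal.
fn invalid_token_msg(token: impl fmt::Display) -> String {
    format!("'{token}' is not a valid token")
}

fn main() {
    let msg = invalid_token_msg("behai");
    println!("2. {}\n", msg);
}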