A simple solution to localize Rust

I set up a project that explains this concept better; sorry this was so confusing: Internatal String Localization Database Management System

I am new to Rust, though not to programming, and localization has always been a pain, so the first thing I want to learn is how to localize my app and its documentation. Currently, though, I do not see a clear path, since the documentation generated by cargo doc also needs to be localized.

I found Redox-OS and learned that it needs a localization system, and this is where the problem comes in: looking at the existing solutions, they all require API breakage. I think I have a better solution, but I need to find out whether it is possible and whether it can get enough support to make it happen.

The full text of what I describe below is in my GitHub repository.

I want to focus on a way of localizing strings that requires no API changes and works for cargo doc as well as for any Rust app. That does not seem possible with the current methods, so a new method is required. Let me give an example.

When I declare a string that needs to be localized, I run into this issue. For example, I want to add a button whose label needs to be localized, but all my code looks like this:

let selector = Selector::from("button");

The word "button" is the only reference I have to this text, and I do not want to break this API by having to make it look like this:

let selector = Selector::from(localize_this("button"));

The drawback of calling a function to localize a string is that the call has to be made every time the object is created. What happens if it breaks, or takes too long? Not to mention that this does not help me with cargo doc at all.

So I have a totally different approach that I would like to propose to the Rust community.

In the example let selector = Selector::from("button");, the compiler can trace the string literal back to the variable it is assigned to. On a full build, the compiler would overwrite a po file for later processing, so all it needs to do is create the po file and append entries to it. This does not seem like a hard function to write, and it does not require much processing, so the impact on the compiler would be minimal. It can also be off by default, so there is no impact unless you turn it on.
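A minimal sketch of this extraction step, assuming a standalone tool rather than a change inside rustc. `Entry`, `extract`, and `to_po` are hypothetical names. The scanner is line-based and does not handle escaped quotes, raw strings, or statements split across lines; a real tool would get that from a proper Rust parser.

```rust
/// One string literal found in the source, plus the context the proposal wants to track.
#[derive(Debug, PartialEq)]
pub struct Entry {
    pub msgid: String,
    pub variable: String,
    pub file: String,
    pub line: usize,
}

/// Naive scanner (hypothetical): for every `let NAME = ...("literal"...` line,
/// record the first string literal together with the variable it is bound to.
pub fn extract(file: &str, source: &str) -> Vec<Entry> {
    let mut entries = Vec::new();
    for (index, line) in source.lines().enumerate() {
        let Some(rest) = line.trim_start().strip_prefix("let ") else { continue };
        let Some((lhs, rhs)) = rest.split_once('=') else { continue };
        // Drop `mut` and any type annotation: `mut label: Label` -> `label`.
        let lhs = lhs.trim();
        let lhs = lhs.strip_prefix("mut ").unwrap_or(lhs);
        let variable = lhs.split(':').next().unwrap_or("").trim();
        // Take the text between the first pair of double quotes on the right-hand side.
        let Some(open) = rhs.find('"') else { continue };
        let Some(len) = rhs[open + 1..].find('"') else { continue };
        entries.push(Entry {
            msgid: rhs[open + 1..open + 1 + len].to_string(),
            variable: variable.to_string(),
            file: file.to_string(),
            line: index + 1,
        });
    }
    entries
}

/// Write the entries in a gettext-like layout. A standard `#:` line holds only
/// `file:line`; appending the variable name is this proposal's extension.
pub fn to_po(entries: &[Entry]) -> String {
    let mut po = String::new();
    for e in entries {
        po.push_str(&format!(
            "#: {}:{} {}\nmsgid \"{}\"\nmsgstr \"{}\"\n\n",
            e.file, e.line, e.variable, e.msgid, e.msgid
        ));
    }
    po
}
```

The msgstr starts out equal to the msgid, so an untranslated entry simply returns the original text.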

These files are normally called po files, and I see no reason to name them differently just because the format is enhanced. This po file will have an extra schema: besides the usual msgid and msgstr, we want to track the variable name and the file and path it is in. That gives me all the data I need to localize my strings. They could be stored in, say, an XML file, with attributes for the msgid, msgstr, path, and any other properties you might want to track, to make the later translation easier for translators and programmers.
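One way such an XML record could be written, as a sketch: the element name `entry`, its attribute names, and the `Record` struct are assumptions, not an existing schema. Values are escaped so any msgid or comment is safe inside an attribute.

```rust
/// Escape the five XML special characters so any string is safe inside an attribute.
pub fn xml_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    out
}

/// Hypothetical record holding the fields listed in the post.
pub struct Record<'a> {
    pub msgid: &'a str,
    pub msgstr: &'a str,
    pub variable: &'a str,
    pub path: &'a str,
    pub comment: Option<&'a str>,
}

/// Render one record as a self-closing `<entry .../>` element.
pub fn record_to_xml(r: &Record) -> String {
    let mut xml = format!(
        "<entry msgid=\"{}\" msgstr=\"{}\" variable=\"{}\" path=\"{}\"",
        xml_escape(r.msgid),
        xml_escape(r.msgstr),
        xml_escape(r.variable),
        xml_escape(r.path)
    );
    // The translator comment is optional, so the attribute is only written when present.
    if let Some(c) = r.comment {
        xml.push_str(&format!(" comment=\"{}\"", xml_escape(c)));
    }
    xml.push_str("/>");
    xml
}
```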

The compiler will need a switch (--po) to compile these po files. Normally they would be compiled into mo files, but I will not use that strategy. Instead, the tool will optimize them: it removes redundant msgids and creates a diff file that is compared against any existing diff file, so it can upgrade it without losing translations already made. It then writes a new po file with this data, and a code generator creates an rs file with one public function backed by a map keyed by msgid. Whenever the variable is read, it is read through this function, using the msgid as the key. This is a simple getter function, it requires no code changes in my example, and it is all done by the compiler itself.
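A rough sketch of the merge and code-generation steps, assuming the extracted msgids and any existing translations are already in memory. `merge` and `generate_getter` are hypothetical. The generated getter uses a `match` as its lookup table, a stand-in for the "mapped array" described above, and returns the msgid itself when no translation exists.

```rust
use std::collections::{BTreeMap, HashMap};

/// Merge freshly extracted msgids into an existing catalog. Duplicate msgids collapse
/// (map keys are unique), existing translations are kept, and new msgids default to
/// themselves. Msgids no longer in the code are dropped here; a real tool might keep
/// them as obsolete entries instead.
pub fn merge(extracted: &[&str], existing: &HashMap<String, String>) -> BTreeMap<String, String> {
    extracted
        .iter()
        .map(|id| {
            let msgstr = existing.get(*id).cloned().unwrap_or_else(|| id.to_string());
            (id.to_string(), msgstr)
        })
        .collect()
}

/// Generate Rust source for a getter that maps each msgid to its msgstr.
pub fn generate_getter(catalog: &BTreeMap<String, String>) -> String {
    let mut src = String::from("pub fn localize(msgid: &str) -> &str {\n    match msgid {\n");
    for (id, text) in catalog {
        // `{:?}` prints a quoted, escaped string, which is a valid Rust literal for ordinary text.
        src.push_str(&format!("        {:?} => {:?},\n", id, text));
    }
    // Fallback: anything not in the catalog is returned unchanged.
    src.push_str("        other => other,\n    }\n}\n");
    src
}
```

Because the catalog is a `BTreeMap`, the generated file is sorted by msgid, so regenerating it after small code changes produces small diffs.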

I was thinking this function library could be a static or dynamic library, which makes it easy to localize: just name the files according to their country code, and you can unload and reload a library to pick up changes, or load another language at run-time.
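A minimal sketch of the run-time side, assuming one library per locale. `library_name` and `Localizer` are hypothetical. Actually loading and unloading a shared library would need a crate such as `libloading`, so here the "library" is modelled as an in-memory table that can be swapped in one step; this shows the reload and fallback behaviour without platform-specific code. `DLL_PREFIX` and `DLL_SUFFIX` are real constants from `std::env::consts`.

```rust
use std::collections::HashMap;
use std::env::consts::{DLL_PREFIX, DLL_SUFFIX};
use std::sync::{Arc, RwLock};

/// Hypothetical naming scheme: one library per locale, e.g. `liblingo_de_DE.so` on Linux.
pub fn library_name(locale: &str) -> String {
    format!("{}lingo_{}{}", DLL_PREFIX, locale, DLL_SUFFIX)
}

/// Stand-in for a loaded locale library: a swappable msgid -> msgstr table.
pub struct Localizer {
    table: RwLock<Arc<HashMap<String, String>>>,
}

impl Localizer {
    pub fn new() -> Self {
        Localizer { table: RwLock::new(Arc::new(HashMap::new())) }
    }

    /// Replace the whole table at once, e.g. when switching locale at run-time.
    pub fn reload(&self, table: HashMap<String, String>) {
        *self.table.write().unwrap() = Arc::new(table);
    }

    /// Return the translation, or the msgid itself when there is none.
    pub fn get(&self, msgid: &str) -> String {
        // Clone the Arc so the read lock is released before the lookup.
        let table = Arc::clone(&*self.table.read().unwrap());
        table.get(msgid).cloned().unwrap_or_else(|| msgid.to_string())
    }
}
```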

In the link above, I talk about writing an app that works from the command line as well as a GUI, called Redox-Lingo, or in this case Rust-Lingo; same concept. This app reads the po files and lets translators modify the strings, but not the msgids, so I would recommend encoding the msgids. For example, instead of readable text like "button 1" and "button 2", use "button_1" and "button_2". This makes them easier to map or enumerate, whichever is faster, and easier to find. You could even make them deliberately ugly, for example "LOCALIZE_button_1", so that you can grep for "LOCALIZE_" to find everything needing translation for each language just by changing the locale (better yet, have Lingo do it). Lingo can automate this by creating a to-do list, and it can also compile the po file and reload the library.
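The to-do list can be sketched as a simple filter, assuming the entries for one locale are already loaded and that an entry counts as untranslated while its msgstr is still identical to its msgid. `Item` and `todo_list` are hypothetical names.

```rust
/// One catalog entry for a single locale.
pub struct Item<'a> {
    pub msgid: &'a str,
    pub msgstr: &'a str,
}

/// Collect msgids that carry the `LOCALIZE_` marker and whose msgstr is still
/// identical to the msgid, i.e. nobody has translated them yet for this locale.
pub fn todo_list<'a>(items: &[Item<'a>]) -> Vec<&'a str> {
    items
        .iter()
        .filter(|item| item.msgid.starts_with("LOCALIZE_") && item.msgstr == item.msgid)
        .map(|item| item.msgid)
        .collect()
}
```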

This makes it easy, since from the existing code's point of view no changes are required; the compiler does all the linkage when you start up the app. That means there needs to be an entry point where the compiler links in a call to a localization initialization function. This function creates an init file to track changes, which is how the system knows what state it is in. It first checks whether the library exists for the current locale: if it does, it marks it online in the init file; if it does not, it creates it; and if there are no po files, it creates them and gives a warning about a new init. Thus on startup, the app will always work. It can also check for po file changes and recompile them automatically, and it can even check a repository for changes and download them if required, so the system is always up to date at run-time.
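The startup check can be sketched as follows, assuming a single directory that holds the per-locale files. The file names (`<locale>.po`, `<locale>.catalog`, `lingo.init`), the `InitState` enum, and `startup_check` are all hypothetical; actually building the catalog from po files and checking a repository for updates are left out.

```rust
use std::fs;
use std::io;
use std::path::Path;

#[derive(Debug, PartialEq)]
pub enum InitState {
    /// A compiled catalog for the locale exists: mark it online.
    Online,
    /// Only po files exist: the catalog must be built from them.
    NeedsBuild,
    /// Nothing exists yet: an empty po file was created and a warning printed.
    NewInit,
}

/// Decide the localization state for `locale` and record it in `lingo.init`.
pub fn startup_check(dir: &Path, locale: &str) -> io::Result<InitState> {
    let catalog = dir.join(format!("{locale}.catalog"));
    let po = dir.join(format!("{locale}.po"));
    let state = if catalog.exists() {
        InitState::Online
    } else if po.exists() {
        InitState::NeedsBuild
    } else {
        // Create an empty po file so later steps have something to fill in.
        fs::write(&po, "")?;
        eprintln!("warning: new Lingo init for locale {locale}");
        InitState::NewInit
    };
    // Record the state so later runs (and tools such as Lingo) can see it.
    fs::write(dir.join("lingo.init"), format!("{locale} {state:?}\n"))?;
    Ok(state)
}
```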

This concept requires changes to the Rust compiler: it has to create the po files and the init call, change the way it reads string variables marked as localized, and provide a way to mark them. Alternatively, you could localize all strings, though you would have to test the performance of doing that. Any untranslated string keeps the original msgstr, which is also the msgid, so unless a translator changes it, it stays the same. Comments should therefore also be saved in the po file, for example:


// Lingo:Off This is a Lingo comment, do not translate this button

let selector = Selector::from("button");

The compiler can then test for "// Lingo:", and if it is present, write the comment to the po file as a property of this variable, making it easy for everyone to localize.

You will want to standardize the comments, for example:

Do not localize
// Lingo:Off And a comment that gets ignored after off
Localize
// Lingo:On And a comment that gets ignored after on 

You can add more rules as needed.
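A small sketch of how a standalone parser might recognise these directives. `Directive` and `parse_directive` are hypothetical names. The text after `Lingo:Off` or `Lingo:On` is returned as a free-form note, which the tool could ignore or write into the po file as a property of the variable.

```rust
#[derive(Debug, PartialEq)]
pub enum Directive<'a> {
    /// `// Lingo:Off ...`: do not localize the next string; the rest is a note.
    Off(&'a str),
    /// `// Lingo:On ...`: localize the next string; the rest is a note.
    On(&'a str),
}

/// Parse one source line; returns None unless it is a `// Lingo:On` or `// Lingo:Off` comment.
pub fn parse_directive(line: &str) -> Option<Directive<'_>> {
    let body = line.trim().strip_prefix("//")?.trim_start().strip_prefix("Lingo:")?;
    if let Some(note) = body.strip_prefix("Off") {
        Some(Directive::Off(note.trim()))
    } else if let Some(note) = body.strip_prefix("On") {
        Some(Directive::On(note.trim()))
    } else {
        None
    }
}
```

New rules would become new enum variants, so the parser fails loudly instead of silently ignoring a directive it does not know.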

The advantage of this concept is that it makes no changes to any API; only the compiler needs to be enhanced. All the strings are type-safe, the getter function lives in a static or dynamic library, whichever is best, so it can be optimized, and it works for all Rust apps and for cargo doc.

I need help building a proof of concept for this method. The compiler can be standalone; it does not actually need to be part of the Rust compiler, so it is just a matter of creating a compiler app and trying to get support for it.

I call it a compiler, but it is actually just a parser. I was thinking the main Rust compiler could create the po file, but there are other ways: all you need to do is find the path to the string and look for a comment on it. Then it is just a matter of writing a code generator to create the getter, which can call the Rust compiler to compile it.

There might also be a way to deconstruct the compiled code, using a debugger to traverse the stack looking for what you need, and then output it.

If one of the compiler team members could point me in the right direction, so I do not have to plow through all the code searching for how to do this, and explain how to output this information to a file, that would help get this project started.

Update: The localization system I am describing does not actually do any localization, so you cannot compare it to projects like Fluent, gettext, or any similar system; it is not that type of system. The goal of this project is just to create the po files, and by po files I do not mean the ancient po file format, but a better format with more information. I only use the term po so that you get the idea that it is an external file with a msgid, msgstr, variable name, file name, and file path, which any back-end, like Fluent, gettext, and so on, can use to do the translation. In this case, I actually want to create a project called Lingo that does this. I hope I am making myself clear.

When you are writing an application, it would be nice not to have to worry about localization, and with this system that is possible. This is not a replacement for other localization systems; it is a system designed to create the files that need to be translated, with 100% coverage, without the programmer having to code each string that needs to be localized.

By making localization a static or dynamic library call, you do not need any other system to translate the strings: that is done at compile time, and the strings are loaded by the library, which can be reloaded at run-time, making it easier to update than most systems. This is also low level: the only changes to the Rust system would be a call to the init function at startup and using the library's getter for all strings.

The other benefit of this system over others is that the Rust compiler itself can use it, making the compiler localized with no effort. Because it works at this low level, it is just part of the language, something you do not have to think about once it works.

The reason there are as many localization projects as there are right now is that they are needed, because Rust does not have localization built in, and that is what I am asking for. Localization should not be an afterthought or a second-class citizen that has to be handled by the programmer; that is not a good way to localize. Letting the compiler do it is.

Thanks

Flesh

You make it sound very simple, but I do not think this would be an easy task, considering that Rust's syntax is quite involved, and changing the actual Rust compiler would probably be even more work, which I doubt the compiler team would be interested in taking on.

I edited my text. I see your point: this is not a project for the compiler team. It is a simple parser that works alongside the compiler, and since the compiler already parses this information, its code can be reused to do it.

Po files are the ancient way of handling translations. Nowadays we should use Fluent instead, which has an existing Rust library that can be easily integrated into any project. No compiler macros necessary.

Yes, po files are ancient, but so am I. I was just using the po file as an example; the file format will be different. But Fluent requires API changes and, as far as I know, does not work for cargo doc, and those are the two things I am trying to target. I will edit my article to explain this better; see the update at the bottom.

Automatically localizing all strings sounds very weird. If anything, this should be a proc-macro that visibly wraps some unit of code in order to explicitly localize the strings in it. That said, I don't even understand how what you described as the old/classic approach requires breaking APIs. Wouldn't localize_this() still just return the localized string for a given key?

Which reminds me of a much more principled idea: for the Swift language (at least when running on the Apple ecosystem), there's a project called SwiftGen which parses uses of localized strings and generates type-safe data from it, which in turn can be used for accessing localizations with a much lower risk of mistakes.
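For comparison, this is roughly what SwiftGen-style output could look like if ported to Rust. It is a hypothetical sketch, not SwiftGen's actual output or any existing Rust crate: a generator emits one enum variant per msgid, so a misspelled key fails to compile instead of silently falling back at run-time.

```rust
/// One variant per msgid found in the source (hypothetical generated code).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Key {
    Button1,
    Button2,
}

impl Key {
    /// The msgid stored in the catalog for this key.
    pub fn msgid(self) -> &'static str {
        match self {
            Key::Button1 => "button_1",
            Key::Button2 => "button_2",
        }
    }

    /// Default (untranslated) text; a per-locale generated catalog would replace this.
    pub fn text(self) -> &'static str {
        match self {
            Key::Button1 => "Button 1",
            Key::Button2 => "Button 2",
        }
    }
}
```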

Localizing all the strings is weird, and that is why I brought up using comments to disallow it. You could also use comments to allow it and have localization off by default. I will edit the post to make this clearer.

The biggest problem with any system is how to get the strings to localize. The concept here is not how to localize them, but how to get the data. How you access it in Rust can be handled by a code generator, and the file it creates would normally be edited by the translation team or by programmers. If you do not need to translate a variable, just remove it from the file, and it will return the default value, which is all it does anyway. This is a type-safe function; the code generator ensures that.

The concept here is to move access to string data into a library call. I have no idea how Swift works, but this concept is about being able to create this library and compile it; if compilation fails, it is not type-safe, so you have a check built in. The key returns either a localized string from a compiled library or the default value, and the library can be swapped at run-time, or selected at startup based on the locale.

The drawback of this approach of calling a function to localize a string is that it has to make a call to it, every time it creates this object

Have you considered having your functions take <T: Into<LocalizedStr>> (where LocalizedStr is some type you wrote yourself)? Then it would be just as easy on the caller side, without any major changes.
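That suggestion can be sketched like this, assuming a newtype `LocalizedStr` and a stand-in `lookup` function (both hypothetical, as is the `Selector` type here). The constructor is a plain `new` taking `impl Into<LocalizedStr>` rather than a generic `From` impl, since a blanket `impl<T: Into<LocalizedStr>> From<T> for Selector` would conflict with the standard library's `impl<T> From<T> for T`.

```rust
/// Hypothetical newtype for user-facing text, built from a msgid via the catalog.
pub struct LocalizedStr(String);

impl From<&str> for LocalizedStr {
    fn from(msgid: &str) -> Self {
        LocalizedStr(lookup(msgid))
    }
}

/// Stand-in for the real catalog lookup; falls back to the msgid itself.
fn lookup(msgid: &str) -> String {
    match msgid {
        "button" => "Knopf".to_string(),
        other => other.to_string(),
    }
}

pub struct Selector {
    pub label: String,
}

impl Selector {
    /// Callers still pass a plain string literal; the localization happens in the conversion.
    pub fn new(label: impl Into<LocalizedStr>) -> Self {
        Selector { label: label.into().0 }
    }
}
```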

not to mention this does not even help me with cargo doc

Why do you want to localize cargo doc?

Maybe we should consider more than just the text. In European languages, text is usually written left to right, but some languages are written right to left or even top to bottom (like the traditional Mongolian script, and sometimes Chinese or Japanese), and websites in those languages are specially designed to fit their writing direction.

When you run cargo doc, it generates HTML documents, but they are not localized.

If you compile and link it as I suggest, it will generate them in whatever language your locale is set to.

I think most people are missing what I am trying to do here. The compiler already creates intermediate files, debug builds, release builds, and so on, so having it write one more file is not a big deal.

Using macros means more programming; my method requires no changes to any API.

You are talking about presenting text in a specific language, which is not my intent with this project; that is handled a different way, and I will discuss it in more detail on the wiki. What you describe is about how data is displayed, while all I am talking about is creating a database of all the variables in an application that require localization, and the two are not really related. This project is only concerned with creating a file that can be localized; how you localize it is another subject. It does not deal with displaying localized data, only with creating a link from the msgid to the msgstr. The wiki link also describes a command-line and desktop app I am working on that deals with what you are talking about. I have not written it yet, but you can see what I have written so far.

Oh, I misunderstood. Thanks and... sorry :slight_smile:

Not to mention that, with some sort of code generation, it's trivial to generate localized string literals and map them to their respective keys; no allocation is needed. Even if more dynamism is required (e.g. changing the locale/language on the fly while the process is running), it's possible to just cache localized strings and Box::leak them to get a &'static str.
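Both ideas from this reply can be sketched with the standard library alone. `de_de` stands in for hypothetical generated code for one locale, and `intern` shows the `Box::leak` caching. Leaked strings are never freed, so this suits a bounded set of strings that is loaded once or a few times.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// What a generator might emit for one locale: keys mapped straight to
/// `&'static str` literals, so lookups allocate nothing.
pub fn de_de(key: &str) -> Option<&'static str> {
    match key {
        "button" => Some("Knopf"),
        "cancel" => Some("Abbrechen"),
        _ => None,
    }
}

/// Dynamic case: a string loaded at run-time is leaked once and cached, so every
/// later call with the same text returns the same `&'static str`.
pub fn intern(text: &str) -> &'static str {
    static CACHE: OnceLock<Mutex<HashMap<String, &'static str>>> = OnceLock::new();
    let mut cache = CACHE.get_or_init(|| Mutex::new(HashMap::new())).lock().unwrap();
    if let Some(&s) = cache.get(text) {
        return s;
    }
    let leaked: &'static str = Box::leak(text.to_string().into_boxed_str());
    cache.insert(text.to_string(), leaked);
    leaked
}
```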

I'm sorry but I have no idea what you are trying to say with the 2nd and 3rd paragraph. Could you please rephrase it?

Sorry about the confusion; I think I need to write it up better. What I am trying to do in Rust, I have done in other languages: in some I used reflection, in others debuggers, but I always found a way to create the po files. Granted, that is the old way of translation, but back then it was the only way, and you get the idea. I am not sure whether Rust has reflection, and I have not yet gotten into debugging it, but I never liked either of those approaches, for many reasons. So one day I looked at how the compiler produced its files, disassembled them, found the function that wrote the symbol table, and inserted a function that filtered and collected the data and wrote it to a file. I then wrote another app that parsed that file looking for tokens, created the actual po file, and flagged the variables requiring translation. That is what I am trying to do now: just collect all the string data, without any API changes.

The system I was working on was old, with millions of lines of code in thousands of files, and they wanted me to localize it. But the code was still being worked on and always changing, so I had to keep up with those changes and track every one of them. I came up with this system, and it worked the way I wanted it to: any code change requiring translation was easy to find and upgrade.

The translation system I am working on uses a database structure; it reads in these files and processes them, and it will have a command-line and GUI editor that can also do the code generation.

I am open to better ways to accomplish this, but I am trying to do it at the compiler level, not the app level, so that the Rust compiler is localized and cargo doc can be localized without having to write any code in the app other than comments and msgids, as I tried to explain.

I took everyone's feedback and came up with a different solution; I think this one is more flexible.

Rust Lingo