The internationalization of Rust itself

Hello, I first went to IRC where I was advised to put my question here by @keeper after some replies by @oli_obk.

Here was my original message:

Hello, I'm interested in adding translation facilities to programming languages, that is, letting programmers write code using various natural languages. For example, Babylscript is a proposal in this direction for JavaScript, although it's not exactly what I have in mind.
So my first question is: would the Rust community be interested in evaluating the addition of such a facility to Rust, or would it more likely reject such an idea?

The IRC conversation was also fed with the following answers:

oli_obk_: My personal opinion is that this isn't something that should be supported by the language, but by the IDE. Transparent translation could just be a display thing, not something that ever appears in the compiler.
Of course, the compiler could take steps to make transparent translation easier.

psychoslave: thank you oli_obk_ for sharing your opinion. The problem I see with mere IDE support is that it lacks flexibility, although I'm not sure what you have in mind exactly. To me it sounds like a "transpiler" approach, which has many caveats, especially when debugging comes into consideration.
But maybe that's also why you added the note about additional compiler steps.

oli_obk_: with the language server protocol IDEs already have much more semantic information about a piece of code than just the textual representation. I think that should be part of the discussion for translations

I'm not aware of any satisfying solution using an IDE such as the one @oli_obk is suggesting, and I'm not familiar with the Rust compilation chain (yet).

I'm not fundamentally against IDEs, and I used some of them for some time, including Eclipse, NetBeans, Atom, and so on. But currently I mainly use (neo)vim and ack as my daily tools to hack around. So whatever the suggestion might be, I'm open to it as long as it doesn't rely on a specific IDE and lets anyone edit code in any environment, user language included.

Btw you can edit messages. It is easier for people to follow the conversation when you post everything in the "main post" rather than main post + successive replies

Java to Kotlin translation in IDEA works pretty great.

Thank you @matklad, I didn't know Kotlin. It looks like it actually has more than just IDEA integration: it can be compiled to both JVM bytecode and JavaScript, and it has some support for the LLVM pipeline.

But what I have in mind is more like Perligata, which enables the use of a kind of Latin dialect to code in the Perl environment. For international collaboration, I would be more interested in an Esperanto dialect.

Just out of curiosity, do people really want to use multi-lingual programming languages? To me it seems that such a thing would make life harder than just learning the few keywords (which are similar in all programming languages). The main problem I see is that talking with others and getting help, for example, would become much harder, because every time one needs to translate back and forth into/from the English version of the language. And getting help on IRC or here is a very fundamental thing in the Rust community. I am from Germany, but I would never ever consider using a wenn - dann conditional statement, for example. The only 'language' where I have seen this in real life is Excel formulas, and it comes with exactly the problems I have described above, plus some inconsistencies between the English and German versions, if I remember correctly. :frowning:

I'm very skeptical about internationalization of language's keywords. If the goal is to make the language usable to people who don't know a word of English, then you'd also need to translate stdlib names and other libraries to match.

But if you don't manage to translate all of the Rust code that the user interacts with, then you're making it worse for the user: they will have to learn both the English Rust and their own language's Rust, and the mapping between them.

So I think it's fine to leave language keywords as they are. Many of them are abbreviations and domain-specific jargon, so may not make much sense to a novice anyway (e.g. Polish i32 would be c32 — not any clearer. And "else" is "w przeciwnym wypadku", so it's unusable and you'd have to come up with a domain-specific alternative anyway.)

At school I had a Polish version of LOGO, and it was awful. I couldn't learn from books about English LOGO. I had to use NAPRZOD instead of FWD. Also, Polish grammar has inflection, so the language gets doubly weird when forced to fit an inflexible programming language's grammar (e.g. variable names would have to have different suffixes depending on the gender and tense of the context they are used in).

I've also interacted a lot with the Polish dev community, and there everyone has just memorized what if is. Programming languages have only a few keywords, so this really isn't a big barrier. Many Polish words related to technology are just copies of English ones (komputer = computer).

The real barrier is the prose in the documentation, which you can't crack just by learning a dozen keywords.


Translations of documentation and error messages would help immensely.

Rustc already has macros for the error messages, so this could be extended to have hooks for internationalization.
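To make the "hooks" idea concrete, here is a minimal sketch of a message lookup with an English fallback. Everything in it is an assumption made for illustration: the function names `template` and `render`, the message ids, and the Polish wording are hypothetical and do not reflect how rustc actually stores its diagnostics.

```rust
/// Hypothetical catalog: returns the message template for a locale and
/// message id, or `None` when no translation exists.
/// The ids and the Polish text are illustrative assumptions only.
fn template(locale: &str, id: &str) -> Option<&'static str> {
    match (locale, id) {
        ("en", "unused_variable") => Some("unused variable: `{name}`"),
        ("pl", "unused_variable") => Some("nieużywana zmienna: `{name}`"),
        ("en", "mismatched_types") => Some("mismatched types"),
        _ => None,
    }
}

/// Render a message in `locale`, falling back to English when the
/// translation is missing, and to the bare id when even English is missing.
pub fn render(locale: &str, id: &str, name: &str) -> String {
    match template(locale, id).or_else(|| template("en", id)) {
        Some(text) => text.replace("{name}", name),
        None => id.to_string(),
    }
}
```

With this sketch, `render("pl", "unused_variable", "x")` uses the Polish template, while `render("pl", "mismatched_types", "x")` silently falls back to English, which is the behaviour a partially translated catalog would need.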

Rustdoc may be a tricky problem, because you wouldn't want to have a dozen versions of a doc comment before every method, so it may need a way to add other languages from external files.

I think it is a good idea. Rust has verbose error messages, and some people might not understand them. The problem is that a lot of things might get lost in translation.

Just out of curiosity, do people really want to use multi-lingual programming languages?

Yes, there are, otherwise this thread wouldn't exist. :slight_smile: It seems there are even people interested in mono-lingual programming languages based on non-English languages (see "Non-English-based programming languages"). That is not the approach I'm looking for, though; an internationalized compilation stack would be far more interesting as far as I'm concerned.

that such a thing would make life harder than just learning the few keywords (which are similar in all programming languages).

It all depends on the use case. For example, you probably wouldn't use Scratch to build your new OS, but it allows a good introduction to programming with an interface whose translation is rather obvious.

Moreover, the few keywords are just the tip of the iceberg. Otherwise, just letting users alias every keyword would be enough. Actually, a feature as simple as aliasing would already be a huge step, but few languages enable extensive aliasing that includes reserved keywords, plus there is the question of whether or not you can overwrite an existing reserved keyword.

An internationalized programming language should lead to the same AST whatever the lexicon you are using. It should surely even be possible to provide a tool that translexicalizes any localized Rust source into any other localized Rust source, selected through the "LANG" environment variable, including "DE" and "EN" (or possibly "C").
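As a rough illustration of "same AST whatever the lexicon", the sketch below maps localized keyword spellings to one canonical token stream and renders that stream back in another lexicon. It is a toy under stated assumptions: the two-keyword tables, the German spellings `wenn`/`sonst`, and the names `keyword_table`, `tokenize` and `render` are all hypothetical, and whitespace splitting stands in for a real lexer.

```rust
/// Canonical keywords shared by every lexicon (a tiny, hypothetical subset).
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Keyword {
    If,
    Else,
}

/// Canonical tokens: the same stream is produced whatever the source lexicon.
#[derive(Debug, PartialEq, Clone)]
pub enum Token {
    Keyword(Keyword),
    Ident(String),
}

/// Hypothetical per-language keyword table; unknown languages fall back to English.
fn keyword_table(lang: &str) -> &'static [(&'static str, Keyword)] {
    match lang {
        "de" => &[("wenn", Keyword::If), ("sonst", Keyword::Else)],
        _ => &[("if", Keyword::If), ("else", Keyword::Else)],
    }
}

/// Split on whitespace and replace localized keywords by canonical tokens.
pub fn tokenize(lang: &str, source: &str) -> Vec<Token> {
    let table = keyword_table(lang);
    source
        .split_whitespace()
        .map(|word| match table.iter().find(|(spelling, _)| *spelling == word) {
            Some((_, kw)) => Token::Keyword(*kw),
            None => Token::Ident(word.to_string()),
        })
        .collect()
}

/// Render canonical tokens back to text using the lexicon of `lang`.
pub fn render(lang: &str, tokens: &[Token]) -> String {
    let table = keyword_table(lang);
    tokens
        .iter()
        .map(|token| match token {
            Token::Keyword(kw) => table
                .iter()
                .find(|(_, k)| k == kw)
                .map(|(spelling, _)| spelling.to_string())
                .unwrap_or_default(),
            Token::Ident(name) => name.clone(),
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

Here `tokenize("de", "wenn x sonst y")` and `tokenize("en", "if x else y")` yield the same tokens, and `render("en", ...)` turns the German source into the English one. The sketch also exposes one of the hard parts: an identifier named `wenn` in English-lexicon code would become a keyword once rendered in German, so a real tool would need an escaping rule.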

The main problem I see is that talking with others and getting help, for example, would become much harder, because every time one needs to translate back and forth into/from the English version of the language.

It all depends on the size of the language community. The hypothesis behind the previous claim is a chicken-and-egg problem: you only find helpful resources in English because only English is used for that kind of work, so you will use English too.

And getting help on IRC or here, is a very fundamental thing in the Rust community.

That's fine, and I see there are some channels dedicated to languages other than English. For those who can speak English, it's fine to be able to find a community willing to help you, and if you can't, it's still fine if you can get help.

I am from Germany, but I would never ever consider to use a wenn - dann conditional statement, for example.

Then it's good for you. But maybe there are people out there who don't speak English and would like to solve programming problems without the burden of learning a natural language as complex as English, in addition to the burden of learning to program and developing a solution.

The only ‘language’ where I have seen this in real life is in Excel formulas and it comes exactly with the problems I have described above and some inconsistencies between English and German version if I remember correctly. :frowning:

Well, that's why a single specified AST target is important. Without more precise information on these inconsistencies, it's hard to say anything more relevant.

Thank you for sharing your feedback and concerns, @kunerd.

@kornel

If the goal is to make the language usable to people who don’t know a word of English, then you’d also need to translate stdlib names and other libraries to match. […] But if you don’t manage to translate all of Rust code that user interacts with, then you’re making it worse for the user:

I agree: just as for any software interface, the fuller the translation, the better. That said, you can prioritize string translations by frequency of use, so hopefully most users won't run into untranslated strings, and users of more advanced features are more likely to be good potential translators.

Many of them are abbreviations and domain-specific jargon

Actually, this could also be changed in some locales, so you could have an "EN-pedantic" locale which translexicalizes i32 to thirty_two_bits_integer (I guess).

Also Polish grammar has inflection, so the language gets doubly weird when forced to fit inflexible programming language’s grammar

Well, most programming languages don't manage English inflections like "person/people" either. Does Rust take that into account? If it does, then surely this facility could be reused in a Polish localization of Rust. Otherwise, the possibility of adding this feature at some level of the Rust stack could be discussed, or people might be told that, due to some technical decision, inflections are not possible in the programming language. By the way, there are programming stacks out there, like Ruby on Rails, which do manage inflections like "person/people". All that, again, only requires some alias feature.

Now, for even more flexibility, with rule-based inflections and even syntax customization, DSL facilities would be required.

The real barrier is the prose in the documentation, which you can’t crack just by learning a dozen keywords.

I agree that it is an important barrier. Still, to my mind, not translating these keywords and not even integrating internationalization facilities sends a strong signal that translations are unwelcome.

Translations of documentation and error messages would help immensely.

Did you consider adding the translatable material to translatewiki?

Rustc already has macros for the error messages, so this could be extended to have hooks for internationalization.

I'd be interested in some links on the topic and in some elaboration of your idea of how to extend that facility. :slight_smile:

Thank you for your insights.

Dear @sebasmagri, @ag_dubs told me to contact you about this topic in a conversation on IRC #rust-lang:

ag_dubs @psychoslave the community team is talking about internationalization you might ping @sebasmagri in #rust-community- we're mostly focusing on docs and other translation of support materials but this might be an interesting point to bring up to him

Unfortunately, my attempts to join the #rust-community channel all failed, both with my native client and through mibbit.

So I'll let you consult both this thread and the IRC log of #rust-lang for background on this topic. Let me know if I can help you in any way or if you have any questions.

You might need to register an account on IRC to be able to join the channel.


In general, while I'm very much for having localized error messages and such in the compiler, I don't think localizing keywords is something we should support. Other languages use forks for this (for example ChimeraCoder/koro on GitHub, a Bengali (বাংলা) version of the Go compiler and toolchain).

Did you consider adding the translatable material to translatewiki?

This requires some thought, because we need to properly decide how to internationalize things. Pluralization itself can be very painful here, and there are a bunch of other things to deal with. Any internationalization effort will need focused work put into it, not just "put strings up for translation and cobble together an internationalization framework" (that always ends up with problems).
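To show why pluralization alone already rules out naive string substitution, here is a small sketch comparing English with Polish for integer counts. The Polish rule follows the usual one/few/many split for whole numbers; the function names `plural_category` and `errors_found`, and the message wording, are assumptions made up for this example rather than anything in rustc.

```rust
/// Plural categories needed by the two languages in this sketch.
#[derive(Debug, PartialEq)]
pub enum Plural {
    One,
    Few,
    Many,
    Other,
}

/// Pick the plural category for a whole number `n`.
/// English needs two forms; Polish needs three for integers.
pub fn plural_category(lang: &str, n: u64) -> Plural {
    match lang {
        "pl" => {
            let (last_digit, last_two) = (n % 10, n % 100);
            if n == 1 {
                Plural::One
            } else if (2..=4).contains(&last_digit) && !(12..=14).contains(&last_two) {
                Plural::Few
            } else {
                Plural::Many
            }
        }
        // Assumption: every other language is treated like English here.
        _ => {
            if n == 1 {
                Plural::One
            } else {
                Plural::Other
            }
        }
    }
}

/// Hypothetical diagnostic summary; the wording is illustrative only.
pub fn errors_found(lang: &str, n: u64) -> String {
    match (lang, plural_category(lang, n)) {
        ("pl", Plural::One) => format!("znaleziono {} błąd", n),
        ("pl", Plural::Few) => format!("znaleziono {} błędy", n),
        ("pl", _) => format!("znaleziono {} błędów", n),
        (_, Plural::One) => format!("found {} error", n),
        _ => format!("found {} errors", n),
    }
}
```

A template like "found {n} error(s)" cannot express this: in Polish, 2, 3, 4 and 22 take one form while 5, 12 and 25 take another, so the framework has to select the message per count and per language.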

That said, we should try to work on a compiler internals RFC that gets us to that state where we have a good i18n framework within the compiler.

I wish we could bring the discussions here instead of directing people elsewhere. What do you say?

I have to agree with @kunerd — I think that Rust is already quite complex, we should not add unnecessary complication to it, it would just result in compiler bugs in exchange for a tiny, short-lived convenience for a minute fraction of people. The language being smaller, more uniform, more consistent, and not containing Every Currently Fashionable Minor Feature Ever™ is one primary reason people choose Rust over C++, for example.

The few dozen English keywords are not hard to learn, and quite frankly I would be very upset if I had to read code with non-English keywords. Consider what happens, for example, when you are trying to read the code outside of your IDE, in plaintext, or online in a Git repository. For example, you are on the road when someone asks you to review some code. Or you need to temporarily work on a machine with only a basic text editor. Now your IDE isn't at hand to translate the "Rust" code from their native language, which you have no idea how to read! This means that you can't read and review the code; maybe you'll waste the other party's time since they will have to wait, or they'll commit the code unreviewed, leading to bugs…

My ultimate point is: we should try to unify communication, that's how people all over the world can co-operate on projects. And that is done by agreeing on a common language, not by splitting code up into several sub-languages or dialects.

For example, in Haskell, where everyone invents their own clever operators and basically every individual writes in a different domain-specific language, it's a pain to read others' code, it's a disaster. The concept of "translating" a programming language was always puzzling to me, it simply doesn't seem to make much sense. There's no real benefit to it, and it would just complicate matters.

As you can probably tell by now, I'm not a native English speaker, but I firmly believe that even with a basic level of English knowledge, it's perfectly possible to get started with an English-based programming language. I was 9 when I was getting started with coding in BASIC, and it was the very same year that I first learned English in school. So I wasn't a fluent user at the time — yet I found English keywords the least hard part of learning to program. And if one is a seasoned programmer, the principle applies to them to an even higher degree.

To add to this, I can tell two related personal stories. The first one is: I once had to work on a Linux-based system that somehow ended up with a GCC installation localized to French. The damn compiler was spitting out error messages in French and I had no idea what they meant; it was extremely frustrating.

The other story is: the university I went to (and am currently teaching at) has a strange, in-house educational programming language that uses keywords and type names in Hungarian, which is our native language. This is supposedly done so that "students find it easier to get started". But students hate it passionately. They always complain about it, and there's nothing I can do because it's faculty policy to use that language in the first semester of Intro to Programming.

Nice to discover Koro, thank you for the link. It's good to have the fork option; that's one of the great advantages of libre software. But if it can be avoided, it's even better, as forks come with their own disadvantages. I don't think I need to

If possible, it would be far more resource-efficient and convenient for users to put some facilities right into the official toolchain and have a locales/ directory, so that adding a new Rust locale basically means providing a set of translated files. I'm fully aware that mere string literal translation is not enough for complete internationalization; I even wikified a course on internationalization (in Esperanto). So I'm not completely ignorant on this topic, and should the Rust community be interested, I would be happy to help as much as I can on this point.

Well yeah. But at the same time a user shouldn't get stuck because he can't understand what the error means.

Are you saying that a compiler error message doesn't/shouldn't convey necessary information? That's some very strange thinking. Anyway I didn't write "I got stuck" — I wrote that it was frustrating, which it was. I did eventually figure it out somehow, but why would we want to do this in a new language? It's just bad.

Rust is a language obsessed with programmer ergonomics, a.k.a. "doing things the right way this time", since that is a big part of making fewer mistakes and writing safe, correct code. We shouldn't purposefully open up opportunities for confusion.

You should be able to set "LANG=C" before any command to fall back to the default POSIX locale, which in practice generally gives the same messages as "LANG=en_US". :slight_smile:
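For what such a fallback could look like inside a tool, here is a simplified sketch that picks a message language from a `LANG`-style value. It is an assumption-laden illustration: the function name `message_language` is hypothetical, and real programs also consult other variables such as `LC_ALL` and `LC_MESSAGES`, which this sketch ignores.

```rust
/// Pick a message language from a `LANG`-style value such as "de_DE.UTF-8".
/// "C", "POSIX", an empty value, or an unset variable all mean
/// "untranslated", which this sketch treats as English.
pub fn message_language(lang_var: Option<&str>) -> String {
    let value = lang_var.unwrap_or("C");
    // Drop the codeset and modifier: "de_DE.UTF-8@euro" -> "de_DE".
    let base = value
        .split(|c| c == '.' || c == '@')
        .next()
        .unwrap_or("");
    // Keep only the language part: "de_DE" -> "de".
    let language = base.split('_').next().unwrap_or("");
    match language {
        "" | "C" | "POSIX" => "en".to_string(),
        other => other.to_lowercase(),
    }
}
```

A caller could pass `std::env::var("LANG").ok().as_deref()` to it, so that running a command with `LANG=C` set always yields the English messages, whatever the system default is.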

It's fine that you prefer to use the English lexicon in your code, and this discussion is not about removing that option. Actually, I think that integrating internationalization features into Rust would avoid scattering effort across projects like the in-house educational language. Plus, it would make transposing concepts to an English variant even more straightforward, since it would basically be the same language with another thin layer of lexical sugar. So it would let users who prefer to work in another natural language take that option, while leaving a neat bridge to move to English (or whatever target language).

The problem of conventions illustrated with Haskell is only weakly related to this internationalization proposal. In fact, although I'm not knowledgeable about the Haskell community, I guess all the DSLs mentioned equally use English in their development, don't they? So it could perfectly well be argued that including official locales would reduce this kind of proliferation, by giving a clear, conventional way to develop localized code.

I'm sad to hear that you are full of hate for diversity, and if this is the general opinion of the Rust community, I won't fight this hostility against freedom of people to express themselves with the lexical inventory they wish.

In the #rust-community IRC channel logs I saw that @geraldobarros, @KiChjang and @booyaa might also be interested in this topic.

Cheers

I think you misunderstood my point. I never said we need to convey less information. I said there should be an "option" for a user, either in core or even in a separate crate, that allows him to read the error messages in the language he prefers.