The internationalization of Rust itself

Just out of curiosity, do people really want to use multi-lingual programming languages?

Yes, there are; otherwise this thread wouldn’t exist. :slight_smile: Actually, it seems some people are even interested in mono-lingual programming languages based on non-English languages (see “Non-English-based programming languages”). However, that is not the approach I’m looking for; an internationalized compilation stack would be far more interesting as far as I’m concerned.

that such a thing would make life harder than just learning the few keywords (which are similar in all programming languages).

It all depends on the use case. For example, you probably wouldn’t use Scratch to build your new OS, but it provides a good introduction to programming, with an interface whose translation is rather straightforward.

Moreover, the few keywords are just the tip of the iceberg. Otherwise, just letting users alias every keyword would be enough. Actually, a feature as simple as aliasing would already be a huge step, but few languages allow extensive aliasing that includes reserved keywords, plus there is the question of whether you can overwrite an existing reserved keyword or not.

An internationalized programming language should produce the same AST whatever lexicon you are using. It should even be possible to provide a tool that performs high-level translexicalisation of any localized Rust source into any other localized Rust source, according to the “LANG” environment variable, including “DE” and “EN” (or possibly “C”).
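To make “the same AST whatever the lexicon” concrete, here is a minimal Rust sketch at the token level. It assumes a purely hypothetical German keyword table (`funktion`, `lass`, `wenn`, `sonst` are illustrative only, not taken from any actual proposal): localized keywords are mapped to their canonical English spelling before parsing, so two spellings of the same program yield identical token streams. A real lexer would also have to leave string literals and comments alone, which this naive tokenizer does not.

```rust
use std::collections::HashMap;

/// Hypothetical keyword table for a "de" locale: localized word -> canonical Rust keyword.
fn de_keywords() -> HashMap<&'static str, &'static str> {
    HashMap::from([
        ("funktion", "fn"),
        ("lass", "let"),
        ("wenn", "if"),
        ("sonst", "else"),
    ])
}

/// Splits source into identifier-like words and single punctuation characters,
/// dropping whitespace. Deliberately naive: no string literals, comments, or
/// multi-character operators.
fn tokenize(src: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut word = String::new();
    for c in src.chars() {
        if c.is_alphanumeric() || c == '_' {
            word.push(c);
        } else {
            if !word.is_empty() {
                tokens.push(std::mem::take(&mut word));
            }
            if !c.is_whitespace() {
                tokens.push(c.to_string());
            }
        }
    }
    if !word.is_empty() {
        tokens.push(word);
    }
    tokens
}

/// Replaces every localized keyword by its canonical spelling; other tokens pass through.
fn canonicalize(tokens: Vec<String>, table: &HashMap<&'static str, &'static str>) -> Vec<String> {
    tokens
        .into_iter()
        .map(|t| table.get(t.as_str()).map_or(t, |k| (*k).to_string()))
        .collect()
}

fn main() {
    let table = de_keywords();
    let english = "fn f(x: i32) -> i32 { let y = x; if y > 0 { y } else { 0 } }";
    let german = "funktion f(x: i32) -> i32 { lass y = x; wenn y > 0 { y } sonst { 0 } }";
    let a = canonicalize(tokenize(english), &table);
    let b = canonicalize(tokenize(german), &table);
    assert_eq!(a, b);
    println!("same token stream: {}", a == b);
}
```

Normalizing before the parser is one way to keep a single specified AST target: nothing after the lexer ever sees the localized words.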

The main problem I see is that talking with others and getting help, for example, would become much harder, because every time one would need to translate back and forth between the localized and the English version of the language.

It all depends on the size of the language community. The premise behind the previous claim is a chicken-and-egg problem: you only find helpful resources in English because only English is used for that kind of work, so you use English too.

And getting help on IRC or here is a very fundamental thing in the Rust community.

That’s fine, and I see there are some channels dedicated to languages other than English. For those who can speak English, it’s fine to be able to find a community willing to help them, and for those who can’t, it’s still fine if they can get help.

I am from Germany, but I would never ever consider using a wenn-dann (“if-then”) conditional statement, for example.

Good for you, then. But maybe there are people out there who don’t speak English and would like to solve programming problems without the additional burden of learning a language as complex as English on top of learning to program and to develop a programming solution.

The only ‘language’ where I have seen this in real life is in Excel formulas and it comes exactly with the problems I have described above and some inconsistencies between English and German version if I remember correctly. :frowning:

Well, that’s why a single specified AST target is important. Without more precise information on these inconsistencies, it’s hard to say anything more relevant.

Thank you for sharing your feedback and concerns, @kunerd.

@kornel

If the goal is to make the language usable to people who don’t know a word of English, then you’d also need to translate stdlib names and other libraries to match. […] But if you don’t manage to translate all of Rust code that user interacts with, then you’re making it worse for the user:

I agree: just like for any software interface, the fuller the translation, the better. That said, you can prioritize string translations by frequency of use, so that hopefully most users won’t run into untranslated strings, and users of more advanced features are more likely to be good potential translators.

Many of them are abbreviations and domain-specific jargon

Actually, these could also be changed in some locales, so you could have an “EN-pedantic” locale that translexicalizes i32 to thirty_two_bits_integer (I guess).

Also Polish grammar has inflection, so the language gets doubly weird when forced to fit inflexible programming language’s grammar

Well, most programming languages don’t handle English inflections like “person/people” either. Does Rust take that into account? If it does, then surely a Polish localization of Rust should reuse that facility. Otherwise, adding this feature at some level of the Rust stack could be discussed, or people might be told that, due to some technical decision, inflections are not possible in the programming language. By the way, there are programming stacks out there, like Ruby on Rails, that do handle inflections like “person/people”. All that, again, only requires some alias feature.

For even more flexibility, with rule-based inflections and even syntax customization, DSL facilities would be required.

The real barrier is the prose in the documentation, which you can’t crack just by learning a dozen keywords.

I agree that is an important barrier. However, to my mind, not translating these keywords, and not even integrating internationalization facilities, sends a strong signal that translations are unwelcome.

Translations of documentation and error messages would help immensely.

Did you consider adding the translatable material to translatewiki?

Rustc already has macros for the error messages, so this could be extended to have hooks for internationalization.

I’d be interested in some links on the topic, and in a more detailed account of how you would extend that facility. :slight_smile:

Thank you for your insights.

Dear @sebasmagri, @ag_dubs told me to contact you about this topic in a conversation on IRC #rust-lang:

ag_dubs @psychoslave the community team is talking about internationalization you might ping @sebasmagri in #rust-community- we’re mostly focusing on docs and other translation of support materials but this might be an interesting point to bring up to him

Unfortunately, my attempts to join the #rust-community channel all failed, both with my native client and through mibbit.

So I’ll let you consult both this thread and the IRC log of #rust-lang for background on this topic. Let me know if I can help in any way or if you have any questions.

You might need to register an account on IRC to be able to join the channel.


In general, while I’m very much for having localized error messages and such in the compiler, I don’t think localizing keywords is something we should support. Other languages use forks for this (https://github.com/ChimeraCoder/koro).

Did you consider adding the translatable material to translatewiki

This requires some thought, because we need to properly decide how to internationalize things. Pluralization itself can be very painful here, and there are a bunch of other things to deal with. Any internationalization effort will need focused work put into it, not just “put strings up for translation and cobble together an internationalization framework” (that always ends up with problems).
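To illustrate why pluralization is painful: English needs two forms of a count-dependent message, while Polish needs three for integers (the CLDR categories `one`, `few` and `many`). Below is a minimal sketch, assuming a hypothetical in-memory catalog keyed by message ID, locale and plural category, with fallback to English when a translation is missing; the `errors-found` ID and its strings are illustrative. Dedicated systems such as Project Fluent cover much more than this.

```rust
use std::collections::HashMap;

/// CLDR-style plural categories (only those needed for English and Polish).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Plural {
    One,
    Few,
    Many,
    Other,
}

/// Plural category of a non-negative integer count. Only "pl" has its own rule here;
/// every other locale uses the English rule.
fn plural_category(locale: &str, n: u64) -> Plural {
    match locale {
        "pl" if n == 1 => Plural::One,
        "pl" if (2..=4).contains(&(n % 10)) && !(12..=14).contains(&(n % 100)) => Plural::Few,
        "pl" => Plural::Many,
        _ if n == 1 => Plural::One,
        _ => Plural::Other,
    }
}

/// Hypothetical catalog: (message id, locale, plural category) -> template.
type Catalog = HashMap<(&'static str, &'static str, Plural), &'static str>;

fn catalog() -> Catalog {
    HashMap::from([
        (("errors-found", "en", Plural::One), "found {n} error"),
        (("errors-found", "en", Plural::Other), "found {n} errors"),
        // Polish for "found {n} error(s)": the noun takes three different forms.
        (("errors-found", "pl", Plural::One), "znaleziono {n} błąd"),
        (("errors-found", "pl", Plural::Few), "znaleziono {n} błędy"),
        (("errors-found", "pl", Plural::Many), "znaleziono {n} błędów"),
    ])
}

/// Formats message `id` for `locale`, falling back to English when no translation exists.
fn message(cat: &Catalog, id: &'static str, locale: &'static str, n: u64) -> String {
    let template = cat
        .get(&(id, locale, plural_category(locale, n)))
        .or_else(|| cat.get(&(id, "en", plural_category("en", n))))
        .copied()
        .unwrap_or(id);
    template.replace("{n}", &n.to_string())
}

fn main() {
    let cat = catalog();
    for n in [1, 3, 5, 12, 22] {
        println!(
            "{:<16} | {}",
            message(&cat, "errors-found", "en", n),
            message(&cat, "errors-found", "pl", n)
        );
    }
    // No German entries exist, so this falls back to English.
    println!("{}", message(&cat, "errors-found", "de", 2));
}
```

Note that 12 and 22 end in the same digit yet need different Polish forms, which is exactly the kind of rule a naive “add an s” scheme gets wrong.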

That said, we should try to work on a compiler internals RFC that gets us to that state where we have a good i18n framework within the compiler.

I wish we could bring the discussions here instead of directing people elsewhere. What do you say?

I have to agree with @kunerd — I think that Rust is already quite complex, we should not add unnecessary complication to it, it would just result in compiler bugs in exchange for a tiny, short-lived convenience for a minute fraction of people. The language being smaller, more uniform, more consistent, and not containing Every Currently Fashionable Minor Feature Ever™ is one primary reason people choose Rust over C++, for example.

The few dozen English keywords are not hard to learn, and quite frankly I would be very upset if I had to read code with non-English keywords. Consider what happens when you are trying to read the code outside of your IDE: in plaintext, or online in a Git repository. Say you are on the road when someone asks you to review some code, or you temporarily need to work on a machine with only a basic text editor. Now your IDE isn’t at hand to translate the “Rust” code from their native language, which you have no idea how to read! This means you can’t read and review the code: either you waste the other party’s time because they have to wait, or they commit the code unreviewed, leading to bugs…

My ultimate point is: we should try to unify communication, that’s how people all over the world can co-operate on projects. And that is done by agreeing on a common language, not by splitting code up into several sub-languages or dialects.

For example, in Haskell, where everyone invents their own clever operators and basically every individual writes in a different domain-specific language, reading others’ code is a pain, a disaster even. The concept of “translating” a programming language has always puzzled me; it simply doesn’t seem to make much sense. There’s no real benefit to it, and it would just complicate matters.

As you can probably tell by now, I’m not a native English speaker, but I firmly believe that even with a basic level of English knowledge, it’s perfectly possible to get started with an English-based programming language. I was 9 when I was getting started with coding in BASIC, and it was the very same year that I first learned English in school. So I wasn’t a fluent user at the time — yet I found English keywords the least hard part of learning to program. And if one is a seasoned programmer, the principle applies to them to an even higher degree.

To add to this, I can tell two related personal stories. The first one: I once had to work on a Linux-based system that somehow ended up with a GCC installation localized to French. The damn compiler was spitting out error messages in French and I had no idea what they meant; it was extremely frustrating.

The other story is: the university I went to (and am currently teaching at) has a strange, in-house educational programming language that uses keywords and type names in Hungarian, which is our native language. This is supposedly done so that “students find it easier to get started”. But students hate it passionately. They always complain about it, and there’s nothing I can do because it’s faculty policy to use that language in the first semester of Intro to Programming.

Nice to discover Koro, thank you for the link. It’s good to have the fork option; that’s one of the great advantages of libre software. But if a fork can be avoided, it’s even better, as forks come with their own disadvantages. I don’t think I need to

If possible, it would be far more resource-efficient and user-friendly to put some facilities right into the official toolchain and have a locales/ directory, so that adding a new Rust locale is basically a matter of providing a set of translated files. I’m fully aware that mere string literal translation is not enough for complete internationalization; I even wikified a course on internationalization (in Esperanto). So I’m not completely ignorant on this topic, and if the Rust community is interested, I would be happy to help as much as I can.

Well yeah. But at the same time a user shouldn’t get stuck because he can’t understand what the error means.

Are you saying that a compiler error message doesn’t/shouldn’t convey necessary information? That’s some very strange thinking. Anyway I didn’t write “I got stuck” — I wrote that it was frustrating, which it was. I did eventually figure it out somehow, but why would we want to do this in a new language? It’s just bad.

Rust is a language obsessed with programmer ergonomics, a.k.a. “doing things the right way this time”, since that is a big part of making fewer mistakes and writing safe, correct code. We shouldn’t purposefully open up opportunities for confusion.

You should be able to put “LANG=C” before any command to fall back to the default system language, which is generally equivalent to “LANG=EN-US”. :slight_smile:

It’s fine that you prefer to use the English lexicon in your code, and this discussion is not about removing that option. Actually, I think that integrating internationalization features into Rust would avoid scattering effort across projects like that in-house educational language. Plus, it would make transposing concepts to the English variant even more straightforward, since it would basically be the same language with another thin layer of lexical sugar. So it would let users who prefer to work in another natural language do so, while providing a neat bridge to English (or whatever target language).

The convention problem illustrated with Haskell is only weakly related to this internationalization proposal. In fact, although I’m not knowledgeable about the Haskell community, I guess all the DSLs mentioned use English in their development too, don’t they? So it could well be argued that including official locales would reduce this kind of proliferation by providing a clear, conventional way to develop localized code.

I’m sad to hear that you are full of hate for diversity, and if this is the general opinion of the Rust community, I won’t fight this hostility against freedom of people to express themselves with the lexical inventory they wish.

In the #rust-community IRC channel logs I saw that @geraldobarros, @KiChjang and @booyaa might also be interested in this topic.

Cheers

I think you misunderstood my point. I never said we need to convey less information. I said there should be an “option” for users, either in core or even in a separate crate, that lets them read the error messages in a language they prefer.

Well yeah. But at the same time a user shouldn’t get stuck because he can’t understand what the error means.

To be clear, this discussion is conflating two separate concerns.

There’s “we should have Rust the language be internationalized, with internationalized keywords and such”. IMO that’s of questionable benefit, and really should be done in a fork. There are some really good reasons why programming should be possible in more languages, and some languages (notably Excel) have done this, but I don’t think it’s something we should be trying as part of the core project.

Then there’s “should the output of Rust the compiler be internationalized”, to which the answer has basically always been “yes, but we need to do this carefully”.

Thank you @Manishearth for the clarification. The current thread is only about the first of these two topics; discussion about the second should be posted elsewhere.

I think the discussion about making Rust use internationalized keywords is completely off the mark, and I agree that it is not worthy of implementation.

What I do strongly advocate for is the translation of error messages into different languages, and this is a real problem: quite a lot of people in the Rust China community post similar questions over and over again, simply because they do not understand the error message itself, even for basic things like a method not being in scope.

I’ve heard that such an initiative is actually not easy, and requires an RFC in order to design this correctly such that the compiler supports i18n natively.

It’s fine that you prefer to use English lexicon in your code, and this discussion is not about removing such an option

I do realize, but if the addition of non-English keywords is introduced into the language, some programmers and projects will inevitably start using it, and thus many of us who expect Rust programs to be written in English will be out of luck without special tooling support.

I’m sad to hear that you are full of hate for diversity

I’m not full of hate for diversity – it’s simply that I think localizing a programming language is in itself a bad idea, for the several technical reasons that I enumerated and backed up with arguments. And if one wants IDE supported programming in their own native language, that could be done by the IDE only, without changing the language itself. Why not keep the code base in English, and then whoever prefers non-English keywords can use their own IDE to do the translation back and forth on the fly.

For instance, I’ve seen JavaScript programmers configure their editor to replace the long function keyword with the shorter f for better readability. That’s how I would imagine non-English programming being implemented.
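A minimal sketch of that display-only approach, assuming a hypothetical alias table containing just `function` → `f`: the stored file keeps the real keyword, and only the on-screen text is substituted, word by word. It also exposes the catch: the mapping is only reversible if the alias never appears as an ordinary identifier in the source.

```rust
/// Hypothetical display table: (keyword as stored on disk, alias shown on screen).
const ALIASES: &[(&str, &str)] = &[("function", "f")];

/// Appends the pending word to `out`, substituted via `pick` if it has a mapping.
fn flush(word: &mut String, out: &mut String, pick: &dyn Fn(&str) -> Option<&'static str>) {
    if !word.is_empty() {
        match pick(word.as_str()) {
            Some(replacement) => out.push_str(replacement),
            None => out.push_str(word.as_str()),
        }
        word.clear();
    }
}

/// Rewrites whole identifier-like words, copying every other character unchanged.
fn map_words(src: &str, pick: &dyn Fn(&str) -> Option<&'static str>) -> String {
    let mut out = String::with_capacity(src.len());
    let mut word = String::new();
    for c in src.chars() {
        if c.is_alphanumeric() || c == '_' || c == '$' {
            word.push(c);
        } else {
            flush(&mut word, &mut out, pick);
            out.push(c);
        }
    }
    flush(&mut word, &mut out, pick);
    out
}

/// What the editor shows: stored keywords replaced by their aliases.
fn to_display(stored: &str) -> String {
    map_words(stored, &|w: &str| {
        ALIASES.iter().find(|(kw, _)| *kw == w).map(|(_, alias)| *alias)
    })
}

/// What gets saved: aliases turned back into keywords.
fn to_stored(shown: &str) -> String {
    map_words(shown, &|w: &str| {
        ALIASES.iter().find(|(_, alias)| *alias == w).map(|(kw, _)| *kw)
    })
}

/// True if displaying and then saving gives back exactly the original text.
fn round_trips(stored: &str) -> bool {
    to_stored(&to_display(stored)) == stored
}

fn main() {
    let ok = "function add(a, b) { return a + b; }";
    let clash = "function f(x) { return x; }"; // `f` is already an identifier here
    println!("{}", to_display(ok));
    println!("{} {}", round_trips(ok), round_trips(clash));
}
```

In the clashing case, the saved file would come back as `function function(x)`, so an editor doing this would need to detect such collisions (or pick aliases that cannot be identifiers) before substituting.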

hostility against freedom of people to express themselves with the lexical inventory they wish

I feel that this is… quite an exaggeration.

I guess all the DSLs mentioned use English in their development too, don’t they? So it could well be argued that including official locales would reduce this kind of proliferation by providing a clear, conventional way to develop localized code.

I don’t think so, we might be referring to different situations. In Haskell, it’s customary to define new operators. Thus, function calls of descriptive names often get replaced by punctuation that one programmer pulled out of thin air, thus greatly decreasing consensus, and, consequently, readability. (I can’t tell if these operators count as “English” or any particular language. They are usually just arbitrary combinations of mathematical sigils.) It does not help that many Haskellers also enjoy using Greek letters as function names — but those who don’t, find this practice majorly irritating.

I am not against adding localized error messages at all. As long as they are opt-in and do not depend on magic undocumented environment variables with which the compiler tries to second guess me, localization of the compiler is fine and desirable, as with every other application.
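One way to make such localization opt-in rather than environment-driven, sketched under assumptions: a hypothetical command-line option (passed here simply as the first program argument; it is not a real rustc flag) whose absence means English, whose value `auto` explicitly opts in to reading `LANG`, and whose other values name a language directly. `LANG=C` and `POSIX` map to English, and unsupported languages also fall back to English. The list of supported languages is illustrative.

```rust
use std::env;

/// Languages this hypothetical build ships diagnostics for (illustrative list).
const SUPPORTED: &[&str] = &["en", "de", "hu", "pl", "zh"];

/// Reduces a POSIX-style locale ("de_DE.UTF-8", "pl_PL", "C") to a language code.
fn language_of(locale: &str) -> Option<String> {
    // `split` always yields at least one piece, possibly empty.
    let lang = locale
        .split(|c: char| c == '_' || c == '.' || c == '@')
        .next()?
        .to_ascii_lowercase();
    if lang.is_empty() {
        None
    } else if lang == "c" || lang == "posix" {
        Some("en".to_string())
    } else {
        Some(lang)
    }
}

/// Chooses the diagnostics language. Without the option, the answer is always English;
/// `LANG` is read only when the user explicitly asks for "auto".
fn choose_language(option: Option<&str>, env_lang: Option<&str>) -> String {
    let candidate = match option {
        None => None,
        Some("auto") => env_lang.and_then(language_of),
        Some(explicit) => language_of(explicit),
    };
    match candidate {
        Some(lang) if SUPPORTED.contains(&lang.as_str()) => lang,
        _ => "en".to_string(),
    }
}

fn main() {
    let option = env::args().nth(1); // e.g. `prog auto` or `prog pl`
    let env_lang = env::var("LANG").ok();
    println!("{}", choose_language(option.as_deref(), env_lang.as_deref()));
}
```

Keeping the environment lookup out of `choose_language` makes the precedence rules explicit and testable, rather than hidden behind whatever the shell happens to export.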

Some people enjoy reading error messages in their own mother tongue, so we should definitely give them the opportunity — exactly because compiler errors are in a natural language, and they do not take part in the code itself, so translating them makes sense, as they do not need to be uniform. Once the code compiles, they are gone anyway, and they only need to be understood locally, by the single programmer who writes and debugs the code at that very moment in time.

(Still, I would personally prefer English error messages to my native Hungarian even if the latter existed. Basically everything programming-related feels and reads more natural in English to me, and the Hungarian IT jargon in general is severely limited in expressivity, and sometimes very strange, to put it politely.)

Oh, that’s right. I thought you wrote this in a sense that “a user should not get stuck even if s/he doesn’t understand the error message”, but I see that this was not the case.

I’m afraid I don’t understand your concern. Obviously, features are introduced so that people can use them. However, I don’t see how code that is not written in English might hurt anyone who isn’t interested in it.

I’m not full of hate for diversity – it’s simply that I think localizing a programming language is in itself a bad idea, for the several technical reasons that I enumerated and backed up with arguments.

Sorry then, I misinterpreted your sentence “But students hate it passionately” (emphasis yours). I’m open to debating technical pros and cons, but I’m not interested in arguing against pre-established points of view based on such strong emotional factors, which are unlikely to change through rational argument.

Why not keep the code base in English, and then whoever prefers non-English keywords can use their own IDE to do the translation back and forth on the fly.

Because, as already explained:

  1. not everyone wants to launch an IDE to get such a feature;
  2. more importantly, it adds a transpilation layer that gets in the way of debugging sessions.

Thus, having internationalization features included in the compiler stack would be interesting.

What if someone is (or several people are) interested in a piece of code but doesn’t understand it because it’s not in English? Then a larger community of potentially interested developers may be excluded for lack of a common language, just because a smaller community decided to write it in their own native language.

By “students hate it passionately” I was referring to their obligation to program in such a programming language. I’m sure they don’t hate other nations’ native languages…

but I’m not interested in arguing against pre-established points of view based on such strong emotional factors, which are unlikely to change through rational argument.

My arguments are purely technical. I argued that splitting a language into many sub-languages would hurt the community because it would make cooperation harder.

not everyone wants to launch an IDE to get such a feature

But following that thinking, not everyone would want to launch an IDE to have non-English code translated back to English. So one could make that argument for either side.

It doesn’t have to be concrete. It can be a “virtual thing” that translates everything, say, when you run a command.
