The internationalization of Rust itself


#21

Well yeah. But at the same time a user shouldn’t get stuck because he can’t understand what the error means.

To be clear, this discussion is conflating two separate concerns.

There’s “we should have Rust the language be internationalized, with internationalized keywords and such”. IMO that’s of questionable benefit, and really should be done in a fork. There are some really good reasons why programming should be possible in more languages, and some languages (notably Excel) have done this, but I don’t think it’s something we should be trying as part of the core project.

Then there’s “should the output of Rust the compiler be internationalized” to which the answer has basically always been “yes, but we need to do this carefully”


#22

Thank you @Manishearth for the clarification. The current thread is only about the first of these two topics, and discussion about the second should be posted elsewhere.


#23

I think the discussion about making Rust use internationalized keywords is completely off the mark, and I agree that it is not worthy of implementation.

What I do strongly advocate for is the translation of error messages into different languages, and this is actually a problem – there are quite a lot of people in the Rust China community who post similar questions over and over again, simply because they do not understand the error message itself, even for basic things like a method not being in scope or something similar.

I’ve heard that such an initiative is actually not easy, and requires an RFC in order to design this correctly such that the compiler supports i18n natively.


#24

It’s fine that you prefer to use English lexicon in your code, and this discussion is not about removing such an option

I do realize, but if the addition of non-English keywords is introduced into the language, some programmers and projects will inevitably start using it, and thus many of us who expect Rust programs to be written in English will be out of luck without special tooling support.

I’m sad to hear that you are full of hate for diversity

I’m not full of hate for diversity – it’s simply that I think localizing a programming language is in itself a bad idea, for the several technical reasons that I enumerated and backed up with arguments. And if one wants IDE supported programming in their own native language, that could be done by the IDE only, without changing the language itself. Why not keep the code base in English, and then whoever prefers non-English keywords can use their own IDE to do the translation back and forth on the fly.

For instance, I’ve seen JavaScript programmers configure their editor to replace the long function keyword with the shorter f for better readability. That’s the kind of manner I would imagine non-English programming to be implemented in.

hostility against freedom of people to express themselves with the lexical inventory they wish

I feel that this is… quite an exaggeration.

I guess all the DSL mentioned will equally use English in their development, don’t they? So actually, it could perfectly be argued that including official locales will reduce this kind of proliferation with a clearly given conventional way to develop localized code.

I don’t think so, we might be referring to different situations. In Haskell, it’s customary to define new operators. Thus, function calls of descriptive names often get replaced by punctuation that one programmer pulled out of thin air, thus greatly decreasing consensus, and, consequently, readability. (I can’t tell if these operators count as “English” or any particular language. They are usually just arbitrary combinations of mathematical sigils.) It does not help that many Haskellers also enjoy using Greek letters as function names — but those who don’t, find this practice majorly irritating.


#25

I am not against adding localized error messages at all. As long as they are opt-in and do not depend on magic undocumented environment variables with which the compiler tries to second guess me, localization of the compiler is fine and desirable, as with every other application.

Some people enjoy reading error messages in their own mother tongue, so we should definitely give them the opportunity — exactly because compiler errors are in a natural language, and they do not take part in the code itself, so translating them makes sense, as they do not need to be uniform. Once the code compiles, they are gone anyway, and they only need to be understood locally, by the single programmer who writes and debugs the code at that very moment in time.

(Still, I would personally prefer English error messages to my native Hungarian even if the latter existed. Basically everything programming-related feels and reads more natural in English to me, and the Hungarian IT jargon in general is severely limited in expressivity, and sometimes very strange, to put it politely.)


#26

Oh, that’s right. I thought you wrote this in a sense that “a user should not get stuck even if s/he doesn’t understand the error message”, but I see that this was not the case.


#27

I’m afraid I don’t understand your concern. Obviously you introduce features so people can use them. However, I don’t see how code that is not written in English might hurt anyone who is not interested in it.

I’m not full of hate for diversity – it’s simply that I think localizing a programming language is in itself a bad idea, for the several technical reasons that I enumerated and backed up with arguments.

Sorry then, I misinterpreted your sentence “But students hate it passionately” (emphasis yours). I’m open to debating technical pros and cons, but I’m not interested in arguing with a pre-established point of view based on emotional factors so strong that they are unlikely to change through rational discussion.

Why not keep the code base in English, and then whoever prefers non-English keywords can use their own IDE to do the translation back and forth on the fly.

Because, as it was already explained

  1. not everyone wants to have to launch an IDE to get such a feature
  2. more importantly, that adds a transpilation layer which gets in the way of debugging sessions.

Thus having an internationalization feature included in the compiler stack would be interesting.


#28

What if someone is (or several people are) interested in a piece of code but they don’t understand it because it’s not in English? Then it’s possible that a larger community of potentially interested developers gets excluded for lack of a common language, just because a smaller community decided to write it in their own native language.

By “students hate it passionately” I was referring to their obligation to program in such a programming language. I’m sure they don’t hate other nations’ native languages…

but I’m not interested in arguing with a pre-established point of view based on emotional factors so strong that they are unlikely to change through rational discussion.

My arguments are purely technical. I argued that splitting a language into many sub-languages would hurt the community because it would make cooperation harder.

not everyone wants to have to launch an IDE to get such a feature

But following that thinking, not everyone would want to launch an IDE to have non-English code translated back to English. So one could make that argument for either side.


#29

It doesn’t have to be concrete. It can be a “virtual thing” that translates everything, say, when you run a command.


#30

The LANG environment variable is documented in POSIX, and its impact on gcc is documented in the GCC documentation. And this is opt-in.


#31

Yeah, that’s fine, I just wanted to make it clear. (I don’t know how that particular Linux distro or GCC toolchain installation ended up setting stuff to French while the GUI was all English — I sure didn’t ask for it.)


#32

What if someone is (or several people are) interested in a piece of code but they don’t understand it because it’s not in English?

What if some people are interested in a piece of code but they don’t understand it because it is in English? That’s just as valid an argument.

Moreover, chances are that a piece of code written using a native lexicon will pertain to some local issue (say, a management system heavily tied to local laws), or to an issue concerning a language other than English, like a grammar checker.

With proper integration of some internationalization facilities, you can encourage people to create libraries which have APIs for both the language of the community concerned and other languages, at the same level.

On the other hand, putting that in an external afterthought workaround like a fork will most likely make code written on top of such a solution far more entangled with the derivative compiler tool chain, and less likely to offer an internationalized API. And then you’ll have a piece of code that English speakers might want to understand, but they will have a far harder time doing anything other than learning the language used or recreating a whole new project.

My ethos is that small communities deserve my help just as much as larger ones, as to me they are equal in dignity. And I think that it is important to do everything we can to slow the rapid shrinking of language diversity that is currently happening in the world. All the more so when studies show a link with the degradation of biodiversity. So wherever we can, we should consider the impact of our choices on linguistic diversity and multilingualism on the Internet.

Besides, it’s really not a question of community, otherwise we should all use Chinese ideograms, shouldn’t we? :laughing: All the more so since, with ideograms, you already have shared written understanding across very different spoken languages in China. Plus ideograms are so much more compact. :dart:

By “students hate it passionately” I was referring to their obligation to program in such a programming language. I’m sure they don’t hate other nations’ native languages…

Then, once again, internationalization facilities would allow them to code in whatever language they want and provide the translexicalized version to their torturer teachers. :wink:

I argued that splitting a language into many sub-languages would hurt the community because it would make cooperation harder.

I don’t agree. Those who can contribute in English will still be able to, and those who can’t would benefit from internationalization facilities and would be able to work in a consistent programming environment, without suffering from the drawbacks that can arise in a fork, while making it far easier to switch their work to another lexical inventory.

But following that thinking, not everyone would want to launch an IDE to have non-English code translated back to English. So one could make that argument for either side.

The point was that this translexicalisation layer should be part of the compilation tool chain, rather than isolated in an IDE where it would raise problems, especially in debugging, that can’t be resolved if this is not thought through upstream in the development tool chain.

Kind regards


#33

The point was that this translexicalisation layer should be part of the compilation tool chain, rather than isolated in an IDE where it would raise problems, especially in debugging, that can’t be resolved if this is not thought through upstream in the development tool chain.

Where would this translation layer work?

If it modifies the source, it would cause major merge conflicts whenever two people with different preferred languages work with the same source.

I can also see two minor technical issues that would make such a change harder.

Varying keyword length would mess with the alignment of source blocks, and it would require reserving a lot more keywords than are currently reserved. Instead of if being reserved, all possible translations of if would also need to be reserved.

I feel like a change like this would require a lot of work to get right in order to translate ~20 keywords which are consistent with other languages anyway. A much better use of that time would be to, as others have suggested, translate error messages and things like documentation of libraries and tutorials.


#34

Where would this translation layer work?

I’m not sure I understand your request; would you be kind enough to clarify your question?

If it modifies the source, it would cause major merge conflicts whenever two people with different preferred languages work with the same source.

It would change the source file directly, unless explicitly demanded. For example, you might have something like LANG=C cargo fmt, LANG=AB cargo fmt or LANG=ZH-Hant cargo fmt, or something more explicit in the command arguments like cargo fmt --target-lexicon=HU. But essentially, you would use this kind of translexicalisation only in rare cases, as the source files of a single project would probably all use a single lexicon.

Varying keyword length would mess with the alignment of source blocks, and it would require reserving a lot more keywords than are currently reserved. Instead of if being reserved, all possible translations of if would also need to be reserved.

The reserved keywords should of course vary with the lexicon specified. For example, Babylscript uses things like ---fr--- to indicate that the following code uses the French variant, so you can use if (yew) to name a variable pertaining to a tree (how useful! :laughing:) in this context, while in the ---en--- namespace it is still your good old conditional statement controller. Not that this should be the approach kept, but it’s a proof of concept that this concern about an explosion of reserved words doesn’t occur in practice.

I feel like a change like this would require a lot of work to get right in order to translate ~20 keywords which are consistent with other languages anyway.

It all depends on the architecture of the compiler tool chain. On the contrary, I would expect this to require a rather small set of changes in the lexer; otherwise it might be a signal that the architecture is really missing modularity.
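
To make the idea of lexicon-scoped reserved words concrete, here is a minimal sketch in Rust of a keyword table selected per lexicon. Everything in it is hypothetical: the `Lexicon` and `Token` types, the constructor names, and the choice of `si` as the French spelling of the conditional keyword are assumptions for illustration, not how Babylscript or rustc actually work.

```rust
use std::collections::HashMap;

/// Token kinds for this sketch only; a real lexer has many more.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    If,
    Ident(String),
}

/// Hypothetical per-lexicon keyword table.
struct Lexicon {
    keywords: HashMap<&'static str, Token>,
}

impl Lexicon {
    /// English lexicon: `if` is the conditional keyword.
    fn english() -> Self {
        Lexicon { keywords: HashMap::from([("if", Token::If)]) }
    }

    /// Assumed French lexicon: `si` is the conditional keyword, `if` is free.
    fn french() -> Self {
        Lexicon { keywords: HashMap::from([("si", Token::If)]) }
    }

    /// A word is a keyword only if the active lexicon reserves it.
    fn classify(&self, word: &str) -> Token {
        self.keywords
            .get(word)
            .cloned()
            .unwrap_or_else(|| Token::Ident(word.to_string()))
    }
}

fn main() {
    let en = Lexicon::english();
    let fr = Lexicon::french();
    println!("{:?}", en.classify("if")); // keyword in the English lexicon
    println!("{:?}", fr.classify("if")); // plain identifier (the yew tree) in the French one
    println!("{:?}", fr.classify("si")); // keyword in the French lexicon
}
```

With a table like this, only the words of the active lexicon are reserved, so the set of reserved words does not grow with the number of supported languages.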


#35

Your second paragraph answered that question for me

But then we run into the issue where I couldn’t, for example, contribute to a library written in French Rust.

Then what happens if I reformat some code written in English Rust into Swedish Rust when the English code uses a variable called om? That would now be a reserved keyword for if and the code wouldn’t compile. I guess that isn’t a problem if this is only supposed to be run rarely. But I feel like that would create a bunch of other issues, like limiting the number of people that can contribute to these ‘local’ libraries.
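
The `om` clash can be illustrated with a small Rust sketch that checks, before switching lexicons, which identifiers would become reserved words. The function name and the one-entry Swedish keyword list are assumptions made up for this example; only `om` as the Swedish word for `if` comes from the post above.

```rust
/// Hypothetical Swedish keyword list; only `om` ("if") is taken from the thread.
const SWEDISH_KEYWORDS: &[&str] = &["om"];

/// Returns the identifiers that would turn into keywords in the target lexicon.
fn colliding_identifiers<'a>(identifiers: &[&'a str], target_keywords: &[&str]) -> Vec<&'a str> {
    identifiers
        .iter()
        .copied()
        .filter(|id| target_keywords.iter().any(|kw| kw == id))
        .collect()
}

fn main() {
    // Identifiers collected from some English-lexicon source file (made up here).
    let identifiers = ["om", "count", "total"];
    let clashes = colliding_identifiers(&identifiers, SWEDISH_KEYWORDS);
    if clashes.is_empty() {
        println!("safe to switch lexicon");
    } else {
        println!("these identifiers need renaming first: {:?}", clashes);
    }
}
```

A tool doing the conversion could refuse to run, or ask for a rename, whenever this list is non-empty.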


#36

Thanks for the mention @psychoslave.

The internationalization of the compiler error messages, as well as providing good foundations in the crate ecosystem to achieve localization easily in Rust programs, are definitely goals I’m into. These topics have been on the discussion table for some time, and we’ve basically been gathering as much feedback as possible, so this thread is actually a pretty good source of ideas.

LANG / LC_* based picking of messages at runtime (of the compiler) is definitely an option we have had in mind. Also, to facilitate linking back to the documentation or searching for compiler errors on the Web, the introduction of error codes/language agnostic identifiers in std and the compiler would be useful.
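
As a rough sketch of what LANG / LC_* based picking combined with language agnostic identifiers could look like, the following Rust code resolves a locale from the environment in the usual POSIX order (LC_ALL, then LC_MESSAGES, then LANG) and looks up a message by code. The function names, the `DEMO0001` code and both message texts are invented for illustration; this is not the compiler’s actual design.

```rust
/// Picks a locale from POSIX-style variables, highest priority first.
/// `lookup` is injected so the logic can be exercised without touching the real environment.
fn pick_locale(lookup: impl Fn(&str) -> Option<String>) -> String {
    ["LC_ALL", "LC_MESSAGES", "LANG"]
        .iter()
        .copied()
        .filter_map(|name| lookup(name))
        .find(|value| !value.is_empty()) // an empty value counts as unset
        .unwrap_or_else(|| "C".to_string())
}

/// Extracts the language part of a locale tag, e.g. "fr_FR.UTF-8" -> "fr".
fn language_of(locale: &str) -> &str {
    locale
        .split(|c| c == '_' || c == '.' || c == '@')
        .next()
        .unwrap_or(locale)
}

/// Hypothetical message catalog keyed by (error code, language).
fn message(code: &str, lang: &str) -> Option<&'static str> {
    match (code, lang) {
        ("DEMO0001", "en") => Some("method not found in the current scope"),
        ("DEMO0001", "fr") => Some("méthode introuvable dans la portée courante"),
        _ => None,
    }
}

/// Renders a diagnostic, falling back to English when no translation exists.
/// The code stays identical in every language, so it remains searchable.
fn render(code: &str, locale: &str) -> String {
    let text = message(code, language_of(locale))
        .or_else(|| message(code, "en"))
        .unwrap_or("unknown error");
    format!("error[{}]: {}", code, text)
}

fn main() {
    let locale = pick_locale(|name| std::env::var(name).ok());
    println!("{}", render("DEMO0001", &locale));
}
```

Because the code in brackets does not change with the locale, a user reading the French text can still search the Web for the same identifier as an English speaker.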

This is something we need to do after gathering as much information as we can, mainly from non-English speakers and people around the globe, so I’d really like to invite you all to participate in this discussion issue so we can feed it with the ideas exposed here and consider them later as part of the RFC process.

Thanks!


#37

But then we run into the issue where I couldn’t, for example, contribute to a library written in French Rust.

For someone who doesn’t speak French, yes, obviously. But then again, the likely alternative would not be that an equivalent exists as an English Rust library, but rather that it would be coded in a totally unrelated French programming language (there are some of them with commercial support and all), or possibly that it wouldn’t exist at all.

Also, you might expect that such a library would be heavily related to some French-language topic or to a topic related to a French-speaking country. So if the topic is of interest to you, then you would most likely want to learn French anyway.

Then what happens if I reformat some code written in English Rust into Swedish Rust when the English code uses a variable called om? That would now be a reserved keyword for if and the code wouldn’t compile. I guess that isn’t a problem if this is only supposed to be run rarely. But I feel like that would create a bunch of other issues, like limiting the number of people that can contribute to these ‘local’ libraries.

One possibility would be to have an option to toggle translation of identifiers, possibly with more or less verbose interactive selection per lexeme and an option to indicate some source of lexical matchings.

But I feel like that would create a bunch of other issues, like limiting the number of people that can contribute to these ‘local’ libraries.

Whatever you do, there will always be a limited number of people able to contribute. Requiring English to code anything will limit the number of people able to use the language. Making it the recommended convention for libraries targeting an English-speaking audience would at least make other uses possible.


#38

For those reading this thread, you might be interested in
GopherCon 2017: Aditya Mukerjee - Translating Go to Other (Human) Languages, and Back Again - YouTube

I also discovered that Perl 6 slangs offer great flexibility regarding what can be parsed and fed to the underlying interpreter, all through native facilities, as far as I can judge from reading the doc.


#39

I think there’s an important thing to note about the argument “There will always be a limited amount of people being able to contribute”:

There’s a big difference between “everyone needs to learn an additional language so that they can communicate with each other” and “everyone writes stuff in their language of choice and needs to learn whatever language others use when reading their code”. I had to learn English, but now I’m able to use almost all online libraries, programming languages etc. because English is the common language in those. If I’d need to use five libraries, all written in different languages, I wouldn’t have a chance to make my program work.

For me, one of the most important aspects of open source projects is global collaboration and knowledge sharing. I believe that this is only possible because there’s a common language used in almost all projects. When allowing people to write their code in different languages, I’d be afraid that this would lead to more local projects and less global communication. For example, just imagine a German, English and Chinese community, each working on their own web framework. Apart from a lot of work that would have to be done multiple times, this would also mean that changing from one framework to the other would require you to learn a new language.

For the reasons above, I’d even say that internationalization of project development hurts diversity and inclusion rather than improving it.

Apart from that, I’m totally with you in trying to lower the barriers for non-English speakers, using translated error messages and documentation, local Q&A sites, etc.


#40

There’s a big difference between “everyone needs to learn an additional language so that they can communicate with each other” and “everyone writes stuff in their language of choice and needs to learn whatever language others use when reading their code”.

The main difference is probably that the first case imposes the burden of learning a new spoken language, for a benefit that is mostly hypothetical. How many projects out there are solely internal and don’t get any benefit from the underlying software layers being extremely tied to English? How many open source projects have a single contributor or a small group of contributors sharing a native language, with English only coming as an unnecessary additional burden in their work?

The difficulty of learning a new language depends on many variables, including how many languages you already know, how close your linguistic knowledge is to the target language, and how much time you can dedicate to this activity and spend in a fully immersive environment.

I had to learn English, but now I’m able to use almost all online libraries, programming languages etc. because English is the common language in those.

When someone can master a programming language or a new library in no time, it most likely means that it is just another implementation of the same concepts that this person has already mastered, with just a very thin layer of different syntactic sugar.

If I’d need to use five libraries, all written in different languages, I wouldn’t have a chance to make my program work.

But no one is suggesting that this is what would happen with the integration of cleverly designed internationalization facilities. On the contrary, it should ease the move of an existing library that is gaining in popularity to whichever lexicon turns out to be right for most people interested in contributing, while still allowing anyone to bootstrap with whichever lexicon they prefer. Chances are good that most popular libraries would still use English for the foreseeable future.

For me, one of the most important aspects of open source projects is global collaboration and knowledge sharing. I believe that this is only possible because there’s a common language used in almost all projects.

The underlying important aspect is to empower end users, for that is how you enable people to collaborate at whatever scale. Programming languages are just one kind of interface between the end user and the state machine which should help them have a better life. The common language people are using for global digital collaboration is machine instructions, and anything else is more or less abstract interface sugar.

When allowing people to write their code in different languages, I’d be afraid that this would lead to more local projects and less global communication.

Aren’t these fears groundless? No one should let groundless fears guide them against respectful human diversity. :sun_with_face:

For example, just imagine a German, English and Chinese community, each working on their own web framework. Apart from a lot of work that would have to be done multiple times, this would also mean that changing from one framework to the other would require you to learn a new language.

Once again, this is missing the main point that it’s not about preventing anyone from using any commonly agreed language for projects targeting international audiences and contributions. This phobia of multiple implementations is somewhat baffling. First, last I checked, there are many web frameworks in the wild. Even on top of a single programming language you often find several rather large ones.

One could expect that having some of them use a given spoken language as their primary working language would be a potential competitive advantage for some niche markets. Moreover, diversity comes with some advantages, like better resilience of the network against failures, the risk of which increases with system homogeneity.

Finally, what it really means is that anyone would choose the software framework which seems best adapted to their needs, and if “use English as working language” is in the requirements, any solution which doesn’t meet this specification will be dropped. That’s all it means, as far as I can tell. :slight_smile: