It seems to me that there are five real issues here:
1) Internationalization of Rust's defined keywords, attribute names, etc.
2) Internationalization of identifiers
3) Internationalization of error messages
4) Internationalization of comments and other documentation
5) Internationalization of numbers, dates, times, etc., both in compiler input and output and within the runtime
Others in this thread have made the argument that 1) would make open-source programming much more difficult when learning to program, when interacting with others, and when attempting to reuse the work of others. I personally agree with all those concerns. It would also complicate the use of macros, which usually define symbols in the macro-programmer's language. The hundreds of macros within the compiler that define what most programmers consider to be Rust's built-in aspects would be particularly problematic.
2) Internationalization of identifiers should be a fairly straightforward change to the compiler. In this the Go language shows the way. See The Go Programming Language Specification and its "Letters and digits" and "Identifiers" sections.
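To make the Go-style rule concrete, here is a minimal sketch of an identifier check: a letter or underscore first, then letters, digits, or underscores. The function name `is_identifier` is hypothetical, and `char::is_alphabetic` and `char::is_numeric` from the standard library are only approximations of Go's `unicode_letter` and `unicode_digit` classes, so treat this as an illustration rather than what a compiler would actually do.

```rust
/// Returns true if `s` is an identifier under a Go-like rule:
/// a letter or '_' first, then letters, digits, or '_'.
/// Assumption: `char::is_alphabetic` / `char::is_numeric` stand in for
/// Go's `unicode_letter` / `unicode_digit`; they are close, not identical.
fn is_identifier(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c == '_' || c.is_alphabetic() => {}
        _ => return false, // empty, or starts with a digit or punctuation
    }
    chars.all(|c| c == '_' || c.is_alphabetic() || c.is_numeric())
}
```

With this rule, `is_identifier("имя")` and `is_identifier("变量")` are accepted in the same way as `is_identifier("_x1")`, while `"9abc"` and `"a b"` are rejected.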
3) Internationalization of error messages is a significant challenge, due to the number of different error messages in the compiler, the language-dependent order of the variable arguments that appear in them, and language-specific aspects such as gender, number, and inflection that some languages require. The current nightly build contains 405 distinct `format!()` macros in 103 files that appear to report errors, most of which would have to be generalized to invoke an internationalization module that would in turn load templates for any supported alternative language. Such translation must also address any required reordering of the variable content in each message (e.g., the three parameters of an English-language message might need to occur in a different order in the second language).
It should be pointed out that the biggest challenge is not the initial translation but the maintenance across compiler changes. A significant pool of translators would need to be retained to address new or changed error messages for each stable Rust build, with that effort appearing first in the many nightly builds.
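The reordering problem can be sketched with indexed placeholders, so that each language's template decides where each argument lands. The helper `render` and the `{0}`, `{1}` placeholder syntax are hypothetical and chosen only for illustration; this is not an existing compiler API, and it deliberately ignores gender, number, and inflection.

```rust
/// Substitutes `{0}`, `{1}`, ... in `template` with the matching entry of `args`.
/// Hypothetical helper for illustration; not an existing compiler API.
/// Unknown or malformed placeholders are copied through unchanged.
fn render(template: &str, args: &[&str]) -> String {
    let mut out = String::new();
    let mut rest = template;
    while let Some(open) = rest.find('{') {
        out.push_str(&rest[..open]);
        let after = &rest[open + 1..];
        match after.find('}') {
            Some(close) => {
                let key = &after[..close];
                match key.parse::<usize>().ok().and_then(|i| args.get(i)) {
                    Some(arg) => out.push_str(arg),
                    None => {
                        // Not a valid index: keep the placeholder text as is.
                        out.push('{');
                        out.push_str(key);
                        out.push('}');
                    }
                }
                rest = &after[close + 1..];
            }
            None => {
                // Unmatched '{': copy the remainder verbatim and stop.
                out.push_str(&rest[open..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

The same argument list `&["u32", "i64"]` can then feed an English template such as `"expected {0}, found {1}"` and an illustrative German one such as `"{1} gefunden, {0} erwartet"`, with the arguments appearing in a different order in each result.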
4) Presumably comments can already be written in a language of the programmer's choice. Such use is probably sufficient for non-Markdown comments. For Markdown comments, a means of invoking replacement strings or comment blocks found in a language-specific internationalization module (or section) might be appropriate. As is always the case, the burden here would fall on those individuals providing the translation to a given target language.
5a) The decimal point '.' in non-integer numbers is somewhat problematic, since most of the world uses a comma ',' to separate the integral and fractional parts of fixed-point and floating-point numbers. Those of us who read international standards are used to seeing both forms of separator. Both period and comma could serve as the fraction separator, provided that the separator is always surrounded by digits and that every other use of either character in the syntax never sits directly between an integer and an adjacent digit (e.g., by requiring whitespace). Because a macro written with one separator can be invoked by someone using the other, it seems probable that both should be treated equivalently.
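As a rough sketch of the "surrounded by digits" rule, the function below accepts either separator in a standalone numeric token and rejects any separator that lacks a digit on one side. The name `parse_decimal` is hypothetical, and the sketch only covers a single token; it says nothing about how a real lexer would resolve commas in argument lists.

```rust
/// Parses a decimal token that uses either '.' or ',' as the fraction separator.
/// The separator must have an ASCII digit on both sides, and at most one may appear.
/// Hypothetical helper; it handles one token, not the surrounding syntax.
fn parse_decimal(s: &str) -> Option<f64> {
    let bytes = s.as_bytes();
    let mut seen_separator = false;
    for (i, &b) in bytes.iter().enumerate() {
        match b {
            b'0'..=b'9' => {}
            b'.' | b',' => {
                let digit_before = i > 0 && bytes[i - 1].is_ascii_digit();
                let digit_after = i + 1 < bytes.len() && bytes[i + 1].is_ascii_digit();
                if seen_separator || !digit_before || !digit_after {
                    return None;
                }
                seen_separator = true;
            }
            _ => return None, // whitespace, signs, and non-ASCII are out of scope here
        }
    }
    // Normalize the comma form to the period form before handing off to std.
    s.replace(',', ".").parse().ok()
}
```

Under this rule `parse_decimal("3,14")` and `parse_decimal("3.14")` yield the same value, while `"3,"`, `",5"`, and `"1, 2"` are rejected, which is the whitespace-based disambiguation described above.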
5b) Many languages have non-Arabic characters to represent the digits of numbers, sometimes in a radix other than radix 10. Such support does not seem essential for the internationalization of Rust. Restricting Rust to use only the Arabic numeral characters 0 to 9 avoids the often-encountered problem in other languages of the same character being used in numbers and in other words. For example the Chinese numeral '一' (Pinyin yī), meaning one, occurs as the initial character in many Hanzi words, as well as in numbers, making disambiguation of numbers from identifiers potentially difficult.
As a point of reference, although the Go language permits use of non-Arabic digits in identifiers, it does not permit them in numbers.
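A minimal sketch of that restriction, assuming a hypothetical `is_integer_literal` check: only the ASCII digits 0 to 9 count as numeric, so characters such as '一' remain available to identifiers without ambiguity.

```rust
/// Accepts only the ASCII digits 0-9 in an integer literal.
/// Hypothetical helper; digits from other scripts are deliberately rejected.
fn is_integer_literal(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_digit())
}
```

Here `is_integer_literal("2024")` holds, while `"٢٠٢٤"` (Arabic-Indic digits) and `"一"` do not qualify; the standard library's `'一'.is_alphabetic()` treats that character as a letter, which fits its use in identifiers.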
5c) Other internationalization requirements are similar to those in various operating systems and generally should use APIs of the host or target system in determining their behavior.