Why doesn’t Rust have any localization in its system libraries?


#1

Hello,

I am very excited by Rust, but how can one write commercial programs in it when it has no localization or i18n support?

I mean i18n not only for strings, as in l20n, but also for dates, times, numbers, currencies, images, and other resources. The locale (or culture) should also be local to each thread in a process. For example, .NET has a culture and a UI culture for each thread, which is very convenient.

I do not see how Rust’s fmt can be used in commercial programs. I18n cannot be statically compiled; it has to be dynamic, so fmt is useless in commercial programs except in some rare cases.

All this makes Rust suitable only for research, not for commercial applications. Why not include a standard i18n facility based on l20n in Rust and use it in the system libraries and other libraries?

l20n would only need to be extended to cover dates, times, currencies, and so on.


#2

I think this issue and this crate are relevant.

In general, Rust tries not to rush ahead with including stuff into std. It’s too easy to standardize some suboptimal API this way and suffer from legacy forever after (Java’s date and time API is arguably an example of this).

So Rust is not batteries included yet, but this is relieved by the great and growing https://crates.io/ infrastructure.


#3

As always, that depends on what kind of commercial application. Many, many, many commercial applications only support one language. Yes, in general it would be nice if more were localized, but the reality is very different.

And @matklad is right on the money. We don’t have enough experience with these APIs yet to move something into std.


#4

This is pretty snarky. I’m writing commercial applications for a living and never touch i18n (which is because I’m doing backend and data mangling). I would even say that a huge array of commercial applications has absolutely no need for this.

But: it’s a considerable gap in the ecosystem. I don’t think the stdlib should close it, though. Libraries outside of stdlib are much easier to maintain, to use, and to build alternatives for. Rust is still a young language, so these kinds of gaps are to be expected.


#5

I think it is a bad idea if low-level system functions change their behavior in funny ways, based on some global user setting you may not even be aware of.

For example, a lot of software has problems with the Turkish locale, because Turkish has both i and ı, with the respective upper case letters İ and I. So in a Turkish locale the upper case letter for i is not I, violating an assumption a lot of software makes.
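To make the Turkish case concrete, here is a minimal sketch. Rust's own `str::to_uppercase` is locale-independent, so the Turkish rule has to be requested explicitly. The `Lang` enum and the `to_upper_in` function are hypothetical names made up for this illustration, not part of any existing library.

```rust
/// Hypothetical language tag, used only for this illustration.
#[derive(Clone, Copy)]
pub enum Lang {
    /// Locale-independent default Unicode mapping.
    Root,
    /// Turkish: dotted and dotless i are distinct letters.
    Turkish,
}

/// Upper-cases `s`, applying the Turkish i -> İ rule only when asked to.
pub fn to_upper_in(s: &str, lang: Lang) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match (lang, c) {
            // Turkish dotted i keeps its dot: U+0130.
            (Lang::Turkish, 'i') => out.push('İ'),
            // Everything else uses the default mapping; dotless ı (U+0131)
            // already maps to plain I there.
            _ => out.extend(c.to_uppercase()),
        }
    }
    out
}
```

With this sketch, `to_upper_in("istanbul", Lang::Root)` gives `ISTANBUL`, while passing `Lang::Turkish` gives `İSTANBUL`. Nothing changes behind the caller's back.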

Another example closer to home is parsing comma-separated lists of numbers, which fails in a German locale because the decimal separator is a comma.

So I would like any locale-related processing to be explicit, with the locale as a parameter. That could easily be done in a crate separate from std.
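A rough sketch of what "locale as a parameter" could look like for the number-list case. The `NumberStyle` type, the two constants, and `parse_list` are assumptions invented for this example. The semicolon as the German list separator is also an assumption of the sketch. Only `str::parse::<f64>` comes from std, and it always expects `.` as the decimal mark.

```rust
/// Hypothetical description of how numbers and lists are written.
#[derive(Clone, Copy)]
pub struct NumberStyle {
    pub decimal: char,
    pub list_separator: char,
}

/// Machine-readable style: '.' for decimals, ',' between items.
pub const INVARIANT: NumberStyle = NumberStyle { decimal: '.', list_separator: ',' };
/// Assumed German convention: ',' for decimals, ';' between items.
pub const GERMAN: NumberStyle = NumberStyle { decimal: ',', list_separator: ';' };

/// Parses a list of numbers. The style is always passed in and never
/// read from the environment.
pub fn parse_list(
    input: &str,
    style: NumberStyle,
) -> Result<Vec<f64>, std::num::ParseFloatError> {
    input
        .split(style.list_separator)
        .map(|item| {
            // Normalize the decimal mark, then use std's locale-independent parser.
            let normalized: String = item
                .trim()
                .chars()
                .map(|c| if c == style.decimal { '.' } else { c })
                .collect();
            normalized.parse::<f64>()
        })
        .collect()
}
```

Feeding the German text `1,5` to the invariant style silently yields two numbers, 1 and 5, which is exactly the kind of breakage described above. With the style spelled out at the call site, at least the mismatch is visible in the code.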


#6

This is just the tip of the localization nightmare.

What if you want to search strings for letter ч, but people want to be able to search for ч without switching their keyboard, so they type c or ch?

How do you support mixed content where some parts are written left to right and some right to left?


#7

I think that functions that depend on the locale should have a short form (no locale parameter, uses the current locale) and a full form (locale as a parameter). That would make them convenient to use.

All functions that are independent of the locale should call the locale-dependent functions with the invariant locale as the parameter. That way you can parse comma-separated lists of numbers by using the invariant locale.
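The short-form/full-form idea could be sketched roughly as follows, with a per-thread current locale. The `Locale` enum and all function names here are hypothetical. Only `thread_local!`, `Cell`, and `format!` come from std.

```rust
use std::cell::Cell;

/// Hypothetical locale type for this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Locale {
    Invariant,
    German,
}

thread_local! {
    // Each thread gets its own current locale, starting as Invariant.
    static CURRENT: Cell<Locale> = Cell::new(Locale::Invariant);
}

/// Sets the current locale of the calling thread only.
pub fn set_current_locale(locale: Locale) {
    CURRENT.with(|c| c.set(locale));
}

/// Full form: the locale is passed explicitly.
pub fn format_decimal_in(value: f64, locale: Locale) -> String {
    // std always writes '.' as the decimal mark.
    let text = format!("{:.2}", value);
    match locale {
        Locale::Invariant => text,
        Locale::German => text.replace('.', ","),
    }
}

/// Short form: uses the calling thread's current locale.
pub fn format_decimal(value: f64) -> String {
    format_decimal_in(value, CURRENT.with(|c| c.get()))
}
```

After `set_current_locale(Locale::German)`, `format_decimal(1234.5)` returns `1234,50` on that thread, while other threads still get `1234.50`. Note that the short form reads hidden state, so its result depends on something that does not appear at the call site.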


#8

What if you want to search strings for letter ч, but people want to be able to search for ч without switching their keyboard, so they type c or ch?

I don’t understand exactly what your question is about.
By default, search functions cannot match ч against ch in any locale. But if a program needs to find ч when the user types ch, then you should write such a function and use it.
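As a rough idea of what such a hand-written function might look like, here is a sketch that folds both the text and the query to a Latin search key before comparing. The two-letter table and the names `search_key` and `contains_folded` are made up for this example. A real table would be much larger and would depend on the language.

```rust
/// Appends the folded form of `c` to `out`.
/// Hypothetical table covering only two Cyrillic letters.
fn fold_char(c: char, out: &mut String) {
    match c {
        'ч' | 'Ч' => out.push_str("ch"),
        'ш' | 'Ш' => out.push_str("sh"),
        // Everything else is only lower-cased.
        other => out.extend(other.to_lowercase()),
    }
}

/// Builds the key that is actually compared during search.
pub fn search_key(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        fold_char(c, &mut out);
    }
    out
}

/// True if `needle` occurs in `haystack` after folding both sides.
pub fn contains_folded(haystack: &str, needle: &str) -> bool {
    search_key(haystack).contains(&search_key(needle))
}
```

With this, a search for `ch`, `c`, or `ч` all find a word starting with ч. Whether that is the matching users actually expect is a per-language decision that the sketch does not try to settle.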

I think all locale-dependent functions of the standard library (regexp functions, number to string, date to string, and so on) should support all locales, not only English. It should also be possible to translate the error messages of the standard library.


#9

My point is that, as it is today, for example in C, localization can break even software which doesn’t need and doesn’t want to use it, for example because computers are talking to other computers and not to humans (at least not to end users; IT people should be able to cope without localization).


#10

It should somehow be obvious that you are using locales, either by putting the functions in a separate namespace or by prefixing them with, say, loc_.

[quote=“AlexRadch, post:7, topic:4745”]
All functions that are independent of the locale should call the locale-dependent functions with the invariant locale as the parameter. That way you can parse comma-separated lists of numbers by using the invariant locale.
[/quote]
IMHO they should do their job in the simplest and sanest way, independently of locales. For example, sorting using the full Unicode Collation Algorithm is very inefficient, so you would do better to sort by code points.
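For reference, sorting by code points is what Rust's built-in string ordering already gives you: `Ord` for `str` compares bytes, and for UTF-8 that is the same order as comparing code points. A minimal sketch, where the wrapper name `sort_by_code_point` is just for illustration:

```rust
/// Sorts by Unicode code point. `Ord` for `str` compares UTF-8 bytes,
/// which yields the same order as comparing code points.
pub fn sort_by_code_point(words: &mut [&str]) {
    words.sort_unstable();
}
```

Sorting `["zebra", "Äpfel", "apple", "Zoo"]` this way gives `Zoo`, `apple`, `zebra`, `Äpfel`. That is cheap and stable across machines, but it is not the order a human reader would expect, which is the trade-off being discussed here.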

Also, there is no single locale that is really a good choice, e.g. most have non-ISO date formats.

And it should be easy to remove locale processing for systems that don’t need it, like small embedded systems.


#11

I disagree there. Depending on the environment amounts to an implicit input parameter. Libraries should not provide that (applications can still shim it, if they want).


#12

Well, then your UX suffers and every programmer needs to implement their own version of a collator for a locale. What is the point of localization if it doesn’t take these tiny idiosyncrasies into account?


#13

That should be in some Unicode, localization or UI library, not in the standard library.


#14

@starblue I think there has been a misunderstanding. I also advocate for making localization a separate library. Localization is IMHO a per-language thing. You can’t even make a decent library that satisfies most languages, or even just the most widely used ones.

My point is, a great localization library will take such tiny things into consideration. It’s not part of Unicode, and something as important as searching text shouldn’t be relegated to a UI library (imagine using Qt because you need DateTime processing).


#15

My point is that as it is today for example in C localization can
break even software which doesn’t need and doesn’t want to use it, for
example because computers are talking to other computers and not to
humans (at least not to end users, IT people should be able to cope
without localization).

If you don’t need localization, you can simply use the invariant locale.


#16

That should be in some Unicode, localization or UI library, not in the standard library.

Why should the standard library and most Rust libraries always return error messages in English?
Why can’t I localize such a library? It should be possible.


#17

My point is, a great localization library will take such tiny things into consideration. It’s not part of Unicode, and something as important as searching text shouldn’t be relegated to a UI library (imagine using Qt because you need DateTime processing).

Localization does two things:

  1. It dynamically returns localized resources such as strings, images, and so on.
  2. It processes locale-dependent data using locale-dependent rules or algorithms: sorting strings, converting to upper case, formatting numbers and dates, and so on.

The first part can be implemented in one small library.
The second part cannot be split out into an independent library. For example, I think the standard date-time library should support date and time formatting for different locales.
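To illustrate how small the first part can be, here is a sketch of a run-time message catalog with a fallback locale. The `Catalog` type and its methods are hypothetical and do not reflect the API of l20n or any existing crate.

```rust
use std::collections::HashMap;

/// Hypothetical message catalog: locale tag -> (message key -> text).
#[derive(Default)]
pub struct Catalog {
    by_locale: HashMap<String, HashMap<String, String>>,
}

impl Catalog {
    /// Registers `text` for `key` in the given locale.
    pub fn add(&mut self, locale: &str, key: &str, text: &str) {
        self.by_locale
            .entry(locale.to_string())
            .or_default()
            .insert(key.to_string(), text.to_string());
    }

    /// Looks up `key` at run time: first in `locale`, then in `fallback`,
    /// and finally returns the key itself if nothing matches.
    pub fn get<'a>(&'a self, locale: &str, fallback: &str, key: &'a str) -> &'a str {
        for tag in [locale, fallback] {
            if let Some(text) = self.by_locale.get(tag).and_then(|m| m.get(key)) {
                return text.as_str();
            }
        }
        key
    }
}
```

After adding `greeting` for both `en` and `de`, `get("de", "en", "greeting")` returns the German text, and a key that exists only in English falls back to the English text. The locale is chosen per call, so nothing here depends on compile-time strings.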


#18

… yet.

Rust 1.0 is not a year old yet. I’m confident that it’ll come eventually.


#19

We tried to get it in early on when designing the format-string system, but a full implementation was cut due to time/energy constraints. Instead we opted for a format syntax that would at least be a subset of ICU/Java MessageFormat syntax, so we would be able to expand to a proper L10N system without breaking backward compatibility, when there was available time/effort.
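For readers unfamiliar with the distinction: `format!` takes its format string as a literal that is checked at compile time, which is why translated templates loaded at run time cannot be passed to it directly. A minimal sketch of a run-time substitute is shown below. The `render` function is a hypothetical helper written for this example, not part of std, and it handles only plain `{name}` placeholders.

```rust
/// Hypothetical run-time counterpart to `format!`: replaces each `{name}`
/// in `template` with the matching value from `args`.
/// Replacement is sequential, so a value that itself contains a later
/// placeholder would be substituted again; acceptable for a sketch.
pub fn render(template: &str, args: &[(&str, &str)]) -> String {
    let mut out = template.to_string();
    for (name, value) in args {
        // "{{{}}}" produces the literal text "{name}".
        out = out.replace(&format!("{{{}}}", name), value);
    }
    out
}
```

With a template such as `"{user} hat {count} neue Nachrichten"` loaded at run time, `render` fills in the same named arguments that `format!("{user} has {count} new messages", user = "Anna", count = 3)` takes for a compile-time literal.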


#20

IMHO a program should not show the messages included in errors to the user (only to the programmer when debugging). I think that presenting an error is something which should always be done by some kind of UI, even if it is only a CUI/command-line interface. So localizing error messages and the like (in std/libs) should not be needed and can be done by the UI library presenting the messages to the user.

Also, I have already had some bad experiences with Java libraries which didn’t use any localization themselves but broke under non-English locales, because the programmers weren’t aware that some functions implicitly use the local locale (which I think is a (very) bad idea).

BUT I also understand why I/people would like to have localization in the standard library. Localization is something you have to consider from the very beginning when writing software, and having multiple external libraries for it which might end up being incompatible with each other would be really bad (limiting how you could combine different crates).

I think it would be best if there were some “official” localization library (or libraries), existing independently of std, which provides the basic localization mechanism. Since I would prefer std to be non-localizing (!= using the current/default locale), this lib could provide localized versions of some of the non-localizing functions provided by std.

On the other hand, the Rust ecosystem has been quite good at coming up with external solutions for problems which are normally solved by the language/std-lib (e.g. this in stdx). So it might not be needed at all :smile: