Are there any plans for locales?


#1

I was looking for localization options and noticed that not only is there no localization functionality yet, there does not even seem to be any way to access basic localization data such as the decimal and thousands separators, the date and time formats, the currency symbol, and so on.

So first I’d like to ask whether there are any plans for it.

I wrote a small test that calls setlocale() from the standard C library and confirmed that it has no effect on the output of format!. I consider that a good thing, because it means format! can be used for writing machine-readable data, though it is not sufficient for user-facing output, where more parameters are needed.
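A minimal sketch of that test, assuming a glibc system (the hand-written extern declaration and the LC_ALL = 6 constant are glibc-specific assumptions of mine, not from any crate):

```rust
// Sketch: check that libc's setlocale() has no effect on Rust's format!.
// Assumes glibc on Linux; LC_ALL = 6 is the glibc value.
use std::ffi::CString;
use std::os::raw::{c_char, c_int};

extern "C" {
    fn setlocale(category: c_int, locale: *const c_char) -> *mut c_char;
}

const LC_ALL: c_int = 6; // glibc value; differs on other platforms

fn main() {
    // Try to switch the C library to a locale using a decimal comma.
    // (setlocale returns null if the locale is not installed; harmless here.)
    let loc = CString::new("de_DE.UTF-8").unwrap();
    unsafe { setlocale(LC_ALL, loc.as_ptr()) };

    // format! is locale-independent, so the decimal point stays a dot:
    assert_eq!(format!("{}", 1234.5), "1234.5");
    println!("format! unaffected: {}", 1234.5);
}
```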

If there is indeed no plan yet, I would like to try to come up with a prototype over the next couple of days/weeks. I have managed internationalization and localization in both my previous job and my current one, and wrote some infrastructure (in C++) in both cases, so I would create a flexible formatting API based on the needs I encountered (such as formatting dimensional quantities with minimum and maximum precision, switching between units, and the like).

I also already checked the available C interface, and the standardized part of libc is really poor, because the locale can only be set globally, and must be set globally just to get at the information. GNU libc has functions to set it per thread, so a safe implementation is possible there. Windows has native functions that take a locale parameter, but it does not use the standard locale identifiers like everybody else, so it will be somewhat difficult. Alternatively, I could collect the data from CLDR instead, which would mean a largish blob of data (directly or via ICU), but it could include data the standard C library does not have, and it would run on systems that do not ship locale data, like Windows Embedded (formerly CE) or Android (Android has locale data in Java only). Perhaps as an optional feature.
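For reference, the glibc per-thread mechanism (newlocale/uselocale) looks roughly like the sketch below. The extern declarations and the LC_ALL_MASK value are written by hand for glibc and are not portable:

```rust
// Sketch of glibc's per-thread locale switching. Linux/glibc only;
// the constant and declarations below are glibc-specific.
use std::ffi::CString;
use std::os::raw::{c_char, c_int, c_void};

type LocaleT = *mut c_void; // glibc's opaque locale_t

extern "C" {
    fn newlocale(mask: c_int, locale: *const c_char, base: LocaleT) -> LocaleT;
    fn uselocale(newloc: LocaleT) -> LocaleT;
    fn freelocale(loc: LocaleT);
}

const LC_ALL_MASK: c_int = 0x1FBF; // glibc value

fn main() {
    // "C" is guaranteed to be available everywhere.
    let name = CString::new("C").unwrap();
    let loc = unsafe { newlocale(LC_ALL_MASK, name.as_ptr(), std::ptr::null_mut()) };
    assert!(!loc.is_null());

    // Install it for this thread only; other threads keep their locale.
    let old = unsafe { uselocale(loc) };
    // ... locale-dependent libc calls on this thread now use "C" ...
    unsafe { uselocale(old) }; // restore the previous per-thread locale
    unsafe { freelocale(loc) };
    println!("per-thread locale switched and restored");
}
```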

I suppose this can easily be prototyped as a stand-alone library, so that’s how I’ll start.

Oh, and I should note that I have seen the suggestion to implement an l20n-based translation system, but that does not touch locales and locale-aware value formatting, only translation. And to be honest, I think it’s too advanced for its own good.


#2

has background, but basically, there aren’t any plans to add it to Rust itself. A library would be great though!


#3

Yes, I’ve seen that, but I didn’t notice anything about formatting there, which is why I started with that.

Besides, while l20n allows some nice flexibility, many projects (like the ones I work on at my day job) are translated by non-technical people (either agency translators or people at partner companies, who are usually not very technical anyway), and I don’t think they would be able to use that flexibility. So I’d like to see a simpler gettext-based solution too.


#4

I, too, have been looking for i18n support for Rust, and it seems there hasn’t been much progress on it since this was posted - or has there been? :wink:

For us, lack of i18n support is an absolute showstopper for using Rust. And since l20n etc. have been thrown out there: gettext-style support is hugely important too, mostly for the interaction with translators. We very much want them to use the same tools/interfaces everywhere (not just for Rust projects), and pot/po files are the really important part of gettext for this.

If there are active Rust projects around i18n out there, I’d love to hear about them/get involved.


#5

I was doing some prototyping, but haven’t had much time over the last several months to finish it. I hope to move it forward a bit soon. Basically, we got stuck on deciding where to get the locale data from: bundling CLDR produces a huge package, but what is available on various systems is widely inconsistent, and sometimes it is hard to get the data at all (specifically, Android only has it at the Java level, not the native level).

If you want to help, I can add you to the https://github.com/rust-locale project.


#6

I made a small library that uses (partial) gettext syntax for a project of mine; it is available at https://crates.io/crates/crowbook-intl, but it is really quite hacky (though it somewhat works for my project). I also wrote an article presenting it (which might be a bit outdated): http://lise-henry.github.io/articles/localization.html. I don’t know whether it can really be useful to other people at this point, but well ^^

I think it’s a domain where Rust is quite lacking. E.g., I don’t need to do anything really fancy with number formatting, but I don’t even know a way to access the user’s preferred language that works on both Unixes and Windows.


#7

Yes, access to the user’s preferred language is actually the bulk of the work in progress sitting on the next branch of rust-locale, but even there it is only implemented for Unix. I know how to do it on Windows, but haven’t gotten around to it yet.

On other fronts, there is already a gettext crate and a binding for GNU libintl, and at least two string formatting libraries: the simple strfmt and the Java/ICU-compatible message-format. I would prefer building on these over making yet another implementation of anything.

Though I am not sure gettext is the right way, because:

  1. I haven’t seen a good way of deploying resources with cargo install, and
  2. the gettext crate does not use mmap (it is not needed for it to work, but the .mo format is specifically designed to work well with mmap, and it is a pity not to take advantage of that).

So I am actually considering designing something that would link the translations directly into the binary, with the added benefit that the msgids would not need to be repeated in each catalog. And using it for distributing the common locale data too, because the lack of installation support is even more serious there.
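As a rough illustration of the idea (the function and the table are entirely hypothetical; a real version would generate the table at build time from the translation files), linking translations into the binary could look like:

```rust
// Hypothetical sketch: translations compiled into the binary as one static
// table, so each msgid exists once and each language only adds its strings.
fn tr(lang: &str, msgid: &'static str) -> &'static str {
    match (lang, msgid) {
        ("fr", "Hello") => "Bonjour",
        ("fr", "Goodbye") => "Au revoir",
        ("de", "Hello") => "Hallo",
        // gettext-style fallback: an untranslated msgid is returned as-is
        _ => msgid,
    }
}

fn main() {
    assert_eq!(tr("fr", "Hello"), "Bonjour");
    assert_eq!(tr("es", "Hello"), "Hello"); // no Spanish catalog: fallback
    println!("{}", tr("fr", "Goodbye"));
}
```

The match-based table is just for illustration; the point is that the lookup is resolved against static data, with no files to install alongside the binary.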


#8

WOW! I’ve read your blog post now and that approach sounds really cool. I was always thinking in terms of fetching from a (possibly bundled) hash at runtime, which runs into various problems, like combining the hashes from various libraries. The static approach has none of those.


#9

Nice! I think that is currently what is most missing in this domain. (I mean, you can always find an ad hoc solution for translating strings, but not being able to detect the locale correctly in a cross-platform manner is a problem, at least for command-line interfaces.)

Thanks! Honestly, the implementation is quite hacky at this point, but yeah, I think a static approach fits better with the Rust way of doing things.


#10

I have published the locale_config crate over the holiday. It can detect the selected locale on Unix (from the POSIX environment variables) and on Windows (except for most user overrides), and it provides some globals to remember the current value.
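On Unix the detection boils down to the POSIX variable precedence. A pure-std sketch of that logic (the function name and the closure-based lookup are my own invention for testability, not the locale_config API):

```rust
// Sketch of POSIX locale detection: LC_ALL overrides LC_MESSAGES,
// which overrides LANG; "C" is the POSIX default when nothing is set.
fn posix_message_locale(get: impl Fn(&str) -> Option<String>) -> String {
    for var in ["LC_ALL", "LC_MESSAGES", "LANG"] {
        if let Some(val) = get(var) {
            if !val.is_empty() {
                return val;
            }
        }
    }
    "C".to_string()
}

fn main() {
    // Simulated environment to show the precedence:
    let env = |k: &str| match k {
        "LC_ALL" => Some("fr_BE.UTF-8".to_string()),
        "LANG" => Some("en_US.UTF-8".to_string()),
        _ => None,
    };
    assert_eq!(posix_message_locale(env), "fr_BE.UTF-8");

    // Real use would pass the process environment:
    println!("{}", posix_message_locale(|k| std::env::var(k).ok()));
}
```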

The value you get may be as simple as “en-US”, “fr-FR”, “fr-BE” or “zh-CN”, or it can have additional extensions. Use language-tags to parse the LanguageRange values (I should really add some examples to the README; I hope to do that in a few days).


#11

Does anyone know what Servo is using for this? Surely they have the same problem.


#12

Servo has no support for localization yet. Additionally, that’s probably the responsibility of the embedder of Servo, rather than Servo itself.


#13

In b2g we had 99.9% of the localization done in our “embedder”, aka Gaia’s system app, and almost nothing left in gecko. The only things I remember using gecko i18n were core UI elements like the [Browse] button displayed for .

If you plan in advance, it should not be a problem to have the embedder provide these strings, but you still need some i18n framework in place in Servo.


#14

I don’t think that even needs much of an i18n framework. The embedder can provide the strings in a client stylesheet or something like that, and I don’t think there would be any number, date, or time formatting involved. Unicode support is needed, but that is a separate topic.