Issues and Advantages of the use of Locales in Software

This page contains ideas generated by the (now retired) Locales discussion list.

Links
Information resources for locales
Table of Issues using Locales in Software
Table of Advantages using Locales in Software
Table of solution ideas
Locales, Locales, Locales, at the IUC22 Unicode Conference
Key points from the Locales Panel at IUC22
IUC22 presentation "What's wrong with Locales?" (PDF, 195Kb)
IUC22 Locales panel paper (PDF, 177Kb)

Note that in some cases the comments do not reflect our personal views.

The table below refers to "attributes or behaviors" that might be associated with a locale. Depending on whom you talk to, these can include (in no particular order):
date format, time format, time zone, calendar, number format and digits, currency and its format, collation, character set, case rules, hyphenation, justification, quote character, title, address, and telephone format, user interface language, writing direction, list separator, license plate format, form layout (paper size), units of measurement (pounds, grams, stones, inches, meters, etc.), keyboard, input method. (Also see the comment on voice in the table.)

Some would restrict locale to language-related issues (quote character, writing direction); others use the term to refer to the larger set of targets for internationalization within applications.
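As a rough illustration of how much narrower a programming library's notion of locale can be than the list above, the sketch below uses Python's standard `locale` module, which follows the POSIX/C model. The choice of Python and the names `CATEGORIES` and `describe_c_locale` are assumptions made here for illustration; they do not come from the original discussion. The five categories shown cover collation, character handling, and monetary, numeric, and time formatting; the module has no category for items such as time zone or license plate format.

```python
import locale

# Five locale categories that Python's locale module exposes on all platforms.
CATEGORIES = {
    "LC_COLLATE": locale.LC_COLLATE,    # collation (sort order)
    "LC_CTYPE": locale.LC_CTYPE,        # character classification and case rules
    "LC_MONETARY": locale.LC_MONETARY,  # currency symbol and format
    "LC_NUMERIC": locale.LC_NUMERIC,    # number format
    "LC_TIME": locale.LC_TIME,          # date and time format
}


def describe_c_locale():
    """Return a few formatting attributes of the portable "C" locale."""
    # The "C" locale is always available, so this call does not depend on
    # which locales happen to be installed on the machine.
    locale.setlocale(locale.LC_ALL, "C")
    conv = locale.localeconv()
    return {
        "decimal_point": conv["decimal_point"],
        "thousands_sep": conv["thousands_sep"],
        "currency_symbol": conv["currency_symbol"],
    }


if __name__ == "__main__":
    print(sorted(CATEGORIES))
    print(describe_c_locale())
```

Which values a named locale such as a French or German one returns depends on the operating system and its installed locale data, which is why the sketch queries only the "C" locale.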

Table of Issues using Locales in Software
ID Issues
01 Based on political units (ISO 639 languages, ISO 3166 regions) rather than driven by user requirements.
02 Based on POSIX, which predates object-oriented and other more modern programming styles.
03 ISO 639 and ISO 3166 can and do change. Software needs a stable reference.
04 The behavior associated with any specific locale is undefined. There is no standard identifying the attributes or behaviors associated with locales generally, or with specific locales.
05 The behavior associated with a locale may change. For example, when currencies in Europe changed to the euro, some software changed the currency associated with European locales after a certain date. This changes the meaning of existing data.
Another example is that the German government changed capitalization rules a year or so back. If your software used uppercasing tables to normalize text, you could have a problem (for example, if the login procedure uppercased names to standardize them for comparison).
Time zones also change, in both the region covered and the offset from UTC.
Rules for spelling and sorting also change from time to time.
06 The language+region hierarchy does not describe real-world situations. It works best for homogeneous groups in well-defined nation-state units (French for France, German for Germany).
It breaks down in cross-border situations. Examples include Kurdish speakers, Albanian speakers, minorities within nations (e.g. Iberian or Indian languages, Chinese speakers in Canada), and multilingual nations such as Switzerland, Canada, Belgium, India, and South Africa.
07 Locales allow user-defined variants. However, major software vendors have also defined variants. This can and does cause conflicts, for example when Application "A" with variant "xx-yy-zz" needs to run on System "B", which requires a setting of "xx-yy-eu".
08 Java, the "Web", C programming libraries, et al. use "locales", but do not have the same behavior. Some programming environments have their own model (Windows, .Net). Integrated components or systems that reference a particular locale may not behave consistently, may interpret data differently, or may require a separate "flag" to define some attribute of the locale. For example, a locale of fr_fr may mean francs to one component and euros to another, while a third component may require some other "flag" to establish the currency.
09 No standard definition exists that prescribes which combinations are valid. Depending on the operating system version, the browser version, the Java VM version, their localization, and their manufacturers, different locale pairs will be accepted. Other locales cause crashes or default to something undesirable.
10 *Language* is not what is relevant for most current implementations; what is relevant is orthography, which is a particular usage of a particular writing system for a particular language.
11 In the future, locales should handle settings for both text and voice. We may then need both orthography and dialect to be properties of a locale.
12 It is often not clear whose locale to use. If an order is mailed from locale x to locale y, which items should be in locale x (e.g. the currency in the order should not change) and which in locale y (e.g. the mail header with the time sent or received should be in the receiver's locale)? On the other hand, a travel itinerary should not change dates and times to the vendor's or user's locale, as departure and arrival times should remain in the locale of the location under discussion.
13 A locale may span a region large enough to represent multiple values for an attribute or a behavior. For example, the U.S. spans more than one time zone.
14 Languages that are not included in the ISO 639 standard cannot be referenced. (Current locale identifiers use the 2-letter code, not the 3-letter code.) Many language-country values are missing.
15 Should the rules for "rounding numbers" be affected by the locale setting?
16 Originally (perhaps), locales were created as a shorthand for setting preferences for common situations, such as English-speaking users in the US. Today there is some expectation or desire for locales to address a wider variety of situations. An example might be a Spanish-speaking user in the US, with some differentiation of whether the user is originally from Spain, Mexico, Colombia or Chile.

It is difficult to resolve conflicts: does the user want the number format of the place of origin (e.g. a decimal comma) or the format of the region where the user now lives (e.g. a decimal point)?
17 Where do you draw the line for the number of scenarios to support? (Or do you draw a line?) Perhaps based on population? For example, would you count the number of Vietnamese-speaking users in the U.S. to determine whether a Vietnamese-US locale should be offered?
18 Many users are mobile. As they change location, their preferences may change, e.g. the time zone, but it may also be desirable for language and other locale components to change, varying by user. (When in Rome...) Should users be able to set these kinds of hybrid locales? (Maybe this is personalization?)
19 People mistake "zh-tw" to mean Traditional Chinese, and "zh-cn" to mean Simplified Chinese. "zh-tw" only means Chinese as used in Taiwan and "zh-cn" only means Chinese as used in China. It happens that people in Taiwan use Traditional Chinese and people in China use Simplified Chinese. The code "zh-tw" itself does not imply either Traditional or Simplified Chinese. In the future, "zh-cn" could mean Traditional Chinese if the PRC government so decided, and "zh-tw" could come to mean something entirely different.
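The point of issue 19 can be sketched in code: a language-region identifier carries no script information, so software should not read one into it. The following is a minimal sketch in Python, assuming the common convention that a four-letter subtag names a script (for example "Hant" or "Hans") and that a two-letter or three-digit subtag names a region. The `Tag` type and the `parse_tag` function are hypothetical helpers written for this illustration; they ignore variants, extensions, and other subtags that real identifiers may carry.

```python
from typing import NamedTuple, Optional


class Tag(NamedTuple):
    language: str
    script: Optional[str]  # None when the identifier does not state a script
    region: Optional[str]


def parse_tag(tag: str) -> Tag:
    """Split an identifier such as 'zh-tw' or 'zh-Hant-TW' into its parts."""
    parts = tag.replace("_", "-").split("-")
    language = parts[0].lower()
    script = None
    region = None
    for part in parts[1:]:
        is_script = len(part) == 4 and part.isalpha()
        is_region = (len(part) == 2 and part.isalpha()) or (
            len(part) == 3 and part.isdigit()
        )
        if is_script and script is None and region is None:
            script = part.title()   # scripts are conventionally written "Hant"
        elif is_region and region is None:
            region = part.upper()   # regions are conventionally written "TW"
    return Tag(language, script, region)


if __name__ == "__main__":
    # The script stays unknown unless the identifier says so explicitly.
    print(parse_tag("zh-tw"))
    print(parse_tag("zh-Hant-TW"))
```

With this split, an application that needs Traditional or Simplified Chinese can require an explicit script instead of guessing it from the region.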
Table of Advantages using Locales in Software
ID Advantages
01 Simplicity
02 Adopted by many systems/applications
03 Easy to understand and implement
Table of potential solution ideas
ID Idea
01 To address changes in locale definitions over time, include a timestamp in the name. Perhaps the locale 2001:fr-FR would use the franc, whereas 2002:fr-FR would use the euro. To the extent that archived data has a date associated with it, this will help maintain the integrity of the locale-based information. Locales specified without a timestamp would presume the current date.
02 An XML format for interchange of "user preferences". With that, one could capture things such as the fact that my normal date/time/number formats are en-US, but that I want "YYYY-MM-DD" for dates.
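The two ideas above can be sketched together in a few lines of Python. This is only an illustration under stated assumptions, not a proposed format. The `CURRENCY_HISTORY` table, the function names, and the XML element and attribute names (`preferences`, `base`, `override`, `attribute`) are all hypothetical. Only the 2001/2002 franc-to-euro example for fr-FR and the "en-US with YYYY-MM-DD dates" example come from the table.

```python
import datetime
import xml.etree.ElementTree as ET

# Hypothetical history table: locale -> list of (first year, currency code).
# Only the fr-FR franc/euro switch from idea 01 is encoded here.
CURRENCY_HISTORY = {
    "fr-FR": [(0, "FRF"), (2002, "EUR")],
}


def currency_for(name, today=None):
    """Resolve 'YYYY:locale' or a plain 'locale' to a currency code."""
    year_text, sep, locale_id = name.partition(":")
    if sep:
        year = int(year_text)
    else:
        # No timestamp in the name: presume the current date.
        locale_id = name
        year = (today or datetime.date.today()).year
    result = None
    for start_year, code in CURRENCY_HISTORY[locale_id]:
        if year >= start_year:
            result = code  # keep the latest entry that has already started
    return result


# Hypothetical interchange document for idea 02: a base locale plus overrides.
PREFS_XML = """
<preferences base="en-US">
  <override attribute="date-format">YYYY-MM-DD</override>
</preferences>
"""


def load_preferences(xml_text):
    """Read the base locale and any per-attribute overrides into a dict."""
    root = ET.fromstring(xml_text.strip())
    prefs = {"base": root.get("base")}
    for node in root.findall("override"):
        prefs[node.get("attribute")] = node.text.strip()
    return prefs


if __name__ == "__main__":
    print(currency_for("2001:fr-FR"), currency_for("2002:fr-FR"))
    print(load_preferences(PREFS_XML))
```

Keeping the overrides separate from the base locale means that a change in the definition of en-US would not silently alter the user's explicit date preference.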