What to Expect of JavaScript in an International Web Application

by Adam Asnes

Given JavaScript's status as the de facto browser client scripting language, and given the international nature of the Internet, it was inevitable that JavaScript and internationalization (i18n) would eventually cross paths. At Lingoport (www.lingoport.com), we see a good deal of JavaScript in our clients' code that we internationalize. While JavaScript is not completely without international capabilities, it does have its share of challenges and faults. This article briefly discusses what to expect of JavaScript in an international web application: what works (the good), what to watch out for (the bad), and what to avoid (the ugly).

The Good - Unicode

Probably the best news about JavaScript and i18n is that it supports Unicode. This means you should not have to worry about character corruption, provided you make sure that your pages and script files are actually read in a Unicode encoding such as UTF-8.

If a JavaScript script block is embedded in an HTML file, it will automatically assume the character encoding of the enclosing page. Thus, if you have defined your HTML character set as UTF-8 you have done all you need to do. If your JavaScript is included as a separate .js file, you can add a charset attribute to your script tag to specify the character encoding of the included file. For example, a JavaScript file called functions.js that is encoded in UTF-8 would be included like this:

(go to articles at Lingoport.com to see code snippets)

You can also include Unicode characters in any JavaScript regardless of encoding by using Unicode escape sequences (\u followed by four hexadecimal digits that give the character's Unicode value, most significant digit first). For example, you could define a string with a smiley face character like this:

(go to articles at Lingoport.com to see code snippets)

JavaScript also measures the length of Unicode strings in characters rather than bytes. For example, smiley.length would return 1. (Strictly speaking, length counts UTF-16 code units, so a character outside the Basic Multilingual Plane counts as 2.)
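As a minimal sketch of the escape syntax, the snippet below defines a smiley with a \u escape and inspects it. The variable name smiley follows the text; the choice of U+263A (WHITE SMILING FACE) is an assumption about which character the original snippet used.

```javascript
// Assumption: U+263A (WHITE SMILING FACE) stands in for the article's smiley.
var smiley = "\u263A";

// length counts UTF-16 code units; U+263A fits in a single unit.
var smileyLength = smiley.length;

// The code unit can be read back as a number and shown in hexadecimal.
var smileyHex = smiley.charCodeAt(0).toString(16);
```

Because the character is written as an escape, the result does not depend on the encoding of the file that contains the script.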

The Bad - Strings

One of the more annoying issues with JavaScript and i18n is dealing with embedded strings. As with any other programming language, embedded strings in an application's code make it difficult if not impossible to localize. Unfortunately, JavaScript does not have the concept of a resource file, and strings that will be generated by JavaScript must be defined in the code.

The easiest approach to deal with this issue is to define your JavaScript strings dynamically in server-side code (Java/JSP, ASPX, PHP, etc.). The following example defines some string resources in a JavaScript script block at the top of a JSP page:

(go to articles at Lingoport.com to see code snippets)

Assuming the currentLocale object is set to English (US), the resulting block should look like this:

(go to articles at Lingoport.com to see code snippets)

When currentLocale is set to German (Germany) it should change to this:

(go to articles at Lingoport.com to see code snippets)

For French (France):

(go to articles at Lingoport.com to see code snippets)

You get the idea.
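Since the original snippets are not reproduced here, the following is only a hypothetical sketch of what a rendered resource block might contain for German (Germany). The variable names and strings are invented for illustration and are not the article's originals; non-ASCII characters are written as \u escapes so the block survives any page encoding.

```javascript
// Hypothetical output of the server-side template for German (Germany).
// In a real page these lines would sit inside a script block near the top.
var RES_OK = "OK";
var RES_CANCEL = "Abbrechen";
// "\u00F6" is the letter o with a diaeresis.
var RES_CONFIRM_DELETE = "M\u00F6chten Sie diesen Eintrag wirklich l\u00F6schen?";
```

The server would emit the same variable names with different values for each locale, so the rest of the page's JavaScript never changes.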

There are a couple of things to keep in mind with this approach. First, any strings that are embedded in the files, whether JSP/ASPX/PHP/etc. or JavaScript .js files, must be externalized, i.e. the strings should be moved into the string resource block as demonstrated above, and replaced in the code with their variable names. Second, the JavaScript string resource block should be defined before any other embedded blocks or .js file includes that make use of these externalized strings. For example, the resource block should be defined before the following function is called:

(go to articles at Lingoport.com to see code snippets)
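A rough illustration of that ordering, under stated assumptions: RES_WELCOME, its "{0}" placeholder convention, and buildWelcomeMessage are hypothetical names chosen for this sketch, not taken from the original snippet.

```javascript
// Resource block: in a real page this would be emitted by the server
// and must appear before any script that reads it.
var RES_WELCOME = "Welcome, {0}!"; // hypothetical name and string

// Hypothetical function defined later in the page or in an included .js file.
// It refers to the resource by variable name instead of embedding the text.
function buildWelcomeMessage(userName) {
  // A function replacement inserts userName literally, so characters
  // such as "$" in the name are not treated as replacement patterns.
  return RES_WELCOME.replace("{0}", function () {
    return userName;
  });
}
```

If buildWelcomeMessage were called before the resource block had been evaluated, RES_WELCOME would still be undefined and the call would fail, which is why the block needs to come first.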

Note that this simple example doesn't deal with more sophisticated functionality such as locale fallback, but this basic approach solves the simpler string resource-related issues common in JavaScript.

The Ugly - Language, Dates/Times

When it comes to language, JavaScript knows enough to be dangerous. That is, it knows what the browser's default language is (it's defined in navigator.language for Netscape-descended browsers such as Firefox and in navigator.browserLanguage for Internet Explorer).

On my English (US) system these get reported as "en-us" or "en_US." It is tempting to think that this information is a useful indication of the preferred language of the user, and in many cases it will be, but it doesn't allow for the possibility of a user preferring a language other than the browser default.
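For illustration, a small helper can read whichever of the two properties is present and normalize the two spellings mentioned above. The function name detectBrowserLanguage and the choice to normalize to lowercase with a hyphen are assumptions of this sketch; the navigator object is passed in as a parameter so the function can also be exercised outside a browser.

```javascript
// Hypothetical helper: returns the browser's default language tag,
// normalized to lowercase with a hyphen (for example "en-us").
function detectBrowserLanguage(nav) {
  // navigator.language on Netscape-descended browsers,
  // navigator.browserLanguage on Internet Explorer.
  var raw = nav.language || nav.browserLanguage || "";
  return raw.replace("_", "-").toLowerCase();
}
```

In a page this would be called as detectBrowserLanguage(window.navigator). The result is best treated as a starting default only; a language the user has explicitly chosen should take precedence over it.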

On a related note, there are a small number of "locale-specific" methods in JavaScript, which deal with the presentation of dates and times as strings, but these are always formatted in a single format for the browser's default locale. This also applies to the ability to parse date and time strings; they will only be parsed correctly if the strings are formatted according to the conventions of the browser's default locale.

Although these provide some minimal language support, it is actually best to ignore them and instead rely on the server to provide this functionality as much as possible.
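To make the caveat concrete, the sketch below calls two of the locale-specific methods and, for contrast, produces a locale-neutral value. The variable names and the sample date are made up; passing a timestamp to the server for formatting is one possible reading of "rely on the server", not something the article prescribes.

```javascript
// Sample date: 31 January 2006, 14:30 in the local time zone.
var meeting = new Date(2006, 0, 31, 14, 30, 0);

// Formatted for the environment's default locale. The exact text
// differs from one browser and system to another, and the script
// has no way here to ask for a different locale.
var localDate = meeting.toLocaleDateString();
var localTime = meeting.toLocaleTimeString();

// Locale-neutral alternative: milliseconds since 1 January 1970 UTC.
// A server that knows the user's preferred locale can format this.
var timestamp = meeting.getTime();
```

No expected output is shown for localDate and localTime because it depends entirely on where the code runs, which is exactly the problem described above.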

With the advent of AJAX, a higher level of i18n functionality becomes possible because of the ability to interact with the server in a more seamless fashion. Using AJAX to achieve this higher functionality in JavaScript will be discussed in a future article.