I18n Guy, your I18n advisor

Introduction to:
Examples Of Unicode Usage For Business Applications

Table of Examples of Unicode Usage For Business Applications
Title/link Description
Examples using Unicode UTF-8 characters This page uses Basic Multilingual Plane (BMP, plane 0) characters encoded as UTF-8 to represent names and places from around the world. Use IE 6, Mozilla 1.1b, or NS 6.2 to view. Looks good under Opera 6 except for Bhutan. Konqueror 2.2.2 doesn't display the page.
PDF file
Examples using Unicode UTF-8 characters
You can view the Unicode example usage for business applications, without installing special fonts, etc.
This file is an Acrobat PDF file showing the page above, using Basic Multilingual Plane (BMP, plane 0) characters encoded as UTF-8 to represent names and places from around the world.
Example using Ruby Annotation This is the BMP example reformulated as a 2 column table and using Ruby Annotation to indicate the English transliteration of each table entry.
Currently (2002-04), it looks like neither Opera 6 nor Netscape 6.2 supports Ruby Annotation. It looks OK with IE 6, although I wish I could control the height of the annotation over the base text.
This page was recently changed to xhtml 1.1. You can still view with IE.
Mozilla and Opera 6 on Mac OS X don't support Ruby yet.
Example Data, CSV format A slightly older version of the table data is available as a comma-separated, UTF-8 text file. The file is zipped as unicode-example-utf8.zip. The data can be useful for testing Unicode applications or to demonstrate the value of using Unicode for different purposes.
Examples Using Supplementary Plane 1 Characters Encoded As NCRs This page uses supplementary plane 1 characters encoded as Numeric Character References (NCRs) to represent names and places in scripts such as Etruscan, Gothic, and Deseret.
Currently (2002-04), the example works with IE 5.5 and Opera 6. NS 6.2 does not support supplementary characters yet.
(2003-03) Mozilla 1.3 displays plane 1 now. Even the RTL Etruscan!
(2002-11) Mac OS X browser Opera 6 displays UTF-8 web pages with Supplementary Characters.
(2003-03-09) OmniWeb 4.2b1 browser on Mac OS X 10.2.4 displays plane 1 with NCR, UTF-8, UTF-16. However, Etruscan is not displayed RTL.
Examples Using Supplementary Plane 1 Characters Encoded In UTF-8 This page uses supplementary plane 1 characters encoded as UTF-8 to represent names and places in scripts such as Etruscan, Gothic, and Deseret.
Currently (2002-06), Opera 6.01 works well. NS 6.2 and IE 6 do not support UTF-8 supplementary characters yet.
(2003-03) Mozilla 1.3 displays plane 1 now. Even the RTL Etruscan!
(2002-11) Mac OS X browser Opera 6 displays UTF-8 web pages with Supplementary Characters.
(2003-03-09) OmniWeb 4.2b1 browser on Mac OS X 10.2.4 displays plane 1 with NCR, UTF-8, UTF-16. However, Etruscan is not displayed RTL.
(2003-10-01) Ximian Desktop 2 (XD2) using either Mozilla 1.4 or Galeon 1.3.5 on Linux displays this page well. (Thanks Simos Xenitellis)
Examples using Supplementary Plane 1 characters encoded in UTF-16 This page uses supplementary plane 1 characters encoded as UTF-16.
Currently (2002-04), Netscape 4.7 and Opera 6.03 display this OK, although Etruscan is not displayed right to left.
IE 6 doesn't seem to support UTF-16 yet.
NS 6 supports UTF-16LE but doesn't display this page.
(2002-11) NS 7 can't display Etruscan.
(2002-11) Opera 6 and OmniWeb on Mac OS X display this page.
(2003-03) Mozilla 1.3 displays plane 1 now. Even the RTL Etruscan! However, it detects the page as UTF-16BE, and the encoding needs to be overridden to UTF-16LE.
(2004-02) Firefox 0.8 recognizes the encoding correctly and displays this page!
Examples using Supplementary Plane 1 characters encoded in UTF-32 This page uses supplementary plane 1 characters encoded as UTF-32.
(2002-04) This page did not display on any browser I tried.
(2003-04) Opera 7 displays this page!
(2004-02) Firefox 0.8 displays this page!
Test of lang on search engines This is a little experiment on the use of the keyword meta statement and the lang facility.
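
The plane 1 pages listed above differ only in how the same characters are serialized. As a rough illustration, the Python sketch below prints one supplementary character in each form. The character chosen (U+10330, GOTHIC LETTER AHSA) and the helper names `to_ncr` and `hex_bytes` are assumptions made for this sketch; they are not taken from the example pages.

```python
# Sketch: one plane 1 character in each encoding used by the example pages.
# U+10330 GOTHIC LETTER AHSA is an illustrative choice, not necessarily a
# character that appears on those pages.
CH = "\U00010330"


def to_ncr(text: str) -> str:
    """Replace every non-ASCII character with a hexadecimal NCR."""
    return "".join(c if ord(c) < 0x80 else f"&#x{ord(c):X};" for c in text)


def hex_bytes(text: str, encoding: str) -> str:
    """Encode text and show the resulting bytes as space-separated hex."""
    return text.encode(encoding).hex(" ")


forms = {
    "NCR": to_ncr(CH),
    "UTF-8": hex_bytes(CH, "utf-8"),
    "UTF-16BE": hex_bytes(CH, "utf-16-be"),
    "UTF-16LE": hex_bytes(CH, "utf-16-le"),
    "UTF-32BE": hex_bytes(CH, "utf-32-be"),
}

for name, value in forms.items():
    print(f"{name:9} {value}")
```

The UTF-16BE and UTF-16LE rows contain the same two 16-bit units with the bytes swapped, which is why a browser that guesses the wrong byte order cannot show the page correctly until the encoding is overridden.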

 To Submit Additional Examples 

If you would like to submit additional examples for languages or regions not already represented here, send an email to Tex Texin with the information for each field. If you send data that is not already in UTF-8 encoding, please be sure to state the code page or encoding that you use. (Note: It may take me a few days to respond.)
At this time I am only adding examples that show off characters or other aspects of Unicode that are not already represented.
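
To show why the code page matters for submitted data, here is a minimal Python sketch that converts bytes to UTF-8 using a stated encoding. The function name `to_utf8` and the sample text are hypothetical, chosen only for illustration.

```python
def to_utf8(raw: bytes, declared_encoding: str) -> bytes:
    """Decode raw bytes with the encoding the sender stated, then re-encode as UTF-8."""
    return raw.decode(declared_encoding).encode("utf-8")


# Hypothetical submission: a place name saved in Windows code page 1252.
sample = "Zürich".encode("cp1252")

as_western = to_utf8(sample, "cp1252")    # the encoding the sender actually used
as_greek = to_utf8(sample, "iso8859-7")   # a wrong guess about the encoding

print(as_western.decode("utf-8"))
print(as_greek.decode("utf-8"))
```

The same input bytes yield different text under the two assumptions, so without a stated encoding the recipient can only guess.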


 Displaying the Examples: Encoding Support 

Most of the pages are encoded in UTF-8. If the display does not seem correct to you:
• verify that you are using a browser that supports Unicode,
• ensure that your browser is using UTF-8 encoding to view the data, and
• that you are using fonts that support Unicode and, in particular, the characters used in the examples.
• The plane 1 example pages require that support for surrogate code points be enabled on Windows NT/2000/XP. See: Setting up Windows to support surrogates
• Mac users are referred to Tom Gewecke's Unleash Your Multilingual Mac and Alan Wood's Setting up Macintosh OS X 10 Web Browsers for Multilingual and Unicode Support.
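
The surrogate support mentioned in the list above refers to how UTF-16 represents plane 1 characters: each one is split into a pair of 16-bit units, a high surrogate and a low surrogate. The following Python sketch shows the arithmetic; the function names are assumptions made for this illustration.

```python
from typing import Tuple


def to_surrogates(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary code points have surrogate pairs")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Recombine a surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x10400)  # U+10400 is the first code point of the Deseret block
print(f"U+10400 -> {high:04X} {low:04X}")
```

Software that treats each 16-bit unit as a complete character sees two unassigned values instead of one letter, which is one reason plane 1 pages can fail on systems where surrogate support is not enabled.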

 Displaying the Examples: Fonts Support 

The BMP example page displays well with the following fonts:
"Arial Unicode MS" font (comes with Microsoft Office 2000). It is downloadable, but is not freeware. See the mail from Chris Pratley about licensing and downloading it.
Inuktitut font
Ethiopic font
James Kass' CODE2000 font

 Notes on Braille 

The example used a caseless American 6-dot Braille described in the fact sheet from the National Library Service for the Blind and Physically Handicapped to transliterate the text "France" and "Louis Braille" to Braille encoding. Braille letter assignments differ by country. There are also 8-dot encodings. Someday the example will instead use the French Braille encoding. A few more links on Braille are: Unicode book on Braille, Braille Unicode chart, Canadian Braille Authority.
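
As a rough sketch of such a transliteration, the Python code below maps Latin letters to the Unicode Braille Patterns block (starting at U+2800), where dot n of a cell corresponds to bit n-1 of the offset from U+2800. The letter table is the commonly documented English 6-dot alphabet and is an assumption here; it was not copied from the fact sheet, and the function names are hypothetical.

```python
BRAILLE_BLOCK = 0x2800  # U+2800 is the blank Braille pattern

# Dot numbers for a-j. In this alphabet k-t add dot 3, and u, v, x, y, z
# add dots 3 and 6 (assumed table, see the note above).
FIRST_DECADE = {
    "a": (1,), "b": (1, 2), "c": (1, 4), "d": (1, 4, 5), "e": (1, 5),
    "f": (1, 2, 4), "g": (1, 2, 4, 5), "h": (1, 2, 5), "i": (2, 4),
    "j": (2, 4, 5),
}


def build_table():
    """Derive the full a-z dot table from the first ten letters."""
    table = dict(FIRST_DECADE)
    for base, letter in zip("abcdefghij", "klmnopqrst"):
        table[letter] = FIRST_DECADE[base] + (3,)
    for base, letter in zip("abcde", "uvxyz"):
        table[letter] = FIRST_DECADE[base] + (3, 6)
    table["w"] = (2, 4, 5, 6)  # w falls outside the regular pattern
    return table


DOTS = build_table()


def to_braille(text: str) -> str:
    """Transliterate text to caseless 6-dot Braille patterns."""
    out = []
    for ch in text.lower():  # caseless: capital letters are not marked
        if ch in DOTS:
            mask = sum(1 << (dot - 1) for dot in DOTS[ch])
            out.append(chr(BRAILLE_BLOCK + mask))
        elif ch == " ":
            out.append(chr(BRAILLE_BLOCK))  # blank cell for a space
        else:
            out.append(ch)  # leave anything else untouched
    return "".join(out)


print(to_braille("France"))
print(to_braille("Louis Braille"))
```

A French Braille version would need a different table, in particular for accented letters, so only the lookup data would change, not the bit arithmetic.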


For more information on displaying Unicode, go to the Unicode Consortium web page on Display Problems.


 Other Unicode Samplers: 

Creator Sampler Page
Unicode Consortium What Is Unicode?
Michael Kaplan Anyone can be provincial!
Mark Davis Transcriptions of "Unicode"
Frank da Cruz UTF-8 Sampler
James Kass Sample Unicode Test Pages and Script Links
Jungshik Shin UTF-16 and UTF-32 Test Pages
Andrew Dunbar When the world wants to talk, it speaks Unicode

 Contributors 

The following people contributed to the entries in the table:

Sajjad Raza Abidi, Mike Ayers, Edward Cherlin, Marco Cimarosti, Frank da Cruz,
Magda Danish, Jacline Deridder, Donald Figge, Chris Fynn, Mikael Hjerpe,
Juuichiketajin, Nigel Kerr, Lars Kristan, Akekarat Lekpornprasert
(เอกรัตน์ เล็กพรประเสริฐ),
Gavin Nesbitt,
David Perry, Toby Phipps, Audun Lona, Roozbeh Pournader
(روزبه پورنادر),
Chris Pratley,
Yaap Raaf, Herman Ranes, Graham Rhind, Ken Robinson, Jony Rosenne,
Jungshik Shin, Thierry Sourbier, Otto Stolz, Marie-Anne de Warren, Daniel Yacob,
Maurice Bauhahn, James Do, Deborah Goldsmith, Michael Kaplan, Muthu Nedumaran,
Radovan Garabík, Philipp Reichmuth, Andrew Dunbar, Steven R. Loomis, Tex Texin
Encoded in UTF-8!

Send comments to Tex Texin
This page was last updated 2005-10-03.