This article was previously published under Q99884
This article has been archived. It is offered "as is" and will no longer be updated.
Windows NT version 3.1 employs a relatively new standard of character representation called Unicode. This new standard allows for greater flexibility in adding support for localized versions of Microsoft Windows NT.
The first and most prominent character standard in use by computers today is ASCII. This format is adequate for English text, but as computers became more popular in European countries, the limitations of ASCII became clear.
In an effort to overcome some of these limitations, the International Organization for Standardization (ISO) established a new standard called Latin-1 that defined European characters that were omitted from ASCII. Microsoft Windows modified the Latin-1 standard even further and called the character set Windows ANSI. However, because it continues to use an 8-bit coding scheme, Windows ANSI can represent only 256 unique symbols, considerably fewer than the 10,000 symbols that are common in such languages as Chinese, Korean, and Japanese.
In addition to the language barriers, as the capabilities of computers broaden beyond uppercase, mono-spaced fonts, the requirements for a large set of unique characters (for example, letters, punctuation, mathematical and technical symbols, and publishing characters) have also grown far beyond the capabilities of 8-bit text.
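The 8-bit limit described above can be seen in a small Python sketch. The codec names `latin-1` and `cp1252` are Python's names for ISO Latin-1 and the present-day Western Windows code page; they are used here as assumed stand-ins for Latin-1 and Windows ANSI, and the helper names are hypothetical.

```python
# Sketch: an 8-bit code set can hold at most 2**8 = 256 code points.
EIGHT_BIT_CODE_POINTS = 2 ** 8


def decode_byte(value, codec):
    """Decode a single byte value with the named Python codec."""
    return bytes([value]).decode(codec)


def fits_in_codec(text, codec):
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# The same byte means different things in the two 8-bit sets:
# 0x93 is a control code in Latin-1 but a left curly quote in cp1252.
latin1_char = decode_byte(0x93, "latin-1")
ansi_char = decode_byte(0x93, "cp1252")

print(EIGHT_BIT_CODE_POINTS)                        # 256
print(hex(ord(latin1_char)), hex(ord(ansi_char)))   # 0x93 0x201c
print(fits_in_codec("café", "cp1252"))              # True
print(fits_in_codec("日本語", "cp1252"))             # False
```

The last line shows the practical consequence: Japanese text has no representation at all in the 8-bit Western character set.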
The lowest level of localization (adaptation to a particular language) is the actual binary representation of characters: the code set. To overcome the limitations of the other coding methods, several major computer companies, including Apple Computer, Inc., Sun Microsystems, Inc., Xerox Corp., and IBM (International Business Machines Corp.), formed Unicode Inc., a non-profit consortium, to define a new standard for international character sets. At the same time, the ISO began developing a standard of its own. Eventually, these standards merged and became Unicode. Unicode is published as The Unicode Standard, Worldwide Character Encoding.
Unicode employs a 16-bit coding scheme that allows for 65,536 distinct characters, more than enough to include all languages in use today. In addition, it supports several archaic or arcane languages and scripts such as Sanskrit and Egyptian hieroglyphs. Unicode also includes representations for punctuation marks, mathematical symbols, and dingbats, with room left for future expansion. Because Unicode establishes a unique code for each character in each script, Windows NT can ensure that the character translation from one language to another is accurate.
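The arithmetic behind the 16-bit scheme can be sketched in Python as follows. The `utf-16-le` codec is an assumed modern stand-in for the 16-bit encoding the article describes, and `to_16bit_code_units` is a hypothetical helper. Characters added to Unicode after the original 16-bit range would occupy two code units in this encoding, so the sketch sticks to characters that fit in one.

```python
import struct

# A 16-bit code set can hold 2**16 distinct code points.
SIXTEEN_BIT_CODE_POINTS = 2 ** 16


def to_16bit_code_units(text):
    """Return the list of 16-bit code units for text (UTF-16, little-endian)."""
    raw = text.encode("utf-16-le")
    count = len(raw) // 2
    return list(struct.unpack("<%dH" % count, raw))


# A Latin letter and a Chinese character each take exactly one 16-bit unit.
units = to_16bit_code_units("A中")

print(SIXTEEN_BIT_CODE_POINTS)     # 65536
print([hex(u) for u in units])     # ['0x41', '0x4e2d']
```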
Unicode in Windows NT
Unicode is the native code set of Windows NT, but the Win32 subsystem provides both ANSI and Unicode support. Character strings in the system, including object names, path names, and file and directory names, are represented with 16-bit Unicode characters. The Win32 subsystem converts any ANSI characters it receives into Unicode strings before manipulating them. It then converts them back to ANSI, if necessary, upon exit from the system.
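The convert-on-entry, convert-on-exit pattern described above might look roughly like the following Python sketch. It is an illustration only, not how the Win32 subsystem is implemented: the codec choices (`cp1252` for ANSI, `utf-16-le` for the 16-bit strings) are assumptions, and `hypothetical_system_call` and the other function names are invented for the example. In Win32 itself, the analogous conversions are available to applications as `MultiByteToWideChar` and `WideCharToMultiByte`.

```python
ANSI_CODEC = "cp1252"        # assumed stand-in for the Windows ANSI code page
UNICODE_CODEC = "utf-16-le"  # assumed stand-in for 16-bit Unicode strings


def ansi_to_unicode(ansi_bytes):
    """Convert an ANSI byte string to 16-bit Unicode bytes (on entry)."""
    return ansi_bytes.decode(ANSI_CODEC).encode(UNICODE_CODEC)


def unicode_to_ansi(unicode_bytes):
    """Convert 16-bit Unicode bytes back to an ANSI byte string (on exit)."""
    return unicode_bytes.decode(UNICODE_CODEC).encode(ANSI_CODEC)


def hypothetical_system_call(ansi_name):
    """Hypothetical ANSI entry point: convert, work in Unicode, convert back."""
    wide = ansi_to_unicode(ansi_name)
    # The "system" only ever manipulates the Unicode form; here it uppercases.
    wide_result = wide.decode(UNICODE_CODEC).upper().encode(UNICODE_CODEC)
    return unicode_to_ansi(wide_result)


name = "résumé.txt".encode(ANSI_CODEC)
wide_name = ansi_to_unicode(name)

print(len(name), len(wide_name))                          # 10 20
print(hypothetical_system_call(name).decode(ANSI_CODEC))  # RÉSUMÉ.TXT
```

The byte counts show the cost of the native representation: each of the ten characters takes one byte in the ANSI form and two bytes in the 16-bit form.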
Unicode Inc., 1965 Charleston Road, Mountain View, CA 94043; Phone: (415) 961-4189
"Inside Windows NT," by Helen Custer, Microsoft Press, 1992
"Program Migration to Unicode," by Asmus Freytag, Proceedings of the First Unicode Implementers Workshop, The Unicode Consortium, Mountain View, California, August 1991
"Adapt Your Program for Worldwide Use with Windows Internationalization Support," by William S. Hall, Microsoft Systems Journal, Vol. 6, No. 6, Nov./Dec. 1991
"Operating Systems Design and Implementation," by Andrew S. Tanenbaum, Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1987