INFO: Windows, Code Pages, and Character Sets

This article was previously published under Q75435
Retired KB Content Disclaimer
This article was written about products for which Microsoft no longer offers support. Therefore, this article is offered "as is" and will no longer be updated.
The ASCII (American Standard Code for Information Interchange) character set defines a mapping of the letters, numerals, and specified punctuation and control characters to the numbers from zero to 127. The term "code page" is used to refer to extensions of the ASCII character set that also map specified symbols to the numbers from 128 through 255.
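
As a small illustration of the two ranges described above, the following C sketch classifies a character code. The function names are hypothetical and are not part of any Windows or MS-DOS interface.

```c
/* Returns 1 when the code lies in the 7-bit ASCII range 0..127. */
int is_ascii_code(unsigned char code)
{
    return code <= 127;
}

/* Returns 1 when the code lies in the range 128..255, whose meaning
   depends on the code page in effect. */
int is_code_page_specific(unsigned char code)
{
    return code >= 128;
}
```

A byte such as 0x41 means the same thing under every code page, while a byte such as 0x82 can only be interpreted once the code page is known.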

This article discusses how Windows deals with code pages and warns against some of the pitfalls that applications can encounter.
The ANSI (American National Standards Institute) character set maps the letters and numerals in the same manner as ASCII. However, ANSI does not support control characters, and it maps many symbols, including accented letters, that are not mapped in standard ASCII. All Windows fonts are defined in the ANSI character set.

An Original Equipment Manufacturer (OEM) code page is built into the computer hardware. There are a number of OEM code pages, each defined for a particular language. These code pages are referred to by a number; for example, code page 437 is installed in the original IBM PC computer.

MS-DOS uses code pages to change the available character set, depending on user preference. A code page change is implemented by programming a new character set into the video display hardware. By changing to the code page for a particular language, the accented characters appropriate to that language are made available. Each code page is limited to 256 symbols.

For each code page, MS-DOS maintains a mapping table to map lowercase characters to and from uppercase. Because all string parameters to MS-DOS (filenames) are implicitly coded in the current code page, when the table is changed, filenames that were accessible under one case mapping may not be available under another. However, the common code pages were designed to combat this problem.
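
The effect of a per-code-page case table can be sketched in C as follows. This is a simplified model, not the MS-DOS implementation: the function names are hypothetical, and the caller supplies whatever extended-character entries it wants to experiment with.

```c
/* Fill a 256-entry uppercase table: every code maps to itself,
   except ASCII a-z, which map to A-Z. Entries for extended
   characters (128..255) are left for the caller to set. */
void init_upper_table(unsigned char table[256])
{
    int i;
    for (i = 0; i < 256; i++)
        table[i] = (unsigned char)i;
    for (i = 'a'; i <= 'z'; i++)
        table[i] = (unsigned char)(i - 'a' + 'A');
}

/* Uppercase a NUL-terminated OEM string in place using the table. */
void oem_upper(unsigned char *name, const unsigned char table[256])
{
    for (; *name != '\0'; name++)
        *name = table[*name];
}

/* Returns 1 if the name uppercases to different bytes under the
   two tables, 0 if both tables produce the same result. */
int uppercases_differently(const unsigned char *name,
                           const unsigned char a[256],
                           const unsigned char b[256])
{
    for (; *name != '\0'; name++)
        if (a[*name] != b[*name])
            return 1;
    return 0;
}
```

If one table maps code 0x82 to 0x90 and another maps it to 0x45 (values chosen here only for illustration), a name containing 0x82 is stored under two different uppercase forms, which is how a file can become unreachable after the table changes.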

Windows runs as an extension to MS-DOS. There is a mapping layer that translates between the ANSI character set and an OEM character set. When Windows is installed, the Setup program determines the installed character set and installs the corresponding ANSI-OEM translation tables and Windows OEM fonts.

If the user changes the current MS-DOS code page, Windows does not change its ANSI-OEM mapping tables automatically. It is necessary to run the Windows Setup program to modify these tables and to load the corresponding fonts.

Windows-based applications must use the Windows functions AnsiToOem() and OemToAnsi() when transferring information to and from MS-DOS. In addition, applications must use the correct character set when creating filenames.

There is no one-to-one mapping between the ANSI and OEM character sets. Applying the AnsiToOem() function followed by the OemToAnsi() function to a given string will not always result in the original string. A file whose name is a string that does not survive this round trip cannot be accessed by any Windows-based application. The filename must be changed by the user from outside of Windows.
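
The round-trip loss can be modeled with a pair of tiny translation functions. The following C sketch does not call the real AnsiToOem() and OemToAnsi() functions. It uses hypothetical stand-ins that cover only two characters, and the specific byte values are assumptions chosen to illustrate one character that survives and one that does not.

```c
/* Hypothetical excerpt of an ANSI-to-OEM table (assumed values). */
unsigned char sketch_ansi_to_oem(unsigned char ansi)
{
    switch (ansi) {
    case 0xE9: return 0x82; /* assumed present in both character sets */
    case 0xC0: return 0x41; /* assumed missing from the OEM set,
                               so approximated by plain 'A' */
    default:   return ansi; /* everything else unchanged, for brevity */
    }
}

/* Hypothetical excerpt of the reverse OEM-to-ANSI table. */
unsigned char sketch_oem_to_ansi(unsigned char oem)
{
    switch (oem) {
    case 0x82: return 0xE9;
    default:   return oem;  /* 0x41 comes back as plain 'A' */
    }
}

/* Returns 1 if every character of the NUL-terminated ANSI string
   is unchanged after ANSI -> OEM -> ANSI, otherwise 0. */
int survives_round_trip(const unsigned char *ansi)
{
    for (; *ansi != '\0'; ansi++)
        if (sketch_oem_to_ansi(sketch_ansi_to_oem(*ansi)) != *ansi)
            return 0;
    return 1;
}
```

An application could run a check of this kind, using the real conversion functions, before creating a file, and reject or adjust a name that would not come back unchanged.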

The following two scenarios may produce different results:

  • A lowercase ANSI string is passed to the AnsiToOem() function. The result is passed to MS-DOS, which maps the string to uppercase.

  • An uppercase ANSI string is passed to the AnsiToOem() function. The result is passed to MS-DOS.

The results can differ because the MS-DOS lowercase-to-uppercase conversion mapping and the Windows ANSI case conversion do not match. To avoid this problem, use the AnsiUpper() function to convert the ANSI string to uppercase before passing it to the AnsiToOem() function. Note that this is a problem only with extended characters. These problems are often overlooked until your customers call to complain.
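
The ordering advice above can be modeled in C with single-character stand-ins for the three conversions involved. None of the functions below is the real AnsiUpper(), AnsiToOem(), or MS-DOS case table: the names are hypothetical, and the byte values, in particular the assumption that MS-DOS drops the accent when uppercasing code 0x82, are chosen only to show how a mismatch arises.

```c
/* Stand-in for the Windows ANSI uppercase conversion (assumed values). */
unsigned char ord_ansi_upper(unsigned char c)
{
    if (c >= 'a' && c <= 'z')
        return (unsigned char)(c - 'a' + 'A');
    if (c == 0xE9)
        return 0xC9;            /* accented lowercase -> accented uppercase */
    return c;
}

/* Stand-in for the ANSI-to-OEM translation (assumed values). */
unsigned char ord_ansi_to_oem(unsigned char c)
{
    if (c == 0xE9) return 0x82;
    if (c == 0xC9) return 0x90;
    return c;
}

/* Stand-in for the MS-DOS uppercase table (assumed values). */
unsigned char ord_dos_upper(unsigned char c)
{
    if (c >= 'a' && c <= 'z')
        return (unsigned char)(c - 'a' + 'A');
    if (c == 0x82)
        return 0x45;            /* assumption: accent dropped, plain 'E' */
    return c;
}

/* Translate first and let MS-DOS do the uppercasing. */
unsigned char ord_direct(unsigned char ansi)
{
    return ord_dos_upper(ord_ansi_to_oem(ansi));
}

/* Uppercase in ANSI first, then translate, as recommended. */
unsigned char ord_upper_first(unsigned char ansi)
{
    return ord_dos_upper(ord_ansi_to_oem(ord_ansi_upper(ansi)));
}
```

Under these assumed tables, ord_direct() gives different OEM codes for the lowercase and uppercase forms of the same letter, while ord_upper_first() gives the same code for both, so the name that reaches MS-DOS no longer depends on the case the user typed.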

Keep in mind that both ANSI and OEM are 8-bit character sets. In applications, always use the "unsigned char" type instead of "signed char" for character variables. Problems that result from using "signed char" are very hard to track.
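
A minimal C sketch of the kind of problem meant here: when an extended character is held in a signed type and then used as a table subscript, the value is sign-extended and becomes negative. The function names are hypothetical.

```c
/* Subscript obtained when the character is held in a signed char.
   Codes 128..255 are stored as negative values and stay negative
   when widened to int. */
int subscript_signed(signed char c)
{
    return (int)c;
}

/* Subscript obtained when the character is held in an unsigned char.
   The result is always in the range 0..255. */
int subscript_unsigned(unsigned char c)
{
    return (int)c;
}

/* Look up a character in a 256-entry table. Converting to unsigned
   char first keeps the subscript in range even when plain char is
   a signed type on the compiler in use. */
unsigned char table_lookup(const unsigned char table[256], char c)
{
    return table[(unsigned char)c];
}
```

On a typical two's complement system, the byte 0xE9 stored in a signed char has the value -23, so a table indexed with it reads memory in front of the table, whereas the unsigned form yields 233.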

The SYSTEM.INI file contains entries that relate to code pages. In the [boot] section, the OEMFONTS.FON line specifies the file that contains the OEM stock font. In the [keyboard] section, the OEMANSI.BIN line specifies the ANSI-OEM translation table. If this line is blank, Windows uses the default table built into the keyboard driver. In the [386Enh] section, the *WOA.FON lines specify the fonts used in an MS-DOS window at various resolutions.

Windows does not provide any mechanism for an application to work with data that is not in the current MS-DOS code page, nor does Windows recognize changing the OEM code page in an MS-DOS window. However, an application is free to provide its own translation tables and provide a data format that includes the code page.
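
One way an application might structure such a format is sketched below in C. This is an assumption about a possible design, not something Windows provides: the structure and function names are hypothetical, and the translation tables themselves must be supplied by the application.

```c
#include <stddef.h>

/* An application-supplied translation table for one code page. */
struct cp_table {
    unsigned short code_page;       /* for example, 437 */
    const unsigned char *to_ansi;   /* 256 entries: OEM code -> ANSI code */
};

/* Text stored together with the code page it was written in. */
struct cp_text {
    unsigned short code_page;
    size_t length;
    const unsigned char *bytes;
};

/* Find the table for a code page, or NULL if none was supplied. */
const unsigned char *cp_find_table(const struct cp_table *tables,
                                   size_t count,
                                   unsigned short code_page)
{
    size_t i;
    for (i = 0; i < count; i++)
        if (tables[i].code_page == code_page)
            return tables[i].to_ansi;
    return NULL;
}

/* Translate the text into out, which must hold at least
   text->length bytes. Returns 0 on success, or -1 if the
   application has no table for the text's code page. */
int cp_text_to_ansi(const struct cp_text *text,
                    const struct cp_table *tables,
                    size_t count,
                    unsigned char *out)
{
    size_t i;
    const unsigned char *table =
        cp_find_table(tables, count, text->code_page);
    if (table == NULL)
        return -1;
    for (i = 0; i < text->length; i++)
        out[i] = table[text->bytes[i]];
    return 0;
}
```

Because the code page travels with the data, the application can choose the matching table when it reads the data back, regardless of which code page MS-DOS is using at that moment.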

Article ID: 75435 - Last Review: 02/11/2005 20:49:20 - Revision: 1.1

Microsoft Windows Software Development Kit 3.1

  • kb16bitonly kbinfo KB75435