INFO: UTF8 Support

This article has been archived. It is offered "as is" and will no longer be updated.
UTF8 is a code page that uses a string of bytes to represent a 16-bit Unicode string, where ASCII text (<=U+007F) remains unchanged as a single byte, U+0080-07FF (including Latin, Greek, Cyrillic, Hebrew, and Arabic) is converted to a 2-byte sequence, and U+0800-FFFF (Chinese, Japanese, Korean, and others) becomes a 3-byte sequence.
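
The three ranges above can be illustrated with a small portable C sketch. It is not taken from the Windows implementation. The function name utf8_encode_bmp is hypothetical, and the sketch assumes the input is a single 16-bit value and does not treat surrogates specially.

```c
#include <stddef.h>

/* Hypothetical helper: encode one 16-bit Unicode value (U+0000..U+FFFF)
   as UTF-8. Writes 1 to 3 bytes into out and returns the byte count,
   or 0 if cp does not fit in 16 bits. */
size_t utf8_encode_bmp(unsigned int cp, unsigned char out[3])
{
    if (cp > 0xFFFF) {
        return 0;                       /* outside the range discussed here */
    }
    if (cp <= 0x7F) {                   /* ASCII: unchanged, single byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                  /* U+0080-07FF: 2-byte sequence */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* U+0800-FFFF: 3-byte sequence */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

For example, U+0041 stays as the single byte 0x41, U+00E9 becomes 0xC3 0xA9, and U+4E2D becomes 0xE4 0xB8 0xAD.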

The advantage is that most ASCII text remains unchanged and almost all editors can read it.

Windows NT 4.0 supports Unicode<->UTF8 translation via MultiByteToWideChar()/WideCharToMultiByte(), using CP_UTF8 for the CodePage parameter, but it only works when none of the flags are set for dwFlags (therefore, you need to specify 0 for dwFlags).
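
A minimal sketch of such a call, with dwFlags set to 0, might look like the following. The wrapper names utf8_to_wide and wide_to_utf8 are hypothetical. The #ifdef _WIN32 guard is only there so that the fragment is skipped when compiled on other platforms.

```c
#ifdef _WIN32
#include <windows.h>

/* Hypothetical wrapper: convert a NUL-terminated UTF8 string to Unicode.
   Returns the number of wide characters written, including the
   terminating NUL, or 0 on failure (call GetLastError() for details). */
int utf8_to_wide(const char *utf8, wchar_t *wide, int wide_capacity)
{
    return MultiByteToWideChar(CP_UTF8,
                               0,     /* dwFlags: no flags set */
                               utf8,
                               -1,    /* input is NUL-terminated */
                               wide,
                               wide_capacity);
}

/* Hypothetical wrapper: convert a NUL-terminated Unicode string to UTF8.
   Returns the number of bytes written, including the terminating NUL,
   or 0 on failure. */
int wide_to_utf8(const wchar_t *wide, char *utf8, int utf8_capacity)
{
    return WideCharToMultiByte(CP_UTF8,
                               0,     /* dwFlags: no flags set */
                               wide,
                               -1,    /* input is NUL-terminated */
                               utf8,
                               utf8_capacity,
                               NULL,  /* lpDefaultChar must be NULL for CP_UTF8 */
                               NULL); /* lpUsedDefaultChar must be NULL for CP_UTF8 */
}
#endif
```

Passing 0 for the output buffer size to either API returns the required size without writing anything, which is one way to size the buffer before the real call.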

Also, UTF8 is not a valid encoding for command line arguments for Windows NT 4.0 or 5.0, and it is not supported on Windows 95.

Article ID: 175392 - Last Review: 01/11/2015 01:38:53 - Revision: 2.0

  • kbnosurvey kbarchive kbinfo kbintldev KB175392