INFO: Tips for Writing DBCS-Compatible Applications

This article was previously published under Q75439
This article has been archived. It is offered "as is" and will no longer be updated.
3.00 3.10 WINDOWS kbprg
String operations in systems that use a Double-Byte Character Set (DBCS) differ slightly from those in a single-byte character system. This article provides guidelines to reduce the work necessary to port an application written for a single-byte system to a DBCS system.
More information
In a double-byte character set, some characters require two bytes, while some require only one byte. The language driver can distinguish between these two types of characters by designating some characters as "lead bytes." A lead byte will be followed by another byte (a "tail byte") to create a Double-Byte Character (DBC). The set of lead bytes is different for each language. Lead bytes are always guaranteed to be extended characters; no 7-bit ASCII characters can be lead bytes. The tail byte may be any byte except a NULL byte. The end of a string is always defined as the first NULL byte in the string. Lead bytes are legal tail bytes; the only way to tell if a byte is acting as a lead byte is from the context.
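The rules above can be illustrated with a short, portable sketch that counts characters rather than bytes. The helper names is_lead_byte() and dbcs_char_count() are hypothetical, and the lead-byte ranges are an assumption based on Shift-JIS (code page 932); a real Windows application would rely on the language driver instead.

```c
#include <stddef.h>

/* Hypothetical lead-byte test. ASSUMPTION: Shift-JIS (code page 932)
   ranges. The actual set of lead bytes depends on the language. */
static int is_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Count characters (not bytes) by walking from the start of the string. */
size_t dbcs_char_count(const char *s)
{
    size_t count = 0;

    while (*s) {
        /* A lead byte followed by a non-NULL tail byte forms one DBC. */
        if (is_lead_byte((unsigned char)*s) && s[1] != '\0')
            s += 2;
        else
            s += 1;
        ++count;
    }
    return count;
}
```

Under these assumptions, the two bytes 0x81 0x81 count as one character: the second 0x81 is a tail byte only because of the byte that precedes it.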

The Windows Software Development Kit (SDK) version 3.0 includes two functions for moving through strings that may contain DBCs: AnsiNext() and AnsiPrev(). The AnsiPrev() function is a time-consuming call because it must run through the string from the beginning to determine where the previous character begins. It is best to search for characters from the beginning rather than the end of a string.
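To show why stepping backward is costly, the following sketch finds the previous character by walking forward from the start of the string. It is not the SDK implementation; dbcs_next() and dbcs_prev() are hypothetical stand-ins for AnsiNext() and AnsiPrev(), and the lead-byte ranges assume Shift-JIS (code page 932).

```c
/* Hypothetical lead-byte test. ASSUMPTION: Shift-JIS (code page 932). */
static int is_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Stand-in for AnsiNext(): advance one character, never past the NULL. */
static const char *dbcs_next(const char *p)
{
    if (*p == '\0')
        return p;
    if (is_lead_byte((unsigned char)*p) && p[1] != '\0')
        return p + 2;
    return p + 1;
}

/* Stand-in for AnsiPrev(): the only reliable way to find the character
   before cur is to walk forward from start, so the cost grows with the
   distance between start and cur. */
const char *dbcs_prev(const char *start, const char *cur)
{
    const char *p = start;
    const char *next;

    if (cur <= start)
        return start;           /* never go before the beginning */

    for (;;) {
        next = dbcs_next(p);
        if (next >= cur || next == p)
            return p;           /* p is the last character before cur */
        p = next;
    }
}
```

Every call repeats the forward walk, so a loop that calls such a function once per character does work proportional to the square of the string length.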

The Windows SDK version 3.1 includes the IsDBCSLeadByte() function, which returns TRUE if and only if the byte CAN BE a lead byte. Because this function takes a single char parameter, it cannot report whether the byte IS a lead byte in a particular string (to do so would require context).
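The difference between "can be" and "is" can be sketched with a function that supplies the missing context by scanning from the start of the string. The names is_lead_byte() and is_lead_byte_at() are hypothetical, and the lead-byte ranges assume Shift-JIS (code page 932).

```c
#include <stddef.h>

/* Hypothetical per-byte test, analogous to IsDBCSLeadByte(): it reports
   only that a byte value CAN BE a lead byte.
   ASSUMPTION: Shift-JIS (code page 932) ranges. */
static int is_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Report whether the byte at s[index] IS acting as a lead byte.
   The scan starts at s[0] so that character boundaries are known. */
int is_lead_byte_at(const char *s, size_t index)
{
    size_t i = 0;

    while (s[i] != '\0' && i < index) {
        if (is_lead_byte((unsigned char)s[i]) && s[i + 1] != '\0')
            i += 2;             /* skip a whole DBC */
        else
            i += 1;
    }

    /* i == index only when index falls on a character boundary. */
    return i == index
        && s[i] != '\0'
        && is_lead_byte((unsigned char)s[i])
        && s[i + 1] != '\0';
}
```

For the bytes 0x81 0x81, the per-byte test accepts both bytes, but only the byte at index 0 is acting as a lead byte.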

To make non-DBCS code run as quickly as possible, a source file may use "#ifdef DBCS" around code that is only for DBCS, and compile two versions of the object (OBJ) file. For example:
   #ifdef DBCS
     for (pszTemp = szString; *pszTemp; pszTemp = AnsiNext(pszTemp))
   #else
     for (pszTemp = szString; *pszTemp; ++pszTemp)
   #endif
   ...
To make the code easier to read, an application could define macros for the AnsiNext() and AnsiPrev() functions if DBCS is not defined:
   #ifndef DBCS
   #define AnsiNext(x) ((x)+1)
   #define AnsiPrev(y, x) ((x)-1)
   #ifdef WIN31
   #define IsDBCSLeadByte(x) (FALSE)
   #endif
   #endif
With these definitions in place, all of the code can be written for DBCS. Note that the AnsiNext() function will not go past the end of a string and the AnsiPrev() function will not go past the beginning of a string, while the macros will. In addition, because the macro version of AnsiPrev() ignores its "y" parameter, any side effect in that argument is never evaluated, so some code will give different results when compiled with and without DBCS defined. The following code is an example of this phenomenon; the increment of pszStart takes place only when DBCS is defined:
   pszEnd = AnsiPrev(++pszStart, pszEnd);				
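The effect can be reproduced outside Windows with a small sketch. PREV_MACRO mirrors the non-DBCS AnsiPrev() macro, while prev_func() is a hypothetical single-byte stand-in for the real function; both names are assumptions made for illustration.

```c
/* Mirrors the non-DBCS macro: the first argument is discarded. */
#define PREV_MACRO(y, x) ((x) - 1)

/* Hypothetical stand-in for the real function. Its arguments are
   always evaluated, and it never steps back past start. */
static const char *prev_func(const char *start, const char *cur)
{
    return cur > start ? cur - 1 : start;
}

/* Returns how far start moved when the macro form is used. */
int start_offset_with_macro(void)
{
    const char buf[] = "abc";
    const char *start = buf;
    const char *end = buf + 3;

    end = PREV_MACRO(++start, end);   /* ++start is never evaluated */
    (void)end;
    return (int)(start - buf);
}

/* Returns how far start moved when the function form is used. */
int start_offset_with_function(void)
{
    const char buf[] = "abc";
    const char *start = buf;
    const char *end = buf + 3;

    end = prev_func(++start, end);    /* ++start is evaluated once */
    (void)end;
    return (int)(start - buf);
}
```

One way to avoid the discrepancy is to keep side effects out of the arguments, for example by incrementing pszStart in a separate statement before the call.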
The following code demonstrates how to find the offset of the filename in a full path name:
   LPSTR GetFilePtr(LPSTR lpszFullPath)
   {
    LPSTR lpszFileName;

    for (lpszFileName = lpszFullPath; *lpszFullPath;
               lpszFullPath = AnsiNext(lpszFullPath))
        if (*lpszFullPath == ':' || *lpszFullPath == '\\')
            lpszFileName = lpszFullPath + 1;

    return lpszFileName;
   }
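Note that ':' and '\\' are guaranteed not to be lead bytes. Either byte value can still occur as a tail byte, however, which is why the loop advances with AnsiNext() instead of incrementing the pointer. The search starts from the beginning of the string rather than the end to avoid using the AnsiPrev() function.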
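The following portable sketch contrasts a byte-by-byte scan with a DBCS-aware scan on a path that contains a double-byte character whose tail byte is 0x5C, the same value as a backslash. The function names are hypothetical, and the lead-byte ranges assume Shift-JIS (code page 932).

```c
/* Hypothetical lead-byte test. ASSUMPTION: Shift-JIS (code page 932). */
static int is_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Byte-by-byte scan: mistakes a 0x5C tail byte for a path separator. */
const char *file_ptr_naive(const char *path)
{
    const char *name = path;

    for (; *path; ++path)
        if (*path == ':' || *path == '\\')
            name = path + 1;
    return name;
}

/* DBCS-aware scan: tail bytes are skipped together with their lead byte. */
const char *file_ptr_dbcs(const char *path)
{
    const char *name = path;

    while (*path) {
        if (is_lead_byte((unsigned char)*path) && path[1] != '\0') {
            path += 2;          /* skip the whole DBC */
            continue;
        }
        if (*path == ':' || *path == '\\')
            name = path + 1;
        ++path;
    }
    return name;
}
```

For the bytes 'C', ':', '\\', 0x83, 0x5C followed by ".txt", the byte-by-byte scan returns a pointer to ".txt", while the DBCS-aware scan returns a pointer to the double-byte character that begins the file name.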

The following code demonstrates a string copy into a limited size buffer. Note that it ensures that the string does not end with a lead byte.
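   int StrCpyN(LPSTR lpszDst, LPSTR lpszSrc, unsigned int wLen)
   {
    LPSTR lpEnd;
    char cTemp;

    // Nothing fits in a zero-length buffer; this also keeps the
    // decrement below from wrapping around.
    if (wLen == 0)
        return 0;

    // account for the terminating NULL
    --wLen;

    for (lpEnd = lpszSrc;
               *lpEnd && (unsigned int)(lpEnd - lpszSrc) < wLen;
               lpEnd = AnsiNext(lpEnd))
        ;  // scan to the end of string, or wLen bytes

    // The following can happen only if lpszSrc[wLen-1] is a lead
    // byte, in which case do not include the previous DBC in the copy.
    if ((unsigned int)(lpEnd - lpszSrc) > wLen)
        lpEnd -= 2;

    // Terminate the source string and call lstrcpy.
    cTemp = *lpEnd;
    *lpEnd = '\0';
    lstrcpy(lpszDst, lpszSrc);
    *lpEnd = cTemp;

    // The original listing declared an int return type but returned
    // nothing; returning the number of bytes copied (not counting
    // the terminating NULL) is one reasonable choice.
    return (int)(lpEnd - lpszSrc);
   }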
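Because the routine above temporarily writes a NULL into the source string, it cannot be used on read-only data. A variant that leaves the source untouched might look like the following sketch. This is a different approach from the listing above, not a replacement for it; dbcs_copy_n() and is_lead_byte() are hypothetical names, and the lead-byte ranges assume Shift-JIS (code page 932).

```c
#include <stddef.h>

/* Hypothetical lead-byte test. ASSUMPTION: Shift-JIS (code page 932). */
static int is_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Copy at most dst_size - 1 bytes of src into dst without splitting a
   DBC, and always terminate dst. Returns the number of bytes copied,
   not counting the terminating NULL. */
size_t dbcs_copy_n(char *dst, const char *src, size_t dst_size)
{
    size_t i = 0;

    if (dst_size == 0)
        return 0;

    while (src[i] != '\0') {
        size_t len = is_lead_byte((unsigned char)src[i]) ? 2 : 1;

        if (len == 2 && src[i + 1] == '\0')
            break;              /* malformed source: dangling lead byte */
        if (i + len > dst_size - 1)
            break;              /* the whole character must fit */

        dst[i] = src[i];
        if (len == 2)
            dst[i + 1] = src[i + 1];
        i += len;
    }

    dst[i] = '\0';
    return i;
}
```

With a 4-byte buffer and the source bytes 'A', 'B', 0x81, 0x40, 'C', only "AB" is copied, because the double-byte character would not fit in full ahead of the terminating NULL.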

Article ID: 75439 - Last Review: 10/26/2013 02:11:00 - Revision: 2.0

  • Microsoft Windows Software Development Kit 3.1
  • kbnosurvey kbarchive kb16bitonly kbinfo KB75439