For years, BASIC programmers have been using the Asc and Chr functions to access and manipulate the ASCII character set. With the advent of Unicode acceptance in mainstream operating systems and applications, the need for improved versions of the Asc and Chr functions has developed. To meet this demand, Microsoft Visual Basic (4.0 and higher) for Windows includes the AscB/ChrB and AscW/ChrW functions.
Unicode is a standard that is designed to replace the ANSI standard for encoding characters in a numeric form. Because the ANSI standard uses only a single byte to represent each character, it is limited to a maximum of 256 different characters. While this is sufficient for the needs of an English-speaking audience, it falls short when the worldwide software market is considered. With the Unicode standard, each character is represented by two bytes, so that the entire Unicode character set includes 65,536 possible locations.
Microsoft Windows NT, Microsoft Windows 2000, and Microsoft OLE 2.0 are entirely Unicode based, and Visual Basic (4.0 and higher) represents all strings internally in Unicode format. The AscW and ChrW functions allow access to the full range of Unicode characters. These functions work in the same way as the original Asc and Chr functions except that they support arguments from 0 to 65,535 instead of just from 0 to 255. Many Visual Basic objects (such as the debug window, the label, and the text box) display a "?" when they do not know how to display a Unicode character.
Because all strings are now represented internally in Unicode format, it is not as simple as it used to be to represent binary data in a string. Using the Chr function to assign data to a string does not result in the same behavior as before. For example:
stringvar = Chr(65)
results in a two-byte long string, where the first byte has a value of 65 and the second byte has a value of 0 (this is the Unicode representation of the letter "A"). Be sure to keep in mind that converting from ANSI to Unicode does not always entail just adding a second byte with a value of zero as it does in this case. For example, most of the ANSI character codes in the range 130-159 have completely different Unicode values. Try executing 'Debug.Print AscW(Chr(130))' and a value of 8218 is displayed.
Currently, Microsoft Windows requires a little-endian processor, which means that in a multiple-byte entity the first byte is the least significant, and significance increases in successive bytes. This explains why the Unicode character "A" is represented internally as the following:
   -------------------
   |   65   |   0    |
   -------------------
     byte 0   byte 1
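The two conversions described above can also be checked outside Visual Basic. The following is a minimal sketch in Python rather than Visual Basic, assuming that the ANSI code page is Windows-1252 (Python's "cp1252" codec) and that the internal string format corresponds to little-endian UTF-16. The helper names are hypothetical.

```python
# Model of the ANSI-to-Unicode conversion, assuming the Windows-1252 code page.
ANSI_CODEPAGE = "cp1252"  # assumption: Western European ANSI code page

def ansi_to_unicode_value(ansi_code):
    """Return the Unicode value for one ANSI character code, like AscW(Chr(n))."""
    return ord(bytes([ansi_code]).decode(ANSI_CODEPAGE))

def internal_bytes(text):
    """Return the bytes of a string stored in little-endian two-byte form."""
    return list(text.encode("utf-16-le"))

print(ansi_to_unicode_value(65))   # 65: "A" has the same value in both sets
print(ansi_to_unicode_value(130))  # 8218: not simply 130 with a zero byte added
print(internal_bytes("A"))         # [65, 0]: least significant byte first
```

On a system with a different ANSI code page, the value reported for code 130 would likely differ, which is why the code page is kept as a named constant here.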
The AscB and ChrB functions can be used to replicate what used to be accomplished by the Asc and Chr functions, because these functions allow the manipulation of single-byte quantities. If you would like a four-byte string that has the binary values of 65, 66, 67, and 68 consecutively, then using the Chr function will not work. You must instead use the ChrB function. For example:
stringvar = ChrB(65) & ChrB(66) & ChrB(67) & ChrB(68)
Alternatively, you can create arrays of the new Byte data type and manipulate your binary data that way.
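To see why Chr cannot build the four-byte string, it helps to compare the bytes that each approach produces. This is a sketch in Python, not Visual Basic, assuming that the internal format corresponds to little-endian UTF-16. Python's bytes type stands in for both the ChrB concatenation and a Byte array.

```python
# Four characters built the Chr way occupy eight bytes internally.
via_chr = "".join(chr(n) for n in (65, 66, 67, 68)).encode("utf-16-le")

# Four single bytes, as built with ChrB or a Byte array, occupy four bytes.
via_chrb = bytes([65, 66, 67, 68])

print(list(via_chr))   # [65, 0, 66, 0, 67, 0, 68, 0]
print(list(via_chrb))  # [65, 66, 67, 68]
```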
Listed below is an explanation of the results of some simple uses of these functions to further clarify this information.
Print Asc(Chr(255)) --> "255"
Nothing new here, except that the Chr function is returning a Unicode character that occupies two bytes instead of a one-byte ANSI character.
Print Asc(ChrB(255)) --> 5 - Invalid procedure call.
This usage returns an error because the Asc function always expects at least a two-byte parameter and the ChrB function is only returning a single byte.
Print Asc(Chr(256)) --> 5 - Invalid procedure call.
Although the Chr function returns a two-byte Unicode character, it still only takes numbers between 0 and 255 for its argument (note that on a DBCS-enabled system, Asc/Chr handle two-byte DBCS characters, converting them to and from Unicode). Using the ChrW function allows access to the full 65,536 Unicode character locations.
Print AscW(ChrW(256)) --> "256"
This is the new version of the first statement in this section. The ChrW function takes a value from 0 to 65,535 and returns that character (on 32-bit systems). The AscW function interprets this two-byte character as a Unicode character and returns the correct Unicode value for that character.
Print Asc(ChrW(256)) --> "65"
Print Asc(ChrW(5000)) --> "63"
What is happening here is that the ChrW function is being evaluated first. ChrW(256) is the character "Ā" (a capital A with a macron), which has no exact ANSI equivalent. When Asc converts it to ANSI, it is mapped to the closest match, "A", so the expression reduces to Asc("A"), and the Unicode (and ANSI) number for "A" is 65. The character represented by ChrW(5000) has no ANSI counterpart at all, so Visual Basic substitutes a "?", and as expected, the Unicode and ANSI value for "?" is 63.
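The conversion that Asc applies to a character with no ANSI equivalent can be approximated as follows. This is a rough Python sketch, not Visual Basic, and it rests on two assumptions: that the ANSI code page is Windows-1252, and that the closest-match substitution behaves like keeping only the base letter of an accented character. The function name asc_model is hypothetical, and the real conversion is performed by Windows, not by this logic.

```python
import unicodedata

ANSI_CODEPAGE = "cp1252"  # assumption: Western European ANSI code page

def asc_model(ch):
    """Approximate the ANSI code that Asc reports for one Unicode character."""
    try:
        # Characters that exist in the code page convert directly.
        return ch.encode(ANSI_CODEPAGE)[0]
    except UnicodeEncodeError:
        pass
    # Assumption: closest-match substitution is modeled by keeping only the
    # base letter of the character's decomposition (if it has one).
    base = unicodedata.normalize("NFKD", ch)[0]
    # Anything still outside the code page becomes "?" (63).
    return base.encode(ANSI_CODEPAGE, errors="replace")[0]

print(asc_model(chr(255)))   # 255
print(asc_model(chr(256)))   # 65
print(asc_model(chr(5000)))  # 63
```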
Print AscB(Chr(65)) --> "65"
Print AscB(ChrW(256)) --> "0"
Print AscB(ChrW(257)) --> "1"
Print AscB(ChrW(555)) --> "43"
Print AscB(ChrW(65535)) --> "255"
All of these return values can be explained by understanding how each character is represented internally (see the little-endian reference above) and by the fact that the AscB function looks only at the first byte of the character it receives. Visually it looks like the following diagram:
               -------------------
   Chr(65)     |   65   |   0    |
               -------------------
   ChrW(256)   |   0    |   1    |
               -------------------
   ChrW(257)   |   1    |   1    |
               -------------------
   ChrW(555)   |   43   |   2    |
               -------------------
   ChrW(65535) |  255   |  255   |
               -------------------
                 byte 0   byte 1
The AscB function just returns whatever the first byte of the character is.
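The byte split behind these results is plain arithmetic: byte 0 is the value modulo 256 and byte 1 is the value divided by 256. A minimal Python sketch (not Visual Basic) of that split follows. The name ascb_model is hypothetical, and the sketch models only the little-endian layout, not the AscB function itself.

```python
def ascb_model(code):
    """Return byte 0 of a two-byte value stored least significant byte first."""
    return code.to_bytes(2, "little")[0]

# Print both bytes of each character code used in the examples above.
for code in (65, 256, 257, 555, 65535):
    low, high = code.to_bytes(2, "little")
    print(code, "-> byte 0 =", low, ", byte 1 =", high)
```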
Print ChrB(65) --> ""
Visual Basic prints nothing for this call to the ChrB function because the ChrB function is only returning a one-byte string. One-byte strings like this mean nothing to Visual Basic because they do not constitute a valid Unicode character (or series of characters).
Print ChrB(65) & ChrB(0) --> "A"
In this case, we are concatenating two one-byte strings into a single two-byte string. Because the resulting bit pattern is the same as the bit pattern for the Unicode "A", that is what Visual Basic prints.
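The difference between the last two statements can be modeled by trying to read raw bytes as two-byte characters. This is a Python sketch, not Visual Basic, assuming that the internal format corresponds to little-endian UTF-16. Returning an empty string for undecodable input is an assumption chosen to mirror the empty output shown above, and display_model is a hypothetical name.

```python
def display_model(raw):
    """Read raw bytes as little-endian two-byte characters, or return ""."""
    try:
        return raw.decode("utf-16-le")
    except UnicodeDecodeError:
        # For example, an odd number of bytes is not a whole character.
        return ""

print(repr(display_model(bytes([65]))))               # ''
print(repr(display_model(bytes([65]) + bytes([0]))))  # 'A'
```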