INFO: XML Encoding and DOM Interface Methods
This article was previously published under Q275883
SUMMARY
One major advantage of Extensible Markup Language (XML) data is that it is platform independent. However, correct encoding must be specified to ensure proper transfer of XML data between different platforms. The white paper "How to Encode XML Data" addresses general XML encoding issues in detail:
http://msdn.microsoft.com/xml/articles/xmlencodings.asp
In most scenarios, XML encoding errors originate from the different default encoding settings of the Microsoft XML parser (MSXML) methods and interfaces. A clear understanding of these defaults helps prevent such errors.
MORE INFORMATION
XML Encodings
MSXML supports all encodings that are supported by Microsoft Internet Explorer. Internet Explorer's support depends on which language packs are installed on the computer; this information is stored under the following registry key:
HKEY_CLASSES_ROOT\MIME\Database\Charset
MSXML has native support for the following encodings:
UTF-8 UTF-16 UCS-2 UCS-4 ISO-10646-UCS-2 UNICODE-1-1-UTF-8 UNICODE-2-0-UTF-16 UNICODE-2-0-UTF-8
It also recognizes (internally using the WideCharToMultiByte API function for mappings) the following encodings:
US-ASCII ISO-8859-1 ISO-8859-2 ISO-8859-3 ISO-8859-4 ISO-8859-5 ISO-8859-6 ISO-8859-7 ISO-8859-8 ISO-8859-9 WINDOWS-1250 WINDOWS-1251 WINDOWS-1252 WINDOWS-1253 WINDOWS-1254 WINDOWS-1255 WINDOWS-1256 WINDOWS-1257 WINDOWS-1258
The proper place to specify the encoding for the data is the XML declaration. For example, if the data is encoded with the ISO-8859-1 standard, you can specify this as follows:
<?xml version="1.0" encoding="ISO-8859-1"?>
Without this information, the default encoding is UTF-8 or UTF-16, depending on the presence of a Unicode byte-order mark (BOM) at the beginning of the XML file. If the file starts with a Unicode byte-order mark (0xFF 0xFE or 0xFE 0xFF), the document is considered to be in UTF-16 encoding; otherwise, it is considered to be in UTF-8. The save method of the IXMLDOMDocument interface maintains the original encoding of the document. The default for this method is UTF-8.
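The default rule described above can be sketched in a few lines of Python. This is an illustration only, not MSXML code: the helper name default_xml_encoding is hypothetical, and the sketch covers only the case where the XML declaration does not name an encoding.

```python
# Hypothetical helper illustrating the BOM rule described in the text.
# It is a sketch of the rule, not a reproduction of MSXML's implementation.

UTF16_BOMS = (b"\xff\xfe", b"\xfe\xff")  # little-endian and big-endian marks


def default_xml_encoding(data: bytes) -> str:
    """Return the encoding assumed when no encoding is declared."""
    if data[:2] in UTF16_BOMS:
        return "UTF-16"
    return "UTF-8"


if __name__ == "__main__":
    with_bom = "<root/>".encode("utf-16")  # Python's utf-16 codec prepends a BOM
    without_bom = b"<root/>"               # plain bytes, no BOM
    print(default_xml_encoding(with_bom))
    print(default_xml_encoding(without_bom))
```

Note that data which is UTF-16 encoded but lacks the byte-order mark falls through to the UTF-8 branch under this rule, which is why the presence of the mark matters.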
MSXML DOM Errors
Two common errors that are returned from the XML Document Object Model (DOM) interface methods are:
An Invalid character was found in text content.
-and-
Switch from current encoding to specified encoding not supported.
With the load method of the IXMLDOMDocument interface, these errors usually occur under the following conditions:
With the MSXML parser versions 2.5, 2.5 SP1 and 2.6, the loadXML method of IXMLDOMDocument can only load UTF-16 or UCS-2 encoded data. Any attempt to load XML data that is encoded with another encoding format results in the following error:
Switch from current encoding to specified encoding not supported.
With the release of MSXML 3.0 (Msxml3.dll), this restriction is removed, and the following code runs without error:
NOTE: The xml property of the IXMLDOMDocument interface writes out the XML data as UTF-16 encoded but without the byte-order mark at the beginning. This may lead to encoding problems.
You may also receive these errors when you call the transformNode method of the IXMLDOMNode interface with an XSL or XSLT file in which the XML encoding information is specified as follows:
The transformNode method returns a BSTR, which is UTF-16 encoded data by definition. A better way to retain the encoding is to call the transformNodeToObject method and store the results to a stream or to a new XML document and then save it.
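The underlying issue is that an in-memory string carries no byte encoding of its own, so the bytes eventually written must agree with the encoding named in the XML declaration. The following Python sketch illustrates that general point with the standard library parser; it does not use MSXML, and the helper names declared_encoding and to_bytes_as_declared are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Matches the encoding pseudo-attribute of an XML declaration at the start
# of the text. A simplified pattern for illustration, not a full XML parser.
_DECL_ENCODING = re.compile(
    r'<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(xml_text: str, default: str = "UTF-8") -> str:
    """Return the encoding named in the XML declaration, or the default."""
    match = _DECL_ENCODING.match(xml_text)
    return match.group(1) if match else default


def to_bytes_as_declared(xml_text: str) -> bytes:
    """Serialize the text using the encoding its own declaration names."""
    return xml_text.encode(declared_encoding(xml_text))


if __name__ == "__main__":
    text = '<?xml version="1.0" encoding="ISO-8859-1"?><name>caf\u00e9</name>'

    consistent = to_bytes_as_declared(text)  # bytes match the declaration
    mismatched = text.encode("utf-8")        # bytes contradict the declaration

    # The consistent bytes round-trip; the mismatched bytes still parse,
    # but the non-ASCII character is decoded incorrectly.
    print(ET.fromstring(consistent).text == "caf\u00e9")
    print(ET.fromstring(mismatched).text == "caf\u00e9")
```

Writing the result to a stream or document that performs the encoding itself, as the text recommends, avoids this kind of mismatch between the declaration and the bytes.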
REFERENCES
For additional information, click the article number below to view the article in the Microsoft Knowledge Base:
259555 (http://support.microsoft.com/kb/259555/EN-US/) PRB: Error Occurs When You Open an ADO Recordset on XML Stream
For the latest XML download and information, see the following Microsoft Developer Network (MSDN) Website:
http://msdn.microsoft.com/xml/default.asp
