INFO: XML Encoding and DOM Interface Methods

Article ID: 275883

Summary

One major advantage of Extensible Markup Language (XML) data is that it is platform independent. However, correct encoding must be specified to ensure proper transfer of XML data between different platforms. The white paper "How to Encode XML Data" addresses general XML encoding issues in detail:
http://msdn.microsoft.com/en-us/library/aa468560.aspx
In most cases, XML encoding errors originate from the differing default encoding settings of the Microsoft XML parser (MSXML) methods and interfaces. A clear understanding of these defaults helps prevent encoding errors.

More information

XML Encodings

MSXML supports all encodings that are supported by Microsoft Internet Explorer. Internet Explorer's support depends on which language packs are installed on the computer; this information is stored under the following registry key:
HKEY_CLASSES_ROOT\MIME\Database\Charset
MSXML has native support for the following encodings:
UTF-8
UTF-16
UCS-2
UCS-4
ISO-10646-UCS-2
UNICODE-1-1-UTF-8
UNICODE-2-0-UTF-16
UNICODE-2-0-UTF-8
It also recognizes the following encodings, using the WideCharToMultiByte API function internally for the mappings:
US-ASCII
ISO-8859-1
ISO-8859-2
ISO-8859-3
ISO-8859-4
ISO-8859-5
ISO-8859-6
ISO-8859-7
ISO-8859-8
ISO-8859-9
WINDOWS-1250
WINDOWS-1251
WINDOWS-1252
WINDOWS-1253
WINDOWS-1254
WINDOWS-1255
WINDOWS-1256
WINDOWS-1257
WINDOWS-1258
The proper place to specify the encoding of the data is the XML declaration. For example, if the data is encoded with the ISO-8859-1 standard, you can specify this as follows:
<?xml version="1.0" encoding="ISO-8859-1"?>
Without this information, the default encoding is UTF-8 or UTF-16, depending on whether a Unicode byte-order mark (BOM) is present at the beginning of the XML file. If the file starts with a Unicode byte-order mark (0xFF 0xFE for little-endian or 0xFE 0xFF for big-endian), the document is considered to be in UTF-16 encoding; otherwise, it is considered to be in UTF-8. The save method of the IXMLDOMDocument interface maintains the original encoding of the document. When no encoding has been specified, the save method defaults to UTF-8.
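
The default rule above can be sketched in portable C++. This is only an illustration of the rule as this article states it, not MSXML's actual implementation; the function name default_xml_encoding is hypothetical, and the sketch deliberately ignores the XML declaration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of MSXML: applies the default rule described
// in this article to the first bytes of a document that carries no encoding
// declaration.
std::string default_xml_encoding(const unsigned char* data, std::size_t size) {
    if (size >= 2) {
        // 0xFF 0xFE is the little-endian BOM, 0xFE 0xFF the big-endian BOM.
        const bool little_endian = data[0] == 0xFF && data[1] == 0xFE;
        const bool big_endian    = data[0] == 0xFE && data[1] == 0xFF;
        if (little_endian || big_endian) {
            return "UTF-16";
        }
    }
    return "UTF-8";  // no UTF-16 BOM found: fall back to UTF-8
}
```

Calling the function on a buffer that starts with 0xFF 0xFE yields "UTF-16"; a buffer that starts directly with the '<' character yields "UTF-8".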

MSXML DOM Errors

Two common errors that are returned from the XML Document Object Model (DOM) interface methods are:
An Invalid character was found in text content.
-and-
Switch from current encoding to specified encoding not supported.
With the load method of the IXMLDOMDocument interface, these errors usually occur under the following conditions:
  • No encoding is specified, no byte-order mark is found at the beginning of the XML file, and the data contains non-ASCII characters that are not encoded as UTF-8. For example, the Swedish character å is stored as the single byte 0xE5 (its ISO-8859-1 value) rather than as the default UTF-8 sequence 0xC3 0xA5.
  • The specified encoding does not match the actual encoding of the XML data.
A good practice is to always specify the correct encoding inside the XML declaration, rather than accepting the default encoding.
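
To see why the first condition fails, note that the byte 0xE5 on its own is not a well-formed UTF-8 sequence, whereas 0xC3 0xA5 is. The following is a minimal sketch of a UTF-8 well-formedness check that you could run on data before loading it; is_valid_utf8 is a hypothetical helper written for this illustration, not an MSXML function, and it says nothing about how MSXML validates input internally.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not an MSXML API): returns true when `bytes` is
// well-formed UTF-8, false otherwise.
bool is_valid_utf8(const std::string& bytes) {
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;     // number of continuation bytes expected
        unsigned long cp = lead;   // code point being assembled
        unsigned long min_cp = 0;  // smallest value allowed for this length
        if (lead < 0x80) {
            // Single-byte ASCII character, nothing more to read.
        } else if ((lead & 0xE0) == 0xC0) {
            extra = 1; cp = lead & 0x1F; min_cp = 0x80;
        } else if ((lead & 0xF0) == 0xE0) {
            extra = 2; cp = lead & 0x0F; min_cp = 0x800;
        } else if ((lead & 0xF8) == 0xF0) {
            extra = 3; cp = lead & 0x07; min_cp = 0x10000;
        } else {
            return false;  // stray continuation byte or invalid lead byte
        }
        if (bytes.size() - i <= extra) {
            return false;  // sequence is cut off at the end of the data
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) {
                return false;  // expected a 10xxxxxx continuation byte
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        const bool overlong = cp < min_cp;
        const bool surrogate = cp >= 0xD800 && cp <= 0xDFFF;
        if (overlong || surrogate || cp > 0x10FFFF) {
            return false;
        }
        i += extra + 1;
    }
    return true;
}
```

With this check, the two-byte sequence 0xC3 0xA5 is accepted, while a lone 0xE5 (which announces a three-byte sequence that never arrives) is rejected. Data that fails the check needs an explicit encoding in its XML declaration.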

With the MSXML parser versions 2.5, 2.5 SP1 and 2.6, the loadXML method of IXMLDOMDocument can only load UTF-16 or UCS-2 encoded data. Any attempt to load XML data that is encoded with another encoding format results in the following error:
Switch from current encoding to specified encoding not supported.
With the release of MSXML 3.0 (Msxml3.dll), this restriction is removed, and the following code runs without error:
hr = pXMLDoc->loadXML("<?xml version=\"1.0\" encoding=\"UTF-8\"?><tag1>Abcdef</tag1>");
NOTE: The xml property of the IXMLDOMDocument interface returns the XML data as a UTF-16 encoded string, but without a byte-order mark at the beginning. This can cause encoding problems if the string is written out as is.
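
If you write such a string to a file yourself, one possible approach is to emit the byte-order mark explicitly. The sketch below assumes the text is already held as UTF-16 code units; std::u16string stands in for the BSTR, and utf16le_with_bom is a hypothetical helper name, not an MSXML function.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: serializes UTF-16 code units as little-endian bytes,
// preceded by the little-endian byte-order mark 0xFF 0xFE.
std::vector<unsigned char> utf16le_with_bom(const std::u16string& text) {
    std::vector<unsigned char> out;
    out.reserve(2 + text.size() * 2);
    out.push_back(0xFF);  // BOM, first byte
    out.push_back(0xFE);  // BOM, second byte
    for (char16_t unit : text) {
        out.push_back(static_cast<unsigned char>(unit & 0xFF));  // low byte
        out.push_back(static_cast<unsigned char>(unit >> 8));    // high byte
    }
    return out;
}
```

The resulting byte vector can be written to a file in binary mode, so that a parser reading the file later sees the BOM and treats the data as UTF-16.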

You may also receive these errors when you call the transformNode method of the IXMLDOMNode interface with an XSL or XSLT file in which the XML encoding information is specified as follows:
<xsl:output method="xml" encoding="UTF-8" />
The transformNode method returns a BSTR, which is UTF-16 encoded by definition, so the encoding requested in the xsl:output element is not applied to the returned string. A better way to retain the encoding is to call the transformNodeToObject method, write the results to a stream or to a new XML document, and then save it.
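
If you already hold the UTF-16 string and need UTF-8 bytes, the conversion itself is mechanical. The following sketch is an alternative to, not a substitute for, the transformNodeToObject approach recommended above. It assumes the text is available as a std::u16string (standing in for the BSTR), utf16_to_utf8 is a hypothetical helper name, and any encoding declaration inside the text is left untouched and would have to be adjusted separately.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts UTF-16 code units to UTF-8 bytes.
// Throws std::invalid_argument on an unpaired surrogate.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // A high surrogate must be followed by a low surrogate.
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF) {
                throw std::invalid_argument("unpaired high surrogate");
            }
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;  // the low surrogate has been consumed
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, the single UTF-16 code unit 0x00E5 (å) becomes the two bytes 0xC3 0xA5, matching the UTF-8 form mentioned earlier in this article.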

References

For additional information, see the following article in the Microsoft Knowledge Base:
259555 PRB: Error Occurs When You Open an ADO Recordset on XML Stream
For the latest XML download and information, see the following Microsoft Developer Network (MSDN) Website:
http://msdn.microsoft.com/xml/default.asp

Properties

Article ID: 275883 - Last Review: June 22, 2014 - Revision: 2.0
