PRB: Strings Passed to loadXML must be UTF-16 Encoded BSTRs

Symptoms

When you use the loadXML method of the MSXML parser, including non-UTF-16 character sequences in the BSTR parameter passed to loadXML may result in the following error message:
An Invalid character was found in text content.

Furthermore, attempting to change the encoding of the string (for example, by specifying an "encoding" attribute in the XML declaration at the top of the document) results in the following error message:
Switch from current encoding to specified encoding not supported.

Cause

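The string parameter of loadXML must be a BSTR, and BSTR strings are always UTF-16 encoded. The parser therefore treats the string as UTF-16: character data in any other encoding is read as invalid characters, and an "encoding" attribute that names a different encoding cannot be honored.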
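
The difference between the encodings can be seen by comparing byte sequences. The following Python sketch is illustrative only: it does not call MSXML, and the sample document and the function name compare_encodings are assumptions made for the example. It shows the UTF-16 bytes a BSTR would hold for a short document, the UTF-8 bytes for the same text, and what results if those UTF-8 bytes are read as though they were UTF-16.

```python
def compare_encodings(text: str):
    """Return (utf16_bytes, utf8_bytes, misread_text) for the given text."""
    # A BSTR stores 16-bit code units; little-endian UTF-16 models that layout.
    utf16_bytes = text.encode("utf-16-le")
    # The same text as a UTF-8 byte sequence is shorter and laid out differently.
    utf8_bytes = text.encode("utf-8")
    # Reading UTF-8 bytes as UTF-16 pairs up unrelated bytes into code units,
    # so the original characters are not recovered.
    misread_text = utf8_bytes.decode("utf-16-le", errors="replace")
    return utf16_bytes, utf8_bytes, misread_text


if __name__ == "__main__":
    utf16_bytes, utf8_bytes, misread_text = compare_encodings("<a>\u00e9</a>")
    print(len(utf16_bytes), "bytes as UTF-16")
    print(len(utf8_bytes), "bytes as UTF-8")
    print("Round trip preserved:", misread_text == "<a>\u00e9</a>")
```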

Resolution

The MSXML 3.0 or later parser does not have this restriction and can accept XML strings with other encodings, such as UTF-8.

The latest MSXML parser is available for download from Microsoft.

The following workarounds are available for parser versions prior to 3.0:

Scripting developers have two options available:

  1. Convert your XML documents to UTF-16-encoded Unicode, either automatically or by hand.
  2. Escape all non-ASCII characters inside the XML document using XML numeric character references. Any XML character can be written in plain ASCII using the form &#nnnn; (decimal) or &#xhhhh; (hexadecimal), where the number is the character's code point in the Unicode character set.
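
Both options can be sketched in a few lines. The following Python example is a minimal sketch rather than a definitive implementation: the function names to_utf16_bytes and escape_non_ascii are hypothetical, the input is assumed to be UTF-8 unless stated otherwise, and the regular expression that drops the "encoding" attribute is a simplification that assumes a conventional XML declaration on the first line.

```python
import re


def to_utf16_bytes(xml_bytes: bytes, source_encoding: str = "utf-8") -> bytes:
    """Option 1: re-encode a document as UTF-16 with a byte order mark."""
    text = xml_bytes.decode(source_encoding)
    # Drop the encoding attribute from the XML declaration so that it no
    # longer names the old encoding (assumes a simple, well-formed declaration).
    text = re.sub(
        r'(<\?xml[^>]*?)\s+encoding\s*=\s*(["\']).*?\2',
        r"\1",
        text,
        count=1,
    )
    return text.encode("utf-16")


def escape_non_ascii(xml_text: str) -> str:
    """Option 2: replace every non-ASCII character with a decimal
    numeric character reference such as &#233;.

    This is only valid for text content and attribute values; element and
    attribute names cannot contain character references.
    """
    return xml_text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(escape_non_ascii("<name>Jos\u00e9</name>"))
```

The escaped result is pure ASCII, so it reads the same whichever of the common encodings the parser assumes, at the cost of a less readable document.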

Microsoft Visual C++ developers have a third option: load data into MSXML using a method other than loadXML. loadXML is typically misused out of a desire to load XML data from memory; the IXMLDOMDocument::load method accepts several kinds of source in its VARIANT parameter, and these are better alternatives to loadXML.

See the following Knowledge Base article for more information:
223337 INFO: Loading/Saving XML Data with Internet Explorer XML Parser

Specifically, the load method can be passed a SAFEARRAY of bytes containing XML data in any encoding that the parser supports.

Status

This behavior is by design.

The MSXML 3.0 or later parser does not have this restriction.

Properties

Article ID: 247708 - Last Review: June 19, 2014 - Revision: 1
