XML Data Set String Handling

This example describes how string values in XML are handled by the XMLDataSet.

Example 1: Entity encoding/decoding of string values in Spry 1.5 and earlier.
Example 2: Entity encode string values mode. (The default mode)
Example 3: Entity decode string values mode.

Example 1: Entity Encoding/Decoding of Text Values in Spry 1.5 and earlier.

In Spry Pre-Release 1.5 and earlier versions, the XML Data Set handled the string content of #text nodes differently from #cdata nodes. To illustrate the difference, consider the following XML:

<?xml version="1.0" encoding="utf-8"?>
<data>
  <name>HTML Markup Entity Encoded Once</name>
  <tvalue>&lt;p&gt;Plain Text &lt;span style=&quot;font-weight: bold;&quot;&gt;Bold Text&lt;/span&gt; &lt;span style=&quot;color: red;&quot;&gt;Red Text&lt;/span&gt;&lt;/p&gt;</tvalue>
  <cvalue><![CDATA[<p>Plain Text <span style="font-weight: bold;">Bold Text</span> <span style="color: red;">Red Text</span></p>]]></cvalue>
</data>

To an XML parser, the string content underneath the <tvalue> node and the <cvalue> node is identical once it has been parsed and stored in the resulting XML DOM tree. But if we used the following data set and region code in Spry 1.5 (and earlier) to print the values:

<script src="../../Spry/includes/xpath.js" type="text/javascript"></script>
<script src="../../Spry/includes/SpryData.js" type="text/javascript"></script>
<script type="text/javascript">
<!--
var dsExample1a = new Spry.Data.XMLDataSet("../data/encode-01.xml", "/data");
//-->
</script>

...

<div spry:region="dsExample1a">
	<p>Value of the 'tvalue' column:</p>
	{tvalue}
	<p>Value of the 'cvalue' column:</p>
	{cvalue}
</div>

we end up with the following results:

Value of the 'tvalue' column:

{tvalue}

Value of the 'cvalue' column:

{cvalue}

In the example above, the value of the 'tvalue' column shows up as raw HTML markup within the browser's content area, whereas the value of the 'cvalue' column was interpreted as HTML markup and rendered accordingly. This happens because, in prior versions of Spry, the XML Data Set would entity encode any string data found in a #text node before storing it in a column in the data set. The thought was that developers would be writing out raw plain text string values from their database into the XML they produced, so if they had a string value of "X < Y and Y > Z" in their database, they would write it out like this:

X &lt; Y and Y &gt; Z

But since the XML parser decodes the value and stores it in its XML DOM as "X < Y and Y > Z", we would have to entity encode it again to get the desired result when Spry regions were involved, because the Spry region code simply inserts the values of a column directly into the HTML markup it builds up for a region. If the value were not entity encoded, the "< Y and Y >" part of it might be interpreted as a tag and not appear properly.
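
The re-encoding step described above can be sketched in a few lines of JavaScript. This is only an illustration: entityEncode() is a hypothetical helper written for this sample, not the function Spry uses internally, and the set of characters it encodes is an assumption.

```javascript
// Hypothetical helper (not Spry's own code): entity encode the characters
// that have special meaning in HTML markup.
function entityEncode(str) {
  return String(str)
    .replace(/&/g, "&amp;")   // ampersand first, so the entities added below are not re-encoded
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// The string as the XML parser leaves it in the XML DOM.
var parsed = "X < Y and Y > Z";

// The string after re-encoding, safe to insert into the markup of a region.
var stored = entityEncode(parsed);
console.log(stored); // X &lt; Y and Y &gt; Z
```

Encoding the ampersand before the other characters matters; doing it last would encode the ampersands that the earlier replacements had just produced.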

So the thought was that we would store the string content underneath #text nodes and #cdata nodes in the XML Data Set *exactly* as they appeared in the XML document. This gave developers a way to get plain text values with special characters to render properly, and embed HTML within XML using CDATA.

This, however, confused some folks, because in true XML, as in our XML example above, the string content is identical and should be treated identically. To put an end to this confusion, and to make Spry more compatible with existing server-side scripts and utilities that write out HTML embedded in XML, we've decided to introduce encode/decode modes within the XML Data Set. String content in #text and #cdata nodes is now treated identically: either both are entity encoded when stored in the data set, or both are stored decoded. Developers can select which mode (encoded/decoded) an XML Data Set uses.

The default mode entity encodes all #text and #cdata values when storing them in the XML Data Set. We have also provided a third mode that mimics the behavior of Spry Pre-Release 1.5 and earlier, which we hope to deprecate eventually.

To turn on backwards compatibility for versions of Spry *after* Pre-Release 1.5, you simply have to pass a non-boolean value for the "entityEncodeStrings" constructor option:

<script src="../../Spry/includes/xpath.js" type="text/javascript"></script>
<script src="../../Spry/includes/SpryData.js" type="text/javascript"></script>
<script type="text/javascript">
<!--
var dsExample1a = new Spry.Data.XMLDataSet("../data/encode-01.xml", "/data", { entityEncodeStrings: -1 });
//-->
</script>

...

<div spry:region="dsExample1a">
	<p>Value of the 'tvalue' column:</p>
	{tvalue}
	<p>Value of the 'cvalue' column:</p>
	{cvalue}
</div>

The value for "entityEncodeStrings" should normally be a boolean, either true or false. In the code example above, we are passing -1, which is a number, so it triggers the backwards compatibility mode. The example could just as easily have used an arbitrary string value (example: { entityEncodeStrings: "backwards" }) to turn this mode on.
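
The rule above, where a boolean selects the encode or decode mode and anything else selects backwards compatibility, can be sketched as follows. The resolveStringMode() function and the mode names it returns are hypothetical and exist only for this illustration; treating an omitted option as the default encode mode is likewise our assumption about how the default is applied.

```javascript
// Hypothetical sketch (not Spry's own code) of how the three modes could be
// told apart from the "entityEncodeStrings" constructor option.
function resolveStringMode(options) {
  var value = options ? options.entityEncodeStrings : undefined;

  if (value === undefined) return "encode"; // assumption: option omitted, use the default mode
  if (value === true) return "encode";      // entity encode #text and #cdata values
  if (value === false) return "decode";     // store #text and #cdata values decoded
  return "legacy";                          // any non-boolean value: Pre-Release 1.5 behavior
}

console.log(resolveStringMode({}));                                   // encode
console.log(resolveStringMode({ entityEncodeStrings: false }));       // decode
console.log(resolveStringMode({ entityEncodeStrings: -1 }));          // legacy
console.log(resolveStringMode({ entityEncodeStrings: "backwards" })); // legacy
```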

Example 2: Entity Encode String Values Mode (The Default Mode)

In versions of Spry *after* Pre-Release 1.5, the string data in #text nodes and #cdata nodes is stored in exactly the same way within the data set. After the string values are parsed by the XML parser, they are entity encoded and stored in the data set during the flattening process, so that if they are used within a spry:region, the string content renders in the browser as literal text instead of being interpreted as markup.

To illustrate this point, loading this XML:

<?xml version="1.0" encoding="utf-8"?>
<data>
  <name>HTML Markup Entity Encoded Once</name>
  <tvalue>&lt;p&gt;Plain Text &lt;span style=&quot;font-weight: bold;&quot;&gt;Bold Text&lt;/span&gt; &lt;span style=&quot;color: red;&quot;&gt;Red Text&lt;/span&gt;&lt;/p&gt;</tvalue>
  <cvalue><![CDATA[<p>Plain Text <span style="font-weight: bold;">Bold Text</span> <span style="color: red;">Red Text</span></p>]]></cvalue>
</data>

into an XML data set and using a spry:region to display the text data:

<script src="../../Spry/includes/xpath.js" type="text/javascript"></script>
<script src="../../Spry/includes/SpryData.js" type="text/javascript"></script>
<script type="text/javascript">
<!--
var dsExample2a = new Spry.Data.XMLDataSet("../data/encode-01.xml", "/data");
//-->
</script>

...

<div spry:region="dsExample2a">
	<p>Value of the 'tvalue' column:</p>
	{tvalue}
	<p>Value of the 'cvalue' column:</p>
	{cvalue}
</div>

gives us the following results:

Value of the 'tvalue' column:

{tvalue}

Value of the 'cvalue' column:

{cvalue}

If the developer truly wants the text in a specific column to be interpreted as HTML, they can make use of the "html" column type:

<script src="../../Spry/includes/xpath.js" type="text/javascript"></script>
<script src="../../Spry/includes/SpryData.js" type="text/javascript"></script>
<script type="text/javascript">
<!--
var dsExample2b = new Spry.Data.XMLDataSet("../data/encode-01.xml", "/data");

dsExample2b.setColumnType("tvalue", "html");
//-->
</script>

...

<div spry:region="dsExample2b">
	<p>Value of the 'tvalue' column:</p>
	{tvalue}
	<p>Value of the 'cvalue' column:</p>
	{cvalue}
</div>

The example above sets the column type for the 'tvalue' column to "html", which gives us these results:

Value of the 'tvalue' column:

{tvalue}

Value of the 'cvalue' column:

{cvalue}

Since #text and #cdata nodes are treated identically, you can also set the column type for columns whose values were derived from #cdata nodes, such as the 'cvalue' column:

<script src="../../Spry/includes/xpath.js" type="text/javascript"></script>
<script src="../../Spry/includes/SpryData.js" type="text/javascript"></script>
<script type="text/javascript">
<!--
var dsExample2c = new Spry.Data.XMLDataSet("../data/encode-01.xml", "/data");

dsExample2c.setColumnType([ "tvalue", "cvalue" ], "html");
//-->
</script>

...

<div spry:region="dsExample2c">
	<p>Value of the 'tvalue' column:</p>
	{tvalue}
	<p>Value of the 'cvalue' column:</p>
	{cvalue}
</div>

In the code sample above, "cvalue" was added to the list of columns to set to "html". This gives us these results:

Value of the 'tvalue' column:

{tvalue}

Value of the 'cvalue' column:

{cvalue}
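
One way to picture the effect of the "html" column type in the default mode is shown below. This is a sketch under our own assumptions, not the data set's actual implementation: entityEncode(), entityDecode() and valueForRegion() are hypothetical helpers, and we assume that an "html" column is simply decoded again before its value is inserted into the region markup.

```javascript
// Hypothetical helpers (not Spry's own code).
function entityEncode(str) {
  return String(str)
    .replace(/&/g, "&amp;")   // ampersand first
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function entityDecode(str) {
  return String(str)
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, "&");  // ampersand last, so each entity is decoded only once
}

// Assumption: columns of type "html" are decoded, all others are left encoded.
function valueForRegion(storedValue, columnType) {
  return columnType === "html" ? entityDecode(storedValue) : storedValue;
}

// What the XML parser hands over for either a #text or a #cdata node.
var parsed = '<span style="color: red;">Red Text</span>';

// Default mode: the value is entity encoded when stored in the data set.
var stored = entityEncode(parsed);

console.log(valueForRegion(stored, "string")); // &lt;span style=&quot;color: red;&quot;&gt;Red Text&lt;/span&gt;
console.log(valueForRegion(stored, "html"));   // <span style="color: red;">Red Text</span>
```

With the encoded form, the browser shows the markup as literal text; with the decoded form, it renders "Red Text" in red.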

Example 3: Entity Decode String Values Mode

As mentioned in Example 2, the strings in #text and #cdata nodes are stored in data set columns as entity encoded strings by default. A developer can change the "entityEncodeStrings" option so that the strings are stored entity decoded instead. Any HTML markup in a decoded string then gets interpreted when the string is used in a spry:region.

If we load the same XML we used in Example 2:

<?xml version="1.0" encoding="utf-8"?>
<data>
  <name>HTML Markup Entity Encoded Once</name>
  <tvalue>&lt;p&gt;Plain Text &lt;span style=&quot;font-weight: bold;&quot;&gt;Bold Text&lt;/span&gt; &lt;span style=&quot;color: red;&quot;&gt;Red Text&lt;/span&gt;&lt;/p&gt;</tvalue>
  <cvalue><![CDATA[<p>Plain Text <span style="font-weight: bold;">Bold Text</span> <span style="color: red;">Red Text</span></p>]]></cvalue>
</data>

into an XML data set that specified "entityEncodeStrings:false" and used a spry:region to display the strings:

<script src="../../Spry/includes/xpath.js" type="text/javascript"></script>
<script src="../../Spry/includes/SpryData.js" type="text/javascript"></script>
<script type="text/javascript">
<!--
var dsExample3a = new Spry.Data.XMLDataSet("../data/encode-01.xml", "/data", { entityEncodeStrings: false });
//-->
</script>

...

<div spry:region="dsExample3a">
	<p>Value of the 'tvalue' column:</p>
	{tvalue}
	<p>Value of the 'cvalue' column:</p>
	{cvalue}
</div>

we would get the following results:

Value of the 'tvalue' column:

{tvalue}

Value of the 'cvalue' column:

{cvalue}

If a developer wanted to display a string that contains angle brackets or ampersands, those characters would have to be entity encoded twice if embedded in the XML as text, or entity encoded once if embedded in CDATA. So, for example, if we wanted some HTML markup to be shown as text, the XML would look like this:

<?xml version="1.0" encoding="utf-8"?>
<data>
	<name>HTML Markup Entity Encoded Once</name>
	<tvalue>&amp;lt;p&amp;gt;Plain Text &amp;lt;span style=&amp;quot;font-weight: bold;&amp;quot;&amp;gt;Bold Text&amp;lt;/span&amp;gt; &amp;lt;span style=&amp;quot;color: red;&amp;quot;&amp;gt;Red Text&amp;lt;/span&amp;gt;&amp;lt;/p&amp;gt;</tvalue>
	<cvalue><![CDATA[&lt;p&gt;Plain Text &lt;span style=&quot;font-weight: bold;&quot;&gt;Bold Text&lt;/span&gt; &lt;span style=&quot;color: red;&quot;&gt;Red Text&lt;/span&gt;&lt;/p&gt;]]></cvalue>
</data>

The code we would use would still look the same:

<script src="../../Spry/includes/xpath.js" type="text/javascript"></script>
<script src="../../Spry/includes/SpryData.js" type="text/javascript"></script>
<script type="text/javascript">
<!--
var dsExample3b = new Spry.Data.XMLDataSet("../data/encode-02.xml", "/data", { entityEncodeStrings: false });
//-->
</script>

...

<div spry:region="dsExample3b">
	<p>Value of the 'tvalue' column:</p>
	{tvalue}
	<p>Value of the 'cvalue' column:</p>
	{cvalue}
</div>

But it would render like this:

Value of the 'tvalue' column:

{tvalue}

Value of the 'cvalue' column:

{cvalue}
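
To see why the text value needs two levels of encoding in this mode while the CDATA value needs only one, it helps to trace what each stage does to the string. The sketch below uses a shortened version of the markup and a hypothetical entityDecode() helper as a stand-in for the decoding the XML parser performs on #text content; neither is part of Spry.

```javascript
// Hypothetical stand-in for the entity decoding an XML parser applies to #text content.
function entityDecode(str) {
  return String(str)
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, "&");  // ampersand last, so only one level of encoding is removed
}

// Shortened versions of the values as written in the XML file.
var tvalueInXml = "&amp;lt;p&amp;gt;Plain Text&amp;lt;/p&amp;gt;"; // #text, encoded twice
var cvalueInXml = "&lt;p&gt;Plain Text&lt;/p&gt;";                 // CDATA, encoded once

// The parser decodes #text content once and takes CDATA content literally.
var tvalueParsed = entityDecode(tvalueInXml);
var cvalueParsed = cvalueInXml;

// With entityEncodeStrings set to false, the parsed strings are stored as they are.
console.log(tvalueParsed); // &lt;p&gt;Plain Text&lt;/p&gt;
console.log(cvalueParsed); // &lt;p&gt;Plain Text&lt;/p&gt;
```

Both columns end up holding the same once-encoded string, which the browser displays as the literal text <p>Plain Text</p> when the region inserts it into the page.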