Chaotic Pattern: Exclusively Symphony, straight from the source

Manipulating HTML in XML

Content is king and sometimes the king does not want to be told what to do. What do we do? Revolt.

Copy-of instruction’s Achilles heel

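The problem with <xsl:copy-of> is that it’s useful only if you don’t need to modify the content: it copies the selected nodes wholesale, attributes and descendants included. (<xsl:copy> on its own is no answer either, since it reproduces only the current node, without its attributes or children.) Let’s take a look at a sample XML that will be used throughout this article: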

<body>
    <h3 id="tips">Ninja 101</h3>
    <p>Ninjas are <em>not</em> about killing, it's about devotion.</p>
    <p>You will do well to heed to the following. Learn to:</p>
    <ul class="skills">
        <li>Conceal</li>
        <li>Strafe</li>
        <li><a href="#tango">Tango</a></li>
    </ul>
    <p>Only a true ninja can summon the courage to sing karaoke in public.</p>
</body>

Using <xsl:copy-of select="body/*"/> will reproduce the above exactly in your output, minus the <body> element, but that leaves you no room to modify the source. We need a fresh approach to this problem.
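For reference, a minimal sketch of a complete stylesheet built around that one-liner might look like the following. It assumes <body> is the document element of the source XML (in a real Symphony page it would likely sit deeper in the tree, so the select path would differ), and the <xsl:output> settings are my own choice.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Assumption: we want an XML fragment without the XML declaration -->
    <xsl:output method="xml" omit-xml-declaration="yes"/>

    <!-- Assumption: <body> is the document element of the source -->
    <xsl:template match="/">
        <!-- Deep-copies every child element of <body>, unchanged -->
        <xsl:copy-of select="body/*"/>
    </xsl:template>

</xsl:stylesheet>
```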

Something more applicable

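Here are a few things that we know we must do: reproduce each element under <body>, keep its text, handle elements nested inside other elements, and carry over the attributes.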

With that in mind, let’s set out to produce code alpha.

First we need to switch from <xsl:copy-of/> to the new and improved way: <xsl:apply-templates select="body"/>. This starts template processing at <body>. The corresponding matching template would be:

<xsl:template match="body/*">
    <xsl:element name="{name()}"/>
</xsl:template>

This rule says: match all child elements of <body>. The <xsl:element> instruction reproduces the element in context, using name() to pick up its name. This is what the output looks like:

<h3/>
<p/>
<p/>
<ul/>
<p/>

Well, it’s a good start. So far we’ve managed to output all the top-level elements. The next step is to grab the text nodes in each of those elements in code beta:

<xsl:template match="body/*">
    <xsl:element name="{name()}">
        <xsl:apply-templates/>
    </xsl:element>
</xsl:template>

Alternatively, one can use <xsl:value-of select="."/>. In the above code, <xsl:apply-templates/> also works because whenever no template matches a node, a built-in template rule is applied. The built-in rule for a text node copies its text to the output, while the built-in rule for an element outputs nothing itself and simply moves on to the element’s children. That is why the unmatched <em>, <li> and <a> elements lose their tags in the result below but keep their text.
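For the curious, the XSLT 1.0 specification describes the built-in rules as if the templates below were present in every stylesheet. You never need to write them yourself; the stylesheet wrapper here is only so the sketch stands on its own, and it should behave the same as a stylesheet with no templates at all.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Built-in rule for the root node and elements: output nothing for
         the node itself, just keep processing its children -->
    <xsl:template match="*|/">
        <xsl:apply-templates/>
    </xsl:template>

    <!-- Built-in rule for text and attribute nodes: output the string value -->
    <xsl:template match="text()|@*">
        <xsl:value-of select="."/>
    </xsl:template>

    <!-- Built-in rule for comments and processing instructions: output nothing -->
    <xsl:template match="processing-instruction()|comment()"/>

</xsl:stylesheet>
```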

More importantly, <xsl:apply-templates/> serves a much greater purpose and this will be evident soon. But before that, let’s take a look at the result:

<h3>Ninja 101</h3>
<p>Ninjas are not about killing, it's about devotion.</p>
<p>You will do well to heed to the following. Learn to:</p>
<ul>
    Conceal
    Strafe
    Tango
</ul>
<p>Only a true ninja can summon the courage to sing karaoke in public.</p>

It’s now looking pretty good. Next up, we will tackle elements nested inside other elements. We need a way for the template to be applied recursively at every level. Luckily, we had the foresight to use <xsl:apply-templates/>, didn’t we? Let’s take a look at code gamma.

<xsl:template match="body//*">
    <xsl:element name="{name()}">
        <xsl:apply-templates/>
    </xsl:element>
</xsl:template>

The changed line here is body//*. The <xsl:apply-templates/> instruction is already doing the hard work for us; all we needed to do was change the template’s pattern (see footnote 1) to encompass all descendant elements of <body>, not just its children. The sizzling result is as shown:

<h3>Ninja 101</h3>
<p>Ninjas are <em>not</em> about killing, it's about devotion.</p>
<p>You will do well to heed to the following. Learn to:</p>
<ul>
    <li>Conceal</li>
    <li>Strafe</li>
    <li><a>Tango</a></li>
</ul>
<p>Only a true ninja can summon the courage to sing karaoke in public.</p>

As the ninjas would say, “Splendid, sir. Would you like more wine?” If you’re not excited by now then you must be a robot. A cold, evil robot – and robots can’t be ninjas.

The only things missing are attributes. Here’s code delta:

<xsl:template match="body//*">
    <xsl:element name="{name()}">
        <xsl:apply-templates select="@*"/>
        <xsl:apply-templates/>
    </xsl:element>
</xsl:template>

<xsl:template match="body//@*">
    <xsl:attribute name="{name(.)}">
        <xsl:value-of select="."/>
    </xsl:attribute>
</xsl:template>

We’ve added a new line in the first template: <xsl:apply-templates select="@*"/>. The same logic for nested elements also applies to attributes, so it’s important to use <xsl:apply-templates/> here to match all the attribute nodes. Note that it comes first: attributes have to be added to an element before any of its children.

Here’s the final result:

<h3 id="tips">Ninja 101</h3>
<p>Ninjas are <em>not</em> about killing, it's about devotion.</p>
<p>You will do well to heed to the following. Learn to:</p>
<ul class="skills">
    <li>Conceal</li>
    <li>Strafe</li>
    <li><a href="#tango">Tango</a></li>
</ul>
<p>Only a true ninja can summon the courage to sing karaoke in public.</p>
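Put together, code delta could sit in a complete stylesheet along these lines. This is a sketch: it assumes <body> is the document element of the source, and the <xsl:output> settings are my own choice rather than anything Symphony requires.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml" omit-xml-declaration="yes"/>

    <!-- Entry point. Assumption: <body> is the document element -->
    <xsl:template match="/">
        <xsl:apply-templates select="body"/>
    </xsl:template>

    <!-- Rebuild every element below <body> under its own name -->
    <xsl:template match="body//*">
        <xsl:element name="{name()}">
            <!-- Attributes first: they must be added before any children -->
            <xsl:apply-templates select="@*"/>
            <xsl:apply-templates/>
        </xsl:element>
    </xsl:template>

    <!-- Rebuild every attribute below <body> with its original value -->
    <xsl:template match="body//@*">
        <xsl:attribute name="{name()}">
            <xsl:value-of select="."/>
        </xsl:attribute>
    </xsl:template>

</xsl:stylesheet>
```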

More elegant XSLT

The code now works great, but the two <xsl:apply-templates/> instructions can be combined into one:

<xsl:template match="body//*">
    <xsl:element name="{name()}">
        <xsl:apply-templates select="* | @* | text()"/>
    </xsl:element>
</xsl:template>

<xsl:template match="body//@*">
    <xsl:attribute name="{name(.)}">
        <xsl:value-of select="."/>
    </xsl:attribute>
</xsl:template>

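Through the power of the union (|) operator, we can combine the instructions into a single one. The selected nodes are still processed in document order, so an element’s attributes are handled before its child nodes.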
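A related idiom worth knowing, though it is not the approach this article takes, is the identity template built on <xsl:copy>. Because <xsl:copy> reproduces only the current node, pairing it with <xsl:apply-templates/> leaves the same room for overrides. The sketch below assumes <body> is the document element; unlike the templates above, it also carries comments and processing instructions across.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml" omit-xml-declaration="yes"/>

    <!-- Entry point. Assumption: <body> is the document element -->
    <xsl:template match="/">
        <xsl:apply-templates select="body/*"/>
    </xsl:template>

    <!-- Identity template: shallow-copy the current node, then recurse
         into its attributes and children -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```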

So…great, we’ve managed to achieve exactly what <xsl:copy-of/> could’ve done in one line. Yawnfest I hear people say. Fear not, the good bit is coming.

Exercise for the body and mind

Brain teaser time. Let’s give ourselves some ninja-worthy exercises (with increasing difficulty for extra ninjaness):

  1. Change <h3> to <h4>.
  2. Output only elements before the second <p> element.
  3. Only display one <li> element.

The first one sounds easy enough; the second and third could be a little trickier.

Exercise 1: Modifying the heading

The beauty of template matching is that you can always override it. Templates have an attribute called “priority”, which you can set so that one template takes precedence over another that matches the same element. It’s kind of like using the !important rule in CSS.

<xsl:template match="h3" priority="1">
    <xsl:element name="h4">
        <xsl:apply-templates select="* | @* | text()"/>
    </xsl:element>
</xsl:template>

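As you can see, this is conceptually the same as the rule we used to match all elements inside <body>, except that the match is more specific and an explicit priority of 1 is given. The explicit value matters: by default a bare name such as h3 gets priority 0, whereas a multi-step pattern such as body//* gets 0.5 and would otherwise win. And that’s it! This additional template was the only thing needed to change the source <h3> into <h4>!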
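To make the conflict resolution concrete, here is a sketch of the whole thing with each template’s priority noted in a comment. The default values come from the XSLT 1.0 conflict-resolution rules; the stylesheet wrapper, the <xsl:output> settings and the assumption that <body> is the document element are mine.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml" omit-xml-declaration="yes"/>

    <!-- Entry point. Assumption: <body> is the document element -->
    <xsl:template match="/">
        <xsl:apply-templates select="body"/>
    </xsl:template>

    <!-- Multi-step pattern: default priority 0.5 -->
    <xsl:template match="body//*">
        <xsl:element name="{name()}">
            <xsl:apply-templates select="* | @* | text()"/>
        </xsl:element>
    </xsl:template>

    <!-- Multi-step pattern: default priority 0.5 -->
    <xsl:template match="body//@*">
        <xsl:attribute name="{name()}">
            <xsl:value-of select="."/>
        </xsl:attribute>
    </xsl:template>

    <!-- A bare element name would default to priority 0 and lose to
         body//* (0.5); the explicit priority="1" makes this one win -->
    <xsl:template match="h3" priority="1">
        <xsl:element name="h4">
            <xsl:apply-templates select="* | @* | text()"/>
        </xsl:element>
    </xsl:template>

</xsl:stylesheet>
```

With the sample XML, the heading should come out as <h4 id="tips">Ninja 101</h4>, since the id attribute is still picked up by the body//@* template.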

I won’t spoil the fun for you all, so I’m going to let you mull over the rest of the exercises. Truth be told, I haven’t done them myself, so if any of you figure them out before I do, please share your solution with us! If no one manages to solve them, I’ll post a solution in a future article.

Now that you know how to display your body content the ninja way, I encourage everyone who’s read this article to assassinate <xsl:copy-of/> and embrace the shuriken that is <xsl:apply-templates/>.

Discussion

Have you managed to solve ninja exercises two and three? Are you a Japanese-Indian Ninja Guru? Why not share your findings, eureka moments, despair or frustration with us?

Footnotes

  1. It’s important to note that Patterns are different to XPath. Not all XPath syntax is allowed in a Pattern.

Almost-guaranteed discussion of intrigue

SpideyMizzou 14 June 07

OK! I’ve got the third exercise, but I’m having serious issues figuring out the second…perhaps I’m just going about it all the wrong way…


<xsl:template match="body//*">
    <xsl:element name="{name()}">
      <xsl:apply-templates select="* | text() | @*"/>
    </xsl:element>
</xsl:template>

<xsl:template match="body//@*">
    <xsl:attribute name="{name(.)}">
        <xsl:value-of select="."/>
    </xsl:attribute>
</xsl:template>

<xsl:template match="li[position() != 1]" />

Perhaps this is a pretty simple solution? Let me know! Great site. It’s nice to find good resources on XSLT out here on the Interwebs. Keep it up!

Josh 14 June 07

I want to see SpideyMizzou get his comment formatted correctly before his endangered ego recedes.

Josh 14 June 07

“I want” has a 4 letter word, dammit! ;-P

Justin 6 December 07

Well, I can’t agree more.

Nick Toye 2 May 08

Noticed that when you run this code initially:

<xsl:template match="body/*">
    <xsl:element name="{name()}"/>
</xsl:template>

It displays this:

<p></p>
<p></p>
<ul></ul>
<p></p>

I am using Sablotron in TestXSLT.

Your example shows the tags as self-closing. What processor are you using, and why is this happening?

Nick Toye 3 May 08

Do we have a solution for part 2?

Allen Chang 12 May 08

The solution for part 2 is actually very complex. Here it is:

Instead of just applying the body element as you would usually do:

<xsl:apply-templates select="body"/>

you will need to add some constraints to what you want to apply:

<xsl:apply-templates select="body/*[not(preceding::p[preceding::p])][position() &lt; last()]"/>
