

    ZALGO RISES FROM THE DEPTHS!


      Alternatively, here is an equivalent XSL stylesheet that can be applied directly to a sitemap XML document:

      <?xml version="1.0" encoding="UTF-8"?>
      <xsl:stylesheet version="1.0" xmlns:sitemap="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xhtml="http://www.w3.org/1999/xhtml">
        <xsl:output encoding="UTF-8" method="text" version="1.0" />
        <!-- From the document root, visit every <loc> under <urlset>/<url> -->
        <xsl:template match="/"><xsl:apply-templates select="sitemap:urlset/sitemap:url/sitemap:loc" /></xsl:template>
        <!-- Emit each URL followed by a single newline character -->
        <xsl:template match="sitemap:loc"><xsl:value-of select="text()" /><xsl:text>&#10;</xsl:text></xsl:template>
      </xsl:stylesheet>
      

      This has the added benefit of not requiring the whitespace-processing step in the article, and it could easily be modified to select elements based on attributes (e.g., language). However, it does require that the sitemap document actually be well-formed XML. The example in the article is not: it contains an unclosed <urlset> element. This is why XML parsers, validating parsers and transformers are so much more valuable than ad-hoc implementations: if the document does not conform to the specification, they will tell you.
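      For anyone without an XSLT processor to hand, here is a rough equivalent using only Python's standard-library xml.etree.ElementTree. This is a swapped-in technique, not XSLT, and a sketch under assumptions: the helper name sitemap_locs, the sample sitemap, and the use of xhtml:link hreflang alternates for the language filter are all illustrative. It demonstrates both points above: a document with an unclosed <urlset> raises ET.ParseError instead of silently producing partial output, and filtering on an attribute is a small change (in the stylesheet, a predicate along the lines of [@hreflang='de'] in the select expression would play a similar role).

```python
import xml.etree.ElementTree as ET

# Namespace URIs taken from the stylesheet's xmlns declarations.
NS = {
    "sitemap": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "xhtml": "http://www.w3.org/1999/xhtml",
}


def sitemap_locs(xml_text, hreflang=None):
    """Return URLs from a sitemap (hypothetical helper, for illustration).

    With hreflang=None this mirrors the stylesheet: the text of every
    urlset/url/loc element. With a language code it instead returns the
    href of each <xhtml:link hreflang="..."> alternate for that language
    (assumes the sitemap uses xhtml:link alternates).
    Raises ET.ParseError if the input is not well-formed XML.
    """
    root = ET.fromstring(xml_text)  # the parser rejects malformed input here
    if root.tag != "{%s}urlset" % NS["sitemap"]:
        return []  # like the stylesheet's select, match nothing outside <urlset>
    if hreflang is None:
        # strip() drops surrounding whitespace; the stylesheet emits text() verbatim
        return [(loc.text or "").strip()
                for loc in root.findall("sitemap:url/sitemap:loc", NS)]
    return [
        link.get("href")
        for link in root.findall("sitemap:url/xhtml:link", NS)
        if link.get("hreflang") == hreflang
    ]


# Hypothetical sample sitemap with one hreflang alternate.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://www.example.com/en/</loc>
    <xhtml:link rel="alternate" hreflang="de" href="https://www.example.com/de/"/>
  </url>
  <url><loc>https://www.example.com/about</loc></url>
</urlset>"""

if __name__ == "__main__":
    print("\n".join(sitemap_locs(EXAMPLE_SITEMAP)))
    try:
        # Unclosed <urlset>, as in the article's example
        sitemap_locs("<urlset><url><loc>https://www.example.com/</loc></url>")
    except ET.ParseError as err:
        print("not well-formed:", err)
```

Running it prints the two <loc> URLs, then the parser's error for the truncated document. The language filter is a plain list comprehension rather than an XPath predicate so that arbitrary language codes need no quoting.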


        So how would I run this under a flavor of Linux?


          You can use xsltproc, which many Linux distributions ship pre-installed or make available through their package manager:

          xsltproc -o sitemap.txt sitemap-to-text.xsl sitemap.xml
          

          If you need to integrate the transform into an existing codebase, you can also run it through the xsltc library, the Java or C# XML APIs, or one of the many other XSLT processors.