<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="/transform"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>Hephaestos' Curse</title>
    <base href="https://doriantaylor.com/"/>
    <link rel="meta" type="application/rdf+xml" title="FOAF" href="person/dorian-taylor?type=application/rdf+xml"/>
    <link rel="alternate index" title="This Site" href="this-site"/>
    <link rel="alternate bookmark" title="Elsewhere" href="elsewhere"/>
  </head>
  <body typeof="bibo:Article">
    <p>Pardon the inside baseball, but I'm actually pretty pissed about this.</p>
    <p>It turns out that the people behind Google Chrome intend to nuke an important but contentious piece of functionality for the Web. <abbr>XSLT</abbr>, which stands for Extensible Stylesheet Language Transformations, is a standard language for schlepping markup that has been part of Web browsers since the otherwise unremarkable Microsoft Internet Explorer 5.5. I know, because I can remember the room I was sitting in when I first started tinkering with it. In <em>two thousand and one</em>.</p>
    <p><abbr>XSLT</abbr> was a big part of my first major tech job. I used it to power a publication pipeline that I designed and implemented at my employer from 2002 through 2005. Using it, it was possible to manage over 120 websites in 15 different languages, with a team of four people (five including me). Its job was to take the raw semantic content and slap on the presentation layer, something <abbr>XSLT</abbr> excels at. Indeed, I've been using it this way on my own Web properties since 2007, to transform <abbr>(X)HTML</abbr> into itself. Having <abbr>XSLT</abbr> in the browser is especially convenient for knocking out <em>extremely</em> lazy static websites, because it does everything you'd want in a templating language, but built-in with no additional moving parts. In fact, I am inclined to say that it got just about everything important right.</p>
    <aside role="note">
      <p><em>Just</em> about. But the serious burrs were fixed in <abbr>XSLT</abbr> 2.0.</p>
    </aside>
    <p>Now, I say <abbr>XSLT</abbr> is <em>contentious</em> because it was undoubtedly roped into Web browsers during the <abbr>XML</abbr> fever of the <abbr>Y2K</abbr> era. If you're not familiar, <abbr>XML</abbr> is what I would describe as a <em>framework</em> for representing data structures as files or network messages, with a strong affinity toward document-shaped structures. The idea was, at the time, that <abbr>HTML</abbr>&#x2014;the language for representing Web pages&#x2014;would become <abbr><em>X</em>HTML</abbr>: <em>one</em> of many possible schemas that could be trunked through a unified parsing infrastructure.</p>
    <aside role="note">
      <p>Say what you will about <abbr>XML</abbr>; I strongly believe it was a transformative moment for computing. It was an ordeal we all had to pass through so we could move on to better things. For one, I am pretty certain it's what got everybody finally speaking Unicode. Eventually, people determined that <abbr>XML</abbr> wasn't the panacea it was forecast to be, and it has since receded to more specialized applications, while easier-to-manage formats like <abbr>JSON</abbr> have taken its place.</p>
    </aside>
    <p>The <em>problem</em> with <abbr>XML</abbr>&#x2014;aside from the fact that it is a pain in the ass to actually physically <em>type</em>&#x2014;is that it is extremely, needlessly strict. The parser has been specified in the standard to throw an unrecoverable error unless <em>everything</em> is perfect. It is on <em>you</em>, the author, to comply. Coming to <abbr>XML</abbr> from <abbr>HTML</abbr>, which will doggedly produce some value of <q>works</q> for all but the sloppiest handiwork, is nothing short of jarring. Developers come out of the womb hating it; the trauma is generational. <abbr>XSLT</abbr> is <abbr>XML</abbr>, and is meant to operate over <abbr>XML</abbr> (including <abbr>XHTML</abbr>), so you can imagine the kind of reception it gets among mainstream Web developers.</p>
    <p>So that's the backdrop. The precipitating event, ostensibly, is that earlier this year, a security researcher at Google's Project Zero (i.e., same company) turned his attention to <abbr>XSLT</abbr> in the browser because of his personal proclivity for finding bugs in obscure, forgotten subsystems. Not a bad place to look! Unsurprisingly, what he found was a pile of zero-days&#x2731;.</p>
    <aside role="note">
      <p>&#x2731;&#x2009;For the uninitiated, that's tricks badguys can use to take over your computer, named after the fact that the maintainer of the software that contains the flaws has known about them for zero days, and thus has not had the opportunity to get rid of them.</p>
    </aside>
    <p>Now, it turns out that a bunch of these bugs happen to exist in a 25-year-old pair of software libraries called <code>libxml2</code> and <code>libxslt</code>. The raw age of this software isn't so much the issue as the fact that, like many things open-source, it's a hobby project that woke up one morning and discovered that it was load-bearing. The other complicating matter is that due to <em>being</em> hobby software with no real resources behind it, <code>libxslt</code> only supports <abbr>XSLT</abbr> 1.0, which was standardized in 1999. These facts put together provide the basis for Google's argument to cut <abbr>XSLT</abbr> loose: it's old, it's full of serious security bugs (at least, the implementation of it we happen to use), and almost nothing uses it.</p>
    <p>The problem with this argument is that it's disingenuous. While both Chrome and Safari rely on this software, in which the security bugs <em>are</em> real and very serious, and which its own maintainer considers unfit for purpose, the claim <q><abbr>XSLT</abbr> is old and nothing uses it</q> is misleading at best. For one, <abbr>XSLT</abbr> has vibrant, ongoing support in the publishing industry. Far from being abandoned, it was updated to version <a href="https://www.w3.org/TR/xslt20/">2.0 in 2007</a> and <a href="https://www.w3.org/TR/xslt-30/">3.0 in 2017</a>, with the <a href="https://qt4cg.org/specifications/xslt-40/Overview.html">editor's draft of <abbr>XSLT</abbr> 4.0</a> shipping just last week. <a href="https://gitlab.gnome.org/balls/xrust">Not one</a>, <a href="https://github.com/Paligo/xee">but <em>two</em></a> new implementations have recently been authored in <a href="https://www.rust-lang.org/">the language Rust</a>&#x2731;. It's the <em>browsers</em> that haven't kept up.</p>
    <aside role="note">
      <p>&#x2731;&#x2009;This, at least preliminarily, makes either one a potential candidate to replace <code>libxslt</code>&#x2014;the first hurdle cleared being that it compiles to machine code or <abbr>WASM</abbr>, is memory-safe, and doesn't require hauling in the Java-sphere.</p>
    </aside>
    <p><em>Why</em> haven't the browsers kept up with this specification in 25 years? I suspect the reason is simple enough: the people behind the essential standard that defines the Web&#x2014;that is, <abbr>HTML</abbr>&#x2014;hate anything to do with <abbr>XML</abbr> and want it to go away. Indeed, the entire reason why the <abbr>WHATWG</abbr> even <em>exists</em> in the first place is, among related things, a disagreement over the extent to which <abbr>XML</abbr> belongs on the Web.</p>
    <p>Unlike the <abbr>W3C</abbr>, which has a much greater diversity of business entities represented in its membership&#x2014;as anybody who can afford to pay the dues can join&#x2014;the <abbr>WHATWG</abbr> consists <em>only</em> of developers of browser engines. There are currently four such entities in existence, and three of them are trillion-dollar corporations.</p>
  </body>
</html>
