<?xml version="1.0"?>
<?xml-stylesheet href="/transform" type="text/xsl"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:bibo="http://purl.org/ontology/bibo/" xmlns:bs="http://purl.org/ontology/bibo/status/" xmlns:ci="https://vocab.methodandstructure.com/content-inventory#" xmlns:dct="http://purl.org/dc/terms/" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:xhv="http://www.w3.org/1999/xhtml/vocab#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" lang="en" prefix="bibo: http://purl.org/ontology/bibo/ bs: http://purl.org/ontology/bibo/status/ ci: https://vocab.methodandstructure.com/content-inventory# dct: http://purl.org/dc/terms/ foaf: http://xmlns.com/foaf/0.1/ rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns# xhv: http://www.w3.org/1999/xhtml/vocab# xsd: http://www.w3.org/2001/XMLSchema#" vocab="http://www.w3.org/1999/xhtml/vocab#" xml:lang="en">
  <head>
    <title property="dct:title">The Symbol Management Problem</title>
    <base href="https://doriantaylor.com/the-symbol-management-problem"/>
    <link href="document-stats#ETIBXK-7KSAMiCyWxSUsZJ" rev="ci:document"/>
    <link href="elsewhere" rel="alternate bookmark" title="Elsewhere"/>
    <link href="this-site" rel="alternate index" title="This Site"/>
    <link href="http://purl.org/ontology/bibo/status/published" rel="bibo:status"/>
    <link href="" rel="ci:canonical" title="The Symbol Management Problem"/>
    <link href="person/dorian-taylor#me" rel="dct:creator" title="Dorian Taylor"/>
    <link href="//privatealpha.com/ontology/content-inventory/1" rel="dct:references"/>
    <link href="//privatealpha.com/ontology/ibis/1" rel="dct:references"/>
    <link href="person/dorian-taylor" rel="meta" title="Who I Am"/>
    <link about="./" href="3f36c30c-6096-454a-8a22-c062100ae41f" rel="alternate" type="application/atom+xml"/>
    <link about="./" href="f07f5044-01bc-472d-9079-9b07771b731c" rel="alternate" type="application/atom+xml"/>
    <link about="./" href="this-site" rel="alternate"/>
    <link about="./" href="elsewhere" rel="alternate"/>
    <link about="./" href="e341ca62-0387-4cea-b69a-cdabc7656871" rel="alternate" type="application/atom+xml"/>
    <link about="verso/" href="3f36c30c-6096-454a-8a22-c062100ae41f" rel="alternate" type="application/atom+xml"/>
    <link about="verso/" href="this-site" rel="alternate"/>
    <link about="verso/" href="elsewhere" rel="alternate"/>
    <meta content="Or: Why I (still) use Semantic Web technology." name="description" property="dct:abstract"/>
    <meta content="2019-11-04T22:33:36+00:00" datatype="xsd:dateTime" property="dct:created"/>
    <meta content="2019-11-28T02:12:32+00:00" datatype="xsd:dateTime" property="dct:modified"/>
    <meta content="2022-05-31T15:10:50+00:00" datatype="xsd:dateTime" property="dct:modified"/>
    <meta about="person/dorian-taylor#me" content="Dorian Taylor" name="author" property="foaf:name"/>
    <meta content="summary_large_image" name="twitter:card"/>
    <meta content="@doriantaylor" name="twitter:site"/>
    <meta content="The Symbol Management Problem" name="twitter:title"/>
    <meta content="Or: Why I (still) use Semantic Web technology." name="twitter:description"/>
    <meta content="https://doriantaylor.com/file/notsof-indian-village" name="twitter:image"/>
    <object>
      <nav>
        <ul>
          <li>
            <a href="./" rev="dct:references" typeof="bibo:Website">
              <span property="dct:title">Make Things. Make Sense.</span>
            </a>
          </li>
          <li>
            <a href="programming-languages-i-have-known-and-loved-loathed-lulzed" rev="dct:references" typeof="bibo:Article">
              <span property="dct:title">Programming Languages I Have Known and Loved/Loathed/Lulzed</span>
            </a>
          </li>
          <li>
            <a href="//dorian.substack.com/p/radical-interoperability-is-a-political" rev="dct:references" typeof="bibo:Article">
              <span property="dct:title">Radical Interoperability is a Political Agenda</span>
            </a>
          </li>
          <li>
            <a href="//dorian.substack.com/p/setting-the-tone-for-an-anti-platform" rev="dct:references" typeof="bibo:Article">
              <span property="dct:title">Setting the Tone for an Anti-Platform</span>
            </a>
          </li>
          <li>
            <a href="document-stats#ETIBXK-7KSAMiCyWxSUsZJ" rev="ci:document" typeof="qb:Observation">
              <span>urn:uuid:4c80572b-eeca-4480-9322-0b25b1494b19</span>
            </a>
          </li>
        </ul>
      </nav>
    </object>
  </head>
  <body about="" id="EmZHXja47FptPGDHnkr1_J" typeof="bibo:Article">
    <p>I get asked sometimes why I cling so stubbornly to the <dfn>Semantic Web</dfn>. Before I answer this question, I have to deal with the fact that it is a totally loaded one.</p>
    <section id="EdE9HdkhJeJ0cDeDK9hw_J">
      <p>The phrase <q>semantic web</q>, loosely speaking, refers to an overlapping Venn diagram of three distinct concepts, each of which can theoretically be considered without invoking the other two:</p>
      <ul>
        <li><a href="https://www.w3.org/TR/rdf11-concepts/" rel="dct:references"><abbr title="Resource Description Framework">RDF</abbr></a> is a <em>technical standard</em>,</li>
        <li><a href="https://en.wikipedia.org/wiki/Linked_data" rel="dct:references"><dfn>Linked Data</dfn></a> is an <em>architectural style</em>,</li>
        <li>The <em>actual</em> <a href="https://www.w3.org/standards/semanticweb/" rel="dct:references"><dfn>Semantic Web</dfn></a> is a <em>utopia</em>.</li>
      </ul>
      <p>So if somebody utters the phrase <q>semantic web</q> without further qualification&#x2014;especially to criticize it&#x2014;they're probably talking about the most audacious and easily straw-mannable of the three.</p>
      <p>There is a more technical interpretation of the term <dfn>Semantic Web</dfn>, which refers to the use of a <a href="https://en.wikipedia.org/wiki/Semantic_reasoner" rel="dct:references"><dfn>reasoner</dfn></a> to <em>infer</em> latent information from a set of existing assertions. Reasoners have plenty of practical applications, which I will discuss, but their use also marks the precipice of the abyss into all manner of occult formal logic.</p>
      <aside role="note" id="EiKABW5nf-nN2bQaveztPL">
        <p>A side effect of spending too much time working with the <dfn>Semantic Web</dfn> proper is analytic philosophy.</p>
      </aside>
      <p>More mundane is <dfn>Linked Data</dfn>, which is simply the architectural constraint of giving machine-readable data objects <abbr title="Uniform Resource Locator">URLs</abbr> in order to make them directly accessible over the Web, and moreover that said machine-readable data objects <em>themselves</em> contain links to <em>other</em> data objects. This is a perfectly sensible pattern that we can see all over the place, particularly in <abbr title="Representational State Transfer">REST</abbr> <abbr title="Application Programming Interface">APIs</abbr>, and need not have anything to do with <abbr>RDF</abbr> or the <dfn>Semantic Web</dfn>.</p>
      <aside role="note" id="Ed82SQz_4xhTW4h6JcQIAJ">
        <p>That said, it is specifically <a href="https://www.ted.com/talks/tim_berners_lee_on_the_next_web?language=en" rel="dct:references">Tim Berners-Lee's vision</a> to facilitate the realization of the <dfn>Semantic Web</dfn> using <dfn>Linked Data</dfn> rendered in <abbr>RDF</abbr>.</p>
      </aside>
      <p>The central problem of <dfn>Linked Data</dfn> is that without some kind of protocol or appropriate context, <span class="parenthesis" title="programmatically">you can't necessarily tell</span>:</p>
      <ol type="a">
        <li>what <em>is</em> even a link, versus say, a text field that happens to contain a <abbr title="Uniform Resource Locator">URL</abbr>&#x2014;and that the <abbr>URL</abbr>-looking text actually <em>is</em> a <abbr>URL</abbr>,</li>
        <li>what the relation to the link&#x2014;or literal data member, for that matter&#x2014;<em>represents</em>.</li>
      </ol>
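      <p>To make <span>(a)</span> and <span>(b)</span> concrete, here is a small sketch in the Turtle notation for <abbr>RDF</abbr> (the <code>alice</code> resources are invented for illustration; <code>foaf:homepage</code> and <code>dct:description</code> are real vocabulary terms):</p>
      <pre><code>@prefix dct:  &lt;http://purl.org/dc/terms/&gt; .
@prefix foaf: &lt;http://xmlns.com/foaf/0.1/&gt; .

# (a) an unambiguous link: the object is a bona fide URI node
&lt;person/alice#me&gt; foaf:homepage &lt;https://alice.example/&gt; .

# ...versus a text field that merely contains URL-looking text
&lt;person/alice#me&gt; dct:description "Find Alice at https://alice.example/" .

# (b) the predicate is itself a URI, so what the relation *means*
# is a matter of looking it up, not guessing.</code></pre>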
      <p>To remedy this situation, we need some kind of <dfn>schema</dfn>: For <span>(a)</span>, we need something that marks a piece of <em>syntax</em> as denoting <q>this thing here is <em>definitely</em> a link</q>. For <span>(b)</span>, we need something that specifies the <em>semantics</em> of the relation: what the thing <em>means</em>. <abbr title="Hypertext Markup Language">HTML</abbr> has a reasonable capability for the first, and an <em>extremely</em> limited vocabulary for the second. <abbr>HTML</abbr> likewise can't help you much if the data you want to represent is something other than a document. Cue the <abbr title="Extensible Markup Language">XML</abbr> boom of the early <time>2000s</time>. I can say from experience that whether you're writing a schema in <abbr title="Document Type Definition">DTD</abbr>, <dfn><abbr>XML</abbr> Schema</dfn>, or <abbr title="Regular Language for XML Next Generation">RELAX NG</abbr>, it is not a trivial undertaking. Other people clearly felt similarly, and we got things like <a href="http://microformats.org/" rel="dct:references"><dfn>Microformats</dfn></a>, <a href="https://html.spec.whatwg.org/multipage/microdata.html" rel="dct:references"><dfn>Microdata</dfn></a>, and most recently, <a href="https://json-schema.org/" rel="dct:references"><dfn><abbr title="JavaScript Object Notation">JSON</abbr> Schema</dfn></a>.</p>
      <aside role="note" id="Ey9nHlNrD5dIjRsjwwHI7L">
        <details>
          <summary><h2><abbr>XML</abbr> Apologia, Sort Of</h2></summary>
          <p>It is something of a trope in the Web development community to hate on <abbr>XML</abbr>. I nevertheless submit that the <abbr>XML</abbr> era was <em>necessary</em> to get the world acclimated to the notion that there could be a <em>common</em> data syntax that, in the worst-case scenario, you could read and write with an off-the-shelf plain text editor. Say what you will about <abbr>XML</abbr>: before it came along, <span class="parenthesis" title="ASN.1, S-expressions, etc.">the other candidates</span> were kind of all over the place&#x2014;and none of them dealt proactively with content encoding. If we didn't go through our <abbr>XML</abbr> phase, I doubt we would have standardized on <dfn>Unicode</dfn> <span class="parenthesis" title="which is not to say it was terribly smooth, but it could easily have been a lot less so">nearly as smoothly as we did</span>.</p>
          <p>Indeed, I suspect the revulsion toward <abbr>XML</abbr>, at least among the people who were there for it, is consistent with a response to trauma: Everything about <abbr>XML</abbr> is pedantic. It will throw an error in your face at the slightest departure from its ultra-strict and doctrinaire specification, unlike say, <abbr title="Cascading Style Sheets">CSS</abbr> which ignores errors, and <abbr>HTML</abbr> which tries to correct them. How many billions of dollars have been lost due to an <abbr>XML</abbr> syntax error, I can't even begin to fathom. It is also physically hard to type: even with a souped-up code editor with inline validation and autocomplete, you still have to curl your fingers in an unnatural way to hit those angle brackets, one of many operations <span class="parenthesis before" title="quotation marks also get a workout">that require hitting the shift key</span> with your other hand.</p>
          <p>Finally, there's the brazen byzantine baroqueness of the whole thing. The first attempt at anything to do with <abbr>XML</abbr> always seemed to be insanely overcomplicated. <abbr>RELAX NG</abbr> was invented because <dfn>XML Schema</dfn> was too complicated. <abbr>SAX</abbr> was invented because <abbr>DOM</abbr> was too complicated. <abbr>SOAP</abbr> and <abbr>WSDL</abbr> and all that mess are completely insane and demonstrably unnecessary&#x2014;so much so that the Web development community more or less did away with <abbr>XML</abbr> entirely. Now it speaks primarily <abbr>JSON</abbr> and is happier for it. This explains the reactions of the <em>younger</em> developers: why would we put up with any of this crap if we didn't absolutely have to?</p>
          <p>The one item that puzzles me about this business though is the ostensible phobia toward <dfn>namespaces</dfn>. Namespaces are the mechanism by which <abbr>XML</abbr> does modularity, and the general consensus among developers is that modularity is <em>good</em>. So why hate namespaces?</p>
          <p>The only reason I can think of is that <abbr>XML</abbr> namespaces are necessarily <abbr>URIs</abbr>, and most of them don't do anything. Indeed, for the longest time, most of them didn't even <em>go</em> anywhere, and if they did, it would just say something like <q>This is the namespace for blahblahblah</q>. You were lucky if you got a link to a human-readable spec and even luckier if you got a link to the machine-readable schema, which would have been elsewhere. Indeed, <a href="https://www.w3.org/2001/XMLSchema-instance" rel="dct:references">the mechanism for relating schemas to instance documents</a> is something <em>other</em> than the namespace <abbr>URI</abbr>. The practice of actually <span class="parenthesis" title="to say nothing of merging the machine-readable spec into the human-readable one">making the namespace <em>be</em> the spec</span> didn't catch on until well into the <abbr>RDF</abbr> era. This is bad enough from the position of an apologist. From an outsider perspective, this would have been totally nuts.</p>
          <p>So while I believe the sun has set on <abbr>XML</abbr> as the go-to framework for arbitrary data exchange, it is still a valuable tool to have in the kit. Use it when the pedantry is something you actually <em>want</em>. Generate it programmatically&#x2014;avoid writing it by hand. And for your own sanity, stay away from writing your own schema from scratch.</p>
        </details>
      </aside>
    </section>
    <section id="E9WsMgszoWDeBwQ2KC8_4I">
      <p>Developing alongside all of this business&#x2014;sometimes quietly and sometimes not so quietly&#x2014;is <abbr>RDF</abbr>. I first encountered it in <time>2006</time>, when it had been well underway for many years, but the perception, at least, was that it was still wedded to <abbr>XML</abbr>. Indeed, as late as <time>2013</time>, I got the opportunity to ask a preeminent taxonomist what they thought about <abbr>RDF</abbr>, only to get a pooh-pooh response: <q>some silly <abbr>XML</abbr> thing</q>.</p>
      <aside role="note" id="ELVrEJka-Q-DVX2vVuzMWL">
        <p>The <abbr>XML</abbr> syntax for <abbr>RDF</abbr> is actually really unnerving to the experienced <abbr>XML</abbr> schema designer, because you <em>can't</em> use the standard mechanisms to validate it. When I first encountered this, I thought it was a mistake in the design of <abbr>RDF</abbr>.</p>
      </aside>
      <p><abbr>RDF</abbr> is not an <abbr>XML</abbr> thing. What <abbr>RDF</abbr> <em>is</em>, is a <abbr>URI</abbr> thing. The <dfn>Resource Description Framework</dfn>, being a framework for describing <dfn>resources</dfn>, has to reference those resources somehow, so it naturally uses <dfn>Uniform Resource Identifiers</dfn>, and it uses them absolutely <em>everywhere</em>.</p>
      <p>This is the genius of <abbr>RDF</abbr>: Everything is a <abbr>URI</abbr>, except when it isn't. And if the <abbr>URI</abbr> in question is a dereferenceable <abbr>URL</abbr>, then what you automatically get is <dfn>Linked Data</dfn>. Slap a <dfn>reasoner</dfn> onto a big enough concentration of this material and you get the <dfn>Semantic Web</dfn>.</p>
      <p><dfn>Schemas</dfn>&#x2014;or what in the biz are called <dfn>vocabularies</dfn>&#x2731;&#x2014;the specifications that tell both you and the machine what means what, <a href="https://lov.linkeddata.es/" rel="dct:references">are as readily available as any other open-source software product</a>. Indeed, a number of de facto core vocabularies interact with and build off each other, as <span class="parenthesis" title="oh but it IS different in a very interesting and useful way">the inheritance model is not too different</span> from conventional object-oriented programming languages. If you can't find the one you need, <abbr>RDF</abbr> vocabularies are much easier to write than something like an <abbr>XML</abbr> vocabulary, because you're never defining <em>syntax</em>, only <em>semantics</em>. In other words, you only have to specify classes and properties, never sequences of elements.</p>
      <aside role="note" id="Ey9ZFy3PFfxfzwGlAHhmZL">
        <p>&#x2731;We tend to use the term <dfn>vocabulary</dfn> to be a catch-all for either <dfn><abbr>RDF</abbr> Schema</dfn> or <dfn><abbr>OWL</abbr> ontology</dfn>, the latter being a beefier version of the former.</p>
      </aside>
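      <p>For a sense of how little is involved, a hypothetical vocabulary stub in Turtle (the <code>ex:</code> namespace is made up; the <code>rdfs:</code> terms are the real <abbr>RDF</abbr> Schema ones) amounts to declaring classes and properties and relating them to one another:</p>
      <pre><code>@prefix rdf:  &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt; .
@prefix rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt; .
@prefix ex:   &lt;https://example.biz/vocab#&gt; .

ex:Widget a rdfs:Class ;
  rdfs:label "Widget"@en ;
  rdfs:comment "A made-up class of things we care about."@en .

ex:partOf a rdf:Property ;
  rdfs:label "part of"@en ;
  rdfs:domain ex:Widget ;   # semantics only: no element sequences,
  rdfs:range  ex:Widget .   # no content models</code></pre>
      <p>Note there is no syntax here at all: nothing says how an <code>ex:Widget</code> must be spelled on the wire.</p>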
      <p>Speaking of syntax, this is taken care of for you. While it originated in <abbr>XML</abbr>, <abbr>RDF</abbr> has since grown a solid dozen alternative syntaxes, the ones of chief interest being <a href="https://www.w3.org/TR/turtle/" rel="dct:references">the easily-typed <dfn>Turtle</dfn></a>, <span class="parenthesis" title="insofar as you can trick developers into using it without realizing that it's RDF">the stealthy <a href="https://www.w3.org/TR/json-ld/" rel="dct:references"><abbr title="JavaScript Object Notation for Linked Data">JSON-LD</abbr></a></span>, and <a href="https://www.w3.org/TR/rdfa-core/" rel="dct:references"><abbr>RDFa</abbr></a>, which embeds <abbr>RDF</abbr> data into other markup languages like <abbr>HTML</abbr>, <dfn>Atom</dfn>, or <abbr title="Scalable Vector Graphics">SVG</abbr>.</p>
    </section>
    <section id="ErJiB1yBcQUbTyLUhA0APK">
      <h2>Why I'm Still Here</h2>
      <p>Perhaps now, then, after several paragraphs, I will finally articulate <em>why</em> I use this technology. It is something I have come to call the <dfn>Symbol Management Problem</dfn>:</p>
      <ul>
        <li>You have a quantity of <em>symbols</em>,</li>
        <li>Which you endeavour to <em>manage</em>, and</li>
        <li>This is a <em>problem</em>.</li>
      </ul>
      <p>In software, and especially in Web development, you will find yourself dealing with a number of <em>symbols</em>, <em>tokens</em>, <em>slugs</em>&#x2014;identifiers intended to pick out, demarcate, and differentiate different pieces of content for different kinds of processing:</p>
      <ul>
        <li>Package and/or class names</li>
        <li>Subroutine and/or method names</li>
        <li>Variable and constant names</li>
        <li>Coded/enumerated values (<a href="https://en.wikipedia.org/wiki/Dewey_Decimal_Classification" rel="dct:references">which actually predate software</a>)</li>
      </ul>
      <p>Pretty much everything has that. On the Web we also have:</p>
      <ul>
        <li><abbr>URL</abbr> path components</li>
        <li><abbr>URL</abbr> query keys</li>
        <li><abbr>URL</abbr>/<abbr>HTML</abbr> fragment identifiers</li>
        <li><abbr>HTML</abbr> form keys</li>
        <li><abbr>CSS</abbr> class names</li>
        <li><abbr>JSON</abbr> object keys</li>
        <li><code>data-*</code> attributes</li>
        <li>&#x2026;and three or four mutually-incompatible metadata schemes.</li>
      </ul>
      <aside role="note" id="ErAX3GeasH7_rxljsfzRHK">
        <p>I am only considering <q>leaf-node</q> application development here, but we could also easily consider element and attribute names, media type identifiers, <abbr>URI</abbr> scheme identifiers, the various identifiers that show up in different protocol headers, the names of the protocol headers themselves, etc.</p>
      </aside>
      <p>Web development is <em>particularly</em> rife with symbols, because at the end of the day, you're just schlepping text. A number of these symbols&#x2014;<abbr>CSS</abbr> class names and <abbr>HTML</abbr> IDs, <abbr>URL</abbr> query keys and form keys&#x2014;straddle multiple technical specifications because they are meant to serve as junctions that connect the different technologies together. On a more organizational level, many of these objects correspond to entities and relations in internal databases, classes, properties and methods in object-oriented code, or objects in legacy or third-party information systems. A significant chunk of the work of Web application development reduces to mapping these disparate objects to one another, usually in an ad-hoc way.</p>
      <p>The more symbol dictionaries you have to maintain&#x2014;assuming you maintain them at all&#x2014;the more overhead goes into maintaining them and/or dealing with the fallout of sub-par maintenance, and the more effort, and ultimately code, goes into translating between them. In other words, the <em>entropy</em> generated by the proliferation of symbols can actually foreclose on certain opportunities, because it simply becomes too costly to wrangle.</p>
      <p>The whole point of using human-readable symbols, and not, say, random strings or numbers, is to have a mnemonic or associative device such that a human being can look at a given symbol and <em>infer</em> to some extent what the thing is supposed to <em>mean</em>. The tendency, therefore, is to make them contain recognizable words. Here we can see how the <dfn>Symbol Management Problem</dfn> decomposes into two parts:</p>
      <dl>
        <dt>Redundancy</dt>
        <dd>When you have two or more terms that mean the same thing.</dd>
        <dt>Collision</dt>
        <dd>When you have the same term that means two or more things.</dd>
      </dl>
      <p>Both these situations arise when people, teams, organizations, etc. need a word for a distinct concept, and don't sufficiently consult with others in their orbit about what terms are already in use. This is a fundamental information-sharing problem that will occur any time it's easier to <em>make</em> something up than to <em>look</em> something up, and will persist to some degree no matter how good the communication gets. Nevertheless, it can be palliated.</p>
      <p>The collision problem is solved through <dfn>namespaces</dfn>, which, when they are fully-qualified <abbr>URIs</abbr>, are by design impossible to collide. The redundancy problem can be solved through <dfn>term reconciliation</dfn>: essentially denoting, in a machine-readable form, that a certain term in one vocabulary means the same thing as a certain term in another. The general communication problem can be greatly ameliorated by making these terms, which are fully-qualified <abbr>URIs</abbr>, actually point to webpages containing their own dual machine/human-readable documentation. These can be published, indexed, and made discoverable. Indeed, in most cases, we can skip over the process of minting our own symbol vocabulary entirely and directly use vocabularies authored by other people.</p>
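      <p>Both remedies can themselves be written down as ordinary machine-readable statements. A sketch in Turtle (the two team vocabularies are invented; <code>owl:equivalentProperty</code> is the real <abbr title="Web Ontology Language">OWL</abbr> term):</p>
      <pre><code>@prefix owl: &lt;http://www.w3.org/2002/07/owl#&gt; .
@prefix ta:  &lt;https://team-a.example/vocab#&gt; .
@prefix tb:  &lt;https://team-b.example/vocab#&gt; .

# Collision, solved: both teams picked the word "title", but the
# fully-qualified URIs ta:title and tb:title cannot collide.

# Redundancy, reconciled: assert that the two terms mean the same.
ta:title owl:equivalentProperty tb:title .</code></pre>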
      <aside role="note" id="ENeVK5DPID02m57w3hfPOI">
        <p>Take somebody like <a href="http://www.heppnetz.de/" rel="dct:references">Martin Hepp</a>, whose PhD dissertation comprises <a href="http://www.heppnetz.de/projects/goodrelations/" rel="dct:references">an extensive e-commerce vocabulary</a> which is totally comprehensive and by all accounts <em>immense</em>. Why would you even try to roll your own when he has clearly put more time and effort into the necessary elements than you ever could justify?</p>
      </aside>
      <p>Symbol management is all the more important now in the age of <abbr>APIs</abbr>, when arbitrary data objects are continually being slung across administrative boundaries. The state of the art is that every website with an <abbr>API</abbr> also has a documentation section that tells the programmer which field means what, which fields are mandatory, which are optional, which fields are conditional on others, and what the valid ranges of values are for each field. The programmer then takes this information and writes an adapter, and this process is typically repeated&#x2014;in the <em>best</em>-case scenario&#x2014;<span class="parenthesis" title="In the worst case you do this for every application.">for every programming language that needs an interface.</span> If the programmer is tying together five <abbr>APIs</abbr>, they could easily be doing an ad-hoc five-way reconciliation of slightly different representations of, for example, a <em>user</em>. That seems like a huge waste of effort to me.</p>
      <p>The standard sales pitch for both the <dfn>Semantic Web</dfn> <em>and</em> <dfn>Linked Data</dfn> goes something like <q>you should use it because <a href="https://en.wikipedia.org/wiki/Metcalfe%27s_law" rel="dct:references">once everybody uses it, it will be awesome</a></q>. That appeal entails a <em>herculean</em> feat of human cooperation and skates over all sorts of vested interests. I submit instead that there needs to be a motivation to use this technology <em>even if nobody else in the world</em> subsequently adopted it, and I believe the <dfn>Symbol Management Problem</dfn> to be just that.</p>
    </section>
    <section id="EbkTOwuTqwjn-xawAo1k2I">
      <h2>So what have I actually made?</h2>
      <p>My job from about mid-<time>2002</time> to mid-<time>2005</time> involved designing, implementing, and running an <abbr>XML</abbr> content pipeline that eventually ended up managing about 120 websites in 15 languages, along with all the mailouts that could be julienned by a dozen different demographic parameters. I then went to work at one of the nascent <dfn>federated identity</dfn> providers where I did a lot of <abbr>API</abbr> and protocol work. By <time>2006</time> I had a pretty solid grasp of what <abbr>XML</abbr> was good for, and where it fell short.</p>
      <section id="E198rG0jtCygyKJxNJ6j1K">
        <h3>Early Experiments, Lofty Ambitions</h3>
        <p>A central theme in my work is to begin with a bulk quantity of raw material and apply successive structure-preserving transformations. By this point, I had already been working with the Web for a decade, and had by then noticed that most of the desired behaviour can be satisfied with only a handful of operations. If I could design a <em>substrate</em>, I figured, then one would only need to write custom code for the minority of behaviours that the substrate didn't already cover.</p>
        <aside role="note" id="EAHQeE92QyAuPqCX2_uoBI">
          <p>This was around the time that <abbr title="Model-View-Controller">MVC</abbr> frameworks such as <a href="https://rubyonrails.org/" rel="dct:references">Rails</a> started gaining popularity. The <abbr>MVC</abbr> paradigm is heaps better than <span class="parenthesis" title="i.e., PHP, ASP, JSP, SSI, Coldfusion">the callback model</span> that it supplanted, but I could happily write an entire monograph on the inadequacies of <abbr>MVC</abbr>, at least as it pertains to the Web.</p>
          <p>The only system from that time that I am aware of using the&#x2014;in my opinion&#x2014;more sophisticated pipeline paradigm was <a href="https://cocoon.apache.org/" rel="dct:references">Apache Cocoon</a>, which had been doing so as far back as <time>1999</time>. This is another subject that deserves its own treatment.</p>
        </aside>
        <p><q>Substrate</q>-like frameworks indeed already existed, albeit coupled almost always to <dfn>Java</dfn> and <em>always</em> always to <abbr>XML</abbr>. As I already implied, anything that requires you to scratch-write an <abbr>XML</abbr> vocabulary is a non-starter. As for Java, it's something of a Rubicon that a lot of Web developers&#x2014;myself included&#x2014;<span class="parenthesis" title="because once you get Java involved, your project turns from a scrappy little program into a whole Enterprise Software Product&#x2122;">would rather not cross</span>. My idea, after a close read of <a href="https://roy.gbiv.com/" rel="dct:references">Roy Fielding's</a> <a href="https://roy.gbiv.com/" rel="dct:references">PhD dissertation</a>, was to make a sort of <em>meta</em>-framework that could theoretically be implemented in <em>any</em> language, even mixed and matched between multiple systems.</p>
        <p>Instead of <abbr>XML</abbr>, the system would speak <abbr>RDF</abbr>, and even use <dfn>content negotiation</dfn> to select between syntaxes. This was actually a pretty solid plan except for the fact that unlike an <abbr>XML</abbr> document, any <abbr>RDF</abbr> serialization, at least at the time, was just a set of statements. There was no way to indicate an <q>initial subject</q>&#x2014;that is, connect the content you just downloaded from the location you just downloaded it from: it would be mixed in with all the other data and there would be no way to tell which <abbr>URI</abbr> was the <q>topmost</q> one. I put my master plan on hold and went in search of more tractable problems to solve.</p>
        <p>In <time>2007</time> I used <abbr>RDF</abbr> to record the results of a data analysis process. A packet of raw telemetry data would be injected into a pipeline of tests, whereby the outcome of one test may or may not cause the data to be subjected to subsequent tests. As such, the output was irregularly-shaped but still needed to be structured. It would have been incredibly difficult to pull off using <abbr title="Structured Query Language">SQL</abbr>. The process extracted subjects&#x2014;<abbr>URIs</abbr>&#x2014;from the packet along with facts about them which eventually built up a graph. I did this for an employer so it was never recognized as anything more than an experiment.</p>
      </section>
      <section id="ETQZ9Szwn52mQ-sjujzNUL">
        <h3><a href="content-robo-inventory" rel="dct:references" title="Content Robo-Inventory">Content Robo-Inventory</a></h3>
        <p>Around <time>2009</time>, <a href="betamaxed" rel="dct:references" title="Betamaxed">I wrote a wrapper around the Mercurial version control system</a> as a sort of first crack at an automated <dfn>content inventory</dfn> that would scan the history of a <dfn>repository</dfn> for things like modification dates and naming histories. This work eventually matured into a <a href="https://vocab.methodandstructure.com/content-inventory#">content inventory vocabulary</a>. The idea was to create a format for recording and exchanging Web content inventories&#x2014;which of course could be performed programmatically&#x2014;along with the outcomes of their subsequent audits. The <em>inventory</em> aspect is pretty mature by this point; the <em>audit</em> somewhat less so. This is still an active area of development that I believe has strong implications for the discipline of <dfn>content strategy</dfn>.</p>
      </section>
      <section id="Ei_rEztsODLIiGrT71pEgI">
        <h3>Structured Argumentation</h3>
        <p>Also in <time>2009</time>, I had a chance encounter with <a href="http://www.youtube.com/watch?v=xQx-tuW9A4Q" rel="dct:references">a salon presentation featuring Douglas Engelbart</a>. In it, he rather nonchalantly tossed out a mention of a thing called <dfn>structured argumentation</dfn>, which sounded a lot like it could serve as the basis for the <q>fitness variables</q> depicted in <a href="http://patternlanguage.com/" rel="dct:references">Christopher Alexander's</a> <a href="book/notes-on-the-synthesis-of-form" rel="dct:references" title="Notes on the Synthesis of Form">Notes on the Synthesis of Form</a>.</p>
        <figure id="EbFVWD-SXOUFbrtQHF8mzL">
          <img src="file/notsof-indian-village" alt="Minimalist graph of Indian village from Notes on the Synthesis of Form" rel="dct:hasPart foaf:depiction"/>
          <figcaption>
            <p>I had also modeled the Indian village in Appendix 2 of the book in order to try to <span class="parenthesis" title="I didn't get very far.">reconstruct Alexander's <abbr title="Hierarchical Decomposition of a Set">HIDECS</abbr> algorithm from its description in Appendix 1.</span></p>
          </figcaption>
        </figure>
        <p>Structured argumentation&#x2014;or at least the particular flavour of it that I had alighted on&#x2014;is a sort of organizational protocol that constrains rhetorical moves in order to do things like, as its authors put it, solve <a href="https://en.wikipedia.org/wiki/Wicked_problem" rel="dct:references"><dfn>wicked problems</dfn></a>. Alexander was likewise trying to solve a complex problem: to compute an architectural program&#x2014;that is to say, a project plan&#x2014;through a topological analysis of a hairball of concerns. The two ideas therefore fit together quite naturally. The Engelbart connection is of course the use of an interactive hypermedia system to manipulate the thing.</p>
        <p>An <abbr>RDF</abbr> vocabulary for the strain of structured argumentation called <abbr title="Issue-Based Information System">IBIS</abbr>&#x2014;<a href="https://en.wikipedia.org/wiki/Issue-based_information_system" rel="dct:references"><dfn>Issue-Based Information System</dfn></a>&#x2014;had already been written; then one day it disappeared. So in <time>2012</time>, after vacillating for years, <a href="https://vocab.methodandstructure.com/ibis#">I decided to replace it</a>.</p>
      </section>
      <section id="Eupzq4cR1MqtA2-hKehSPI">
        <h3><abbr>RDF-KV</abbr> and the <abbr>IBIS</abbr> Tool</h3>
        <p>A year later, in <time>2013</time>, I was working on a project where I planned to use an <abbr>RDF</abbr> graph as the main database. The idea, hearkening back to my <q>substrate</q> plan, was that I could greatly abridge basic <abbr title="Create Retrieve Update Delete">CRUD</abbr> development by speaking what are effectively <abbr>RDF</abbr> <em>diffs</em>&#x2014;sets of statements to add or remove&#x2014;directly to the server. Thus I designed <a href="rdf-kv" rel="dct:references" title="RDF-KV">a protocol</a> and <a href="https://metacpan.org/pod/RDF::KV" rel="dct:references">reference implementation</a> I called <a href="rdf-kv" rel="dct:references" title="RDF-KV"><abbr>RDF</abbr>-<abbr title="key-value">KV</abbr></a>.</p>
        <p>The protocol works by embedding commands into the <em>keys</em> of <abbr>HTML</abbr> forms, such that the <em>values</em>, when supplied by the user, complete <abbr>RDF</abbr> statements, with a flag to indicate whether each statement should be added or removed. The protocol is dead simple by design, and can be implemented with <dfn>regular expressions</dfn>. The net effect is that you can put a single catch-all <code>POST</code> handler on the server, and manipulate your <abbr>CRUD</abbr> behaviour just by changing your <abbr>HTML</abbr>.</p>
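        <p>As a sketch of the general shape of the idea, a form might look something like the following. Note that the key strings here are illustrative placeholders, not real <abbr>RDF</abbr>-<abbr title="key-value">KV</abbr> syntax; the actual key grammar is defined in the protocol specification linked above.</p>
        <pre><code>&lt;!-- Illustrative only: see the RDF-KV spec for the real key grammar --&gt;
&lt;form method="POST" action=""&gt;
  &lt;!-- the user-supplied value completes a statement about the
       form's subject, here the request-URI --&gt;
  &lt;input type="text" name="dct:title" value="My Issue"/&gt;
  &lt;!-- a flag in the key marks a statement for removal
       rather than addition --&gt;
  &lt;input type="hidden" name="- rdf:type" value="ibis:Issue"/&gt;
  &lt;button type="submit"&gt;Save&lt;/button&gt;
&lt;/form&gt;</code></pre>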
        <aside role="note" id="EZhliH-XyBhYP5c09z6UxJ">
          <p><dfn>Integrity constraint validation</dfn> is obviously a problem with this approach, but for that <a href="https://www.w3.org/TR/shacl/" rel="dct:references">we now have <abbr title="Shapes Constraint Language">SHACL</abbr></a>.</p>
        </aside>
        <p>Once I had the protocol, I needed to test it. I had yet to produce any vocabularies or instance data for the client, because part of the plan was that I would make a tool using the protocol to construct that data. I needed a complete vocabulary to write an app against, so I dragged my <abbr>IBIS</abbr> vocabulary out of mothballs, and in a couple of weeks, spent mostly fiddling with the <abbr title="user interface">UI</abbr>, I had a reasonably serviceable structured argumentation tool. The original project for which I had designed the protocol eventually became a casualty of intraorganizational politics, but the <abbr>IBIS</abbr> prototype remains. <a href="https://ibis.makethingsmakesense.com/" rel="dct:references">Here it is</a>:</p>
        <figure id="EtBOj4jLXuMyAB79x1zi1J">
          <iframe style="display: block; width: 480px; height: 300px;" src="https://www.youtube.com/embed/TfIZY0s1JG0" allowfullscreen="" rel="dct:hasPart" frameborder="0"/>
          <figcaption>
            <p>This video demonstrates the <abbr>IBIS</abbr> process, as well as the general pattern of development for making tools of this type, by stepping through the rationale that brought the tool itself into existence.</p>
          </figcaption>
        </figure>
        <p>The <abbr>IBIS</abbr> tool is a rather crude demonstration of a through-and-through <abbr>RDF</abbr> Web application. Graph statements that come in through the <abbr title="RDF-Key-Value">RDF-KV</abbr> protocol go directly into a <dfn>triple store</dfn>, and when they come back out, they are rendered as <abbr>RDFa</abbr>. I call the demonstration <em>crude</em> because it is incapable of handling arbitrary data objects&#x2014;that will have to wait for the inevitable rewrite. Nevertheless, we can see in this prototype a significant dent in the <dfn>Symbol Management Problem</dfn>.</p>
        <p>In particular, the tool demonstrates the use of embedded <abbr>RDFa</abbr> as <abbr>CSS</abbr> selectors: <abbr>RDFa</abbr> naturally identifies a subtree of an <span>(<abbr>X</abbr>)<abbr>HTML</abbr></span> document with a subject, and/or one or more predicates, and/or one or more classes or datatypes. This is almost always enough information to attach styling directives directly through attribute selectors, and it is this that affords the tool its wild palette.</p>
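        <p>Concretely, the styling keys directly off the <abbr>RDFa</abbr> attributes with <abbr>CSS</abbr> attribute selectors; the <code>~=</code> operator matches a single token within a space-separated attribute value. Here is a sketch, using <abbr>IBIS</abbr> terms for illustration (the colours are made up, not the tool's actual stylesheet):</p>
        <pre><code>/* colour nodes by their RDF class */
[typeof~="ibis:Issue"]    { background-color: #fdd; }
[typeof~="ibis:Position"] { background-color: #dfd; }

/* colour links by the predicate they assert */
a[rel~="ibis:supports"]   { color: green; }
a[rel~="ibis:opposes"]    { color: red; }</code></pre>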
        <aside role="note" id="EzL7diL5I5Ptrw3-I36RTJ">
          <p>It is only <em>almost</em> always because <abbr>CSS</abbr> selectors are not as expressive as something like <abbr title="XML Path Language">XPath</abbr>, but there are ways around this limitation. Note also that any <abbr>CSS</abbr> using this technique will certainly have to be generated, e.g. using a macro processor like <abbr>SASS</abbr>, because it will be unmaintainable by hand.</p>
        </aside>
      </section>
      <section id="E6m2WKClGasDXXD7EcUJdL">
        <h3>The Intranet Project</h3>
        <p>A bare-bones <span>(<abbr>X</abbr>)<abbr>HTML</abbr></span>+<abbr>RDFa</abbr> document is at once an extremely well-defined development target and a terrifically versatile piece of raw material. When you write a piece of server-side code, you write it for consumption by downstream processes. You aren't creating a <em>page</em> so much as a <em>patch</em> of the graph, originating at the request-<abbr>URI</abbr> and featuring its immediate topological neighbours. The document's markup structure is heavily constrained by the statements you're trying to render, and for the reasons aforementioned, there aren't a lot of other decisions to make about things like <abbr>CSS</abbr> class names and the like. When you're finished fashioning one of these resources&#x2014;or perhaps a function that generates them according to supplied parameters&#x2014;it goes into the Lego pile, where it can be consumed by and composed into other resources. I made an entire Web app this way.</p>
        <p>I have an ongoing project developing an intranet for a long-term client in the nonprofit sector. I add a little bit more to it at every conjunction of budget and availability, an arrangement they seem to be happy with. Indeed, it's part of the reason why I came up with the pattern: they don't have&#x2014;or at least wouldn't be prudent to spend&#x2014;the resources to develop software the conventional way, and I need a simple design that won't go obsolete between when I put it down and when I pick it back up again.</p>
        <p>The project mainly consists of a set of tools for comprehending a whack of <abbr title="human resources">HR</abbr> data: lists, charts, and individual members. The former two share a control panel the client uses to filter the data. The control panel is constructed from a repurposed <abbr title="Web Ontology Language">OWL</abbr> ontology and <abbr title="Simple Knowledge Organization System">SKOS</abbr> concept scheme that together describe all the idiosyncratic terms and coded properties peculiar to the organization. The chart generator takes <abbr>HTML</abbr> tables with an embedded <dfn>Data Cube</dfn> structure which it uses to negotiate the appropriate transformation into <abbr>SVG</abbr>.</p>
        <p>To reiterate, the technical innovations of this project are born mainly of resource constraints. It is the way it is because it would be too much of a mess for a single person to manage otherwise. And as much as I would love to show this thing off, it's an intranet that browses through and visualizes reams of confidential personal information, so you don't get to see it. I'll have to show you something else.</p>
      </section>
      <section id="EwzQDv86OMulJuq0Oy3syK">
        <h3>The Swiss-Army Knife</h3>
        <p><a href="./" rel="dct:references" title="Make Things. Make Sense.">My personal website</a> is not only a place to write, but also a fairly large body of content that can't refuse my Frankenstein experiments. Historically it has not been very sophisticated because I am spectacularly lazy, though as a byproduct of this laziness I stumbled across a useful technique: Every Web browser going back to Microsoft Internet Explorer 5.5 has an embedded <abbr>XSLT</abbr> 1.0 transformation engine. <abbr>XSLT</abbr> is not the slightest bit picky about the markup it consumes, and will happily <span class="parenthesis" title="Well, the input has to be XHTML but the output can be HTML.">transform <span>(<abbr>X</abbr>)<abbr>HTML</abbr></span> into itself.</span> It therefore makes a perfectly good, fast, and perfectly <em>lazy</em> page composition and template processor.</p>
        <p>Most websites tend to have ancillary content that is repeated on every page, and so does mine. Going back as far as <time>2007</time> or <time>2008</time>, I solved this by putting the ancillary content on its own page, and then putting a <code>&lt;link&gt;</code> to it in the <code>&lt;head&gt;</code> of each document, using <abbr>HTML</abbr>'s limited set of semantic relations to disambiguate those links from any others. I would then <dfn>transclude</dfn> the links' targets using the <code>document()</code> function in <abbr>XSLT</abbr>.</p>
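        <p>A minimal <abbr>XSLT</abbr> 1.0 sketch of the technique, assuming <abbr>XHTML</abbr> input and a hypothetical <code>rel="contents"</code> link; the relation name and the insertion point are illustrative, not the scheme my stylesheets actually use:</p>
        <pre><code>&lt;xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:html="http://www.w3.org/1999/xhtml"&gt;

  &lt;!-- identity template: copy everything through unchanged --&gt;
  &lt;xsl:template match="@*|node()"&gt;
    &lt;xsl:copy&gt;&lt;xsl:apply-templates select="@*|node()"/&gt;&lt;/xsl:copy&gt;
  &lt;/xsl:template&gt;

  &lt;!-- prepend the transcluded content to the body; note a real
       implementation must token-match @rel, which is a
       space-separated list --&gt;
  &lt;xsl:template match="html:body"&gt;
    &lt;xsl:copy&gt;
      &lt;xsl:apply-templates select="@*"/&gt;
      &lt;xsl:copy-of select="document(
          /html:html/html:head/html:link[@rel='contents']/@href
        )/html:html/html:body/node()"/&gt;
      &lt;xsl:apply-templates select="node()"/&gt;
    &lt;/xsl:copy&gt;
  &lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;</code></pre>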
        <p>The problem with this approach is that both the method of resolving the links <em>and</em> the method of inserting their contents into the document were brittle and ad hoc. Since I was using the technique on the aforementioned intranet project, and on the little extranets I make to share materials with my clients, I felt it was important to generalize it. I made two <abbr>XSLT</abbr> libraries: <a href="https://github.com/doriantaylor/rdfa-xslt" rel="dct:references">one to query an <abbr>RDFa</abbr> document</a>, and <a href="https://github.com/doriantaylor/xslt-transclusion" rel="dct:references">another to do the transclusion</a>.</p>
        <aside role="note" id="ELiIe-a3WMfCMCdiQs5WmI">
          <p>These libraries are not perfect, but they are pretty good. The query engine can pull any statement out of a document, and the transclusion engine does the right thing when it comes to headings and subsections, without getting stuck in loops. Limitations include not being able to take arbitrary input: they both assume a certain level of hygiene in the documents going in, which ultimately means you can't just go pulling documents together willy-nilly from all over the internet.</p>
        </aside>
        <p>These libraries worked great for my other projects, but my own site is just plain handwritten <abbr>XHTML</abbr>. Recall the content inventory metadata I mentioned earlier: I needed a sustainable way to reinject that data back into the markup. <a href="https://github.com/doriantaylor/rb-rdf-sak" rel="dct:references">And so was born <code>RDF::SAK</code></a>.</p>
        <p>The Swiss Army Knife is a library whose purpose is mainly to act as a breadboard prototype for an agglomerate of desirable operations. For the moment it handles weaving <abbr>RDF</abbr> data back into plain Web pages, mapping resources from durable <abbr>URIs</abbr> to more evanescent Web <abbr>URLs</abbr> and handling their naming histories, generating <dfn>Atom</dfn> feeds and various indexes, and a few other mundane chores. It currently takes the form of a static website generator. Its proximate goal is to generate output I can use to make websites&#x2014;beginning with my own&#x2014;more <em>hypertext-y</em>.</p>
        <aside role="note" id="EexqlHGOPSkVwvoU-K_LXK">
          <p>I suppose <code>RDF::SAK</code> is actually a bona fide <dfn>Semantic Web</dfn> application, on account of it using a <dfn>reasoner</dfn>, albeit for the most boring and quotidian purposes, like determining which resources in the graph are documents, and partitioning <dfn>Atom</dfn> feeds by audience.</p>
        </aside>
      </section>
    </section>
    <section id="EF3oml2uGitg4YxLnVB3eI">
      <h2>Coda</h2>
      <figure id="EKdJhY11MSFsv1XrAdzYYL">
        <iframe style="display: block; width: 480px; height: 270px;" src="https://www.youtube.com/embed/eV84dXJUvY8" frameborder="0" allowfullscreen="" rel="dct:hasPart"/>
        <figcaption>
          <p>Where is my hypertext utopia?</p>
        </figcaption>
      </figure>
      <p>As somebody who writes a lot for work and reads for it even more, I am growing increasingly dissatisfied with the <em>sparsity</em>&#x2014;the <em>clunkiness</em>&#x2014;of digital text. Technical manuals are cluttered with preamble and exposition, while their jargon glossaries, if they exist at all, are tucked out of sight. News articles still don't let you pivot by person, organization, or macro-event&#x2014;social network analyses and multi-story timelines only seem to appear as special features. Academic papers still require you to dig out their references by hand. It's often easier to write a passage a second time than it is to locate where you wrote it previously&#x2014;and even if you could, your only option would be to duplicate it rather than reference the original in-line. Documents continually lapse out of date with no connection to the most recent version. Quantitative arguments are still heavily rhetorically leveraged when they could simply be <a href="http://worrydream.com/ClimateChange/#media" rel="dct:references">interactively demonstrated in situ</a>.</p>
      <p>I consider myself to be in the <em>comprehension</em> business. My professional objective is to remove the obstacles that slow down the uptake of knowledge. To remove those obstacles, we must multiply the paths through information. The more paths&#x2014;<em>links</em>&#x2014;the more complexity. That complexity needs to be managed, and for that job I have yet to encounter a candidate more effective than the approach described here.</p>
    </section>
  </body>
</html>
