<?xml version="1.0"?>
<?xml-stylesheet href="/transform" type="text/xsl"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:bibo="http://purl.org/ontology/bibo/" xmlns:bs="http://purl.org/ontology/bibo/status/" xmlns:ci="https://vocab.methodandstructure.com/content-inventory#" xmlns:dct="http://purl.org/dc/terms/" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:xhv="http://www.w3.org/1999/xhtml/vocab#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" lang="en" prefix="bibo: http://purl.org/ontology/bibo/ bs: http://purl.org/ontology/bibo/status/ ci: https://vocab.methodandstructure.com/content-inventory# dct: http://purl.org/dc/terms/ foaf: http://xmlns.com/foaf/0.1/ rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns# xhv: http://www.w3.org/1999/xhtml/vocab# xsd: http://www.w3.org/2001/XMLSchema#" vocab="http://www.w3.org/1999/xhtml/vocab#" xml:lang="en">
  <head>
    <title property="dct:title">Content Management Meta-System</title>
    <base href="https://doriantaylor.com/content-management-meta-system"/>
    <link href="document-stats#EYg8RarQ-DHplqHcNs_MzL" rev="ci:document"/>
    <link href="elsewhere" rel="alternate bookmark" title="Elsewhere"/>
    <link href="this-site" rel="alternate index" title="This Site"/>
    <link href="http://purl.org/ontology/bibo/status/published" rel="bibo:status"/>
    <link href="" rel="ci:canonical" title="Content Management Meta-System"/>
    <link href="person/dorian-taylor#me" rel="dct:creator" title="Dorian Taylor"/>
    <link href="//www.youtube.com/embed/eV84dXJUvY8?rel=0" rel="dct:hasPart"/>
    <link href="file/inventory-visualized" rel="dct:hasPart"/>
    <link href="//privatealpha.com/ontology/content-inventory/1" rel="dct:references"/>
    <link href="file/box-whisker-test;invert" rel="foaf:depiction"/>
    <link href="person/dorian-taylor" rel="meta" title="Who I Am"/>
    <link about="./" href="3f36c30c-6096-454a-8a22-c062100ae41f" rel="alternate" type="application/atom+xml"/>
    <link about="./" href="f07f5044-01bc-472d-9079-9b07771b731c" rel="alternate" type="application/atom+xml"/>
    <link about="./" href="this-site" rel="alternate"/>
    <link about="./" href="elsewhere" rel="alternate"/>
    <link about="./" href="e341ca62-0387-4cea-b69a-cdabc7656871" rel="alternate" type="application/atom+xml"/>
    <link about="verso/" href="3f36c30c-6096-454a-8a22-c062100ae41f" rel="alternate" type="application/atom+xml"/>
    <link about="verso/" href="this-site" rel="alternate"/>
    <link about="verso/" href="elsewhere" rel="alternate"/>
    <meta content="A chronicle of a decade-long odyssey into content management, which seems to have finally gotten somewhere." name="description" property="dct:abstract"/>
    <meta content="2019-04-18T01:04:55+00:00" datatype="xsd:dateTime" property="dct:created"/>
    <meta content="2019-04-18T01:23:06+00:00" datatype="xsd:dateTime" property="dct:modified"/>
    <meta content="2019-04-18T01:24:29+00:00" datatype="xsd:dateTime" property="dct:modified"/>
    <meta content="2019-04-20T14:56:55+00:00" datatype="xsd:dateTime" property="dct:modified"/>
    <meta content="2021-02-18T02:24:00+00:00" datatype="xsd:dateTime" property="dct:modified"/>
    <meta content="2021-03-17T03:00:30+00:00" datatype="xsd:dateTime" property="dct:modified"/>
    <meta content="2022-05-31T15:10:50+00:00" datatype="xsd:dateTime" property="dct:modified"/>
    <meta about="person/dorian-taylor#me" content="Dorian Taylor" name="author" property="foaf:name"/>
    <meta content="summary_large_image" name="twitter:card"/>
    <meta content="@doriantaylor" name="twitter:site"/>
    <meta content="Content Management Meta-System" name="twitter:title"/>
    <meta content="A chronicle of a decade-long odyssey into content management, which seems to have finally gotten somewhere." name="twitter:description"/>
    <meta content="https://doriantaylor.com/file/box-whisker-test;invert" name="twitter:image"/>
    <object>
      <nav>
        <ul>
          <li>
            <a href="//dorian.substack.com/p/back-just-in-time-to-close-out-the" rev="dct:references" typeof="bibo:Article">
              <span property="dct:title">Back Just In Time To Close Out the Year</span>
            </a>
          </li>
          <li>
            <a href="./" rev="dct:references" typeof="bibo:Website">
              <span property="dct:title">Make Things. Make Sense.</span>
            </a>
          </li>
          <li>
            <a href="//dorian.substack.com/p/setting-the-tone-for-an-anti-platform" rev="dct:references" typeof="bibo:Article">
              <span property="dct:title">Setting the Tone for an Anti-Platform</span>
            </a>
          </li>
          <li>
            <a href="website-change-diary" rev="dct:references" typeof="bibo:Article">
              <span property="dct:title">Website Change Diary</span>
            </a>
          </li>
          <li>
            <a href="document-stats#EYg8RarQ-DHplqHcNs_MzL" rev="ci:document" typeof="qb:Observation">
              <span>urn:uuid:620f116a-b43e-40c7-ba65-a8770db3f333</span>
            </a>
          </li>
        </ul>
      </nav>
    </object>
  </head>
  <body about="" id="E6PYVh7tW5cfdKVS3aoS5L" typeof="bibo:Article">
    <p>Back in 2008 I decided to put up a website again, after not having one for something like 6 years. This was going to be super budget, just an interim static site until I could circle back around with the full-fledged <abbr>CMS</abbr> later.</p>
    <aside id="ERG1RiBKkU3-giLzicCqsI" role="note">
      <p>This is around the time I had had something of an epiphany: <a href="colophon" title="Colophon" rel="dct:references">I could use a particular technique</a>&#x2014;utterly antique by the tastes of the Web tech industry&#x2014;in an embarrassingly obvious-in-retrospect way, to do ultra-easy and lazy separation of presentation and content. In fact I'm <em>still</em> using it, <a href="https://github.com/doriantaylor/xslt-transclusion" rel="dct:references">and even extending it</a>. It's so effective, and there are so many fringe benefits, you would have a hard time convincing me to use anything else.</p>
      
    </aside>
    <section id="ETWcjXMdD5Mr2v6o_2DowI">
      <h2>Design Constraints</h2>
    <p>I had two constraints in mind when I got started this time:</p>
    <ol>
      <li><dfn>Dense Hypermedia</dfn>: I wanted to make the experience <em>linky</em>. I wanted the content of any one document to fit on <span class="parenthesis" title="note that the iPhone would have been less than a year old at the time, though if a formatted document fit vertically on a standard laptop screen, it would probably fit on an iPhone-era device too">a standard laptop screen</span>, and I wanted it to <em>link</em> if it needed more space than that. The idea was to replicate old-school hypertext: whether the goal was didactic, argumentative, or narrative, I wanted the reader to be able to choose what they read next.</li>
      <li><a href="https://www.w3.org/Provider/Style/URI" rel="dct:references">Cool <abbr>URIs</abbr> Don't Change</a>: This was at least as much of a technical challenge as it was a stylistic one. The idea was that if you mint a Web <abbr>URI</abbr>&#x2014;I'm talking about the actual <em>string</em>, not the document it points to&#x2014;you lose control of it somewhat. Other people, companies, machines, services become aware of it, and they use it to return and fetch the resource&#x2014;or at least <em>some</em> resource&#x2014;identified by it. <span class="parenthesis" title="it is still bad in 2019">The state of <abbr>URI</abbr> preservation in <time>2008</time> was bad</span>, and I wanted to see if I could do something about it. Put more generically, I wanted to have a website where no legitimate user would <em>ever</em> encounter a <code>404</code>.</li>
    </ol>
    <p>I bailed on the first constraint pretty early on. I found very quickly that writing hypertext is <em>hard</em>: the scope has a tendency to explode to what I estimate to be proportional to the square of the amount of writing initially expected. Publication would be constipated waiting for some parenthetical Pandora's box or other to be wrangled to satisfaction. I found this could be palliated somewhat by publishing subgraphs that only linked to each other or documents that had already been published, but the no-<code>404</code> stipulation meant that I was on the hook for an increasingly unmanageable hairball until all paths through it had terminated. I wanted links in the documents to remind myself that there was something there to expand on, but I didn't want those links in the hands of users, and I <em>certainly</em> didn't want them in the databases of indexers either, at least not until there was something on the other end.</p>
    <p>Without adequate instrumentation, <span class="parenthesis" title="though the fight isn't over yet; I will eventually try again">writing dense hypertext turned out to be just too hard</span>. Within a year I had reverted to writing essays.</p>
    <p>The second constraint&#x2014;unbreakable <abbr>URIs</abbr>&#x2014;turned out to be easier to maintain. As a byproduct of my first tech job in <time>1999-2000</time>, I had gotten familiar with <a href="http://perl.apache.org/" rel="dct:references"><code>mod_perl</code></a>, which gives you full access to the guts of Apache without wasting your life writing <dfn>C</dfn>. Working at that level meant your app shipped as a unitary module, bypassing the clunkery of contemporaneous Web application development techniques like <abbr>CGI</abbr> scripts, or code-interpolated documents such as <abbr>PHP</abbr>. This taught me an important lesson: what is called the <dfn>Request-<abbr>URI</abbr></dfn>, the combination of the <code>/path</code> and <code>?query</code>, the part between the <code>://host</code> and the <code>#fragment</code>, from the point of view of <a href="https://tools.ietf.org/html/rfc7230#section-5.3.1" rel="dct:references">the standard</a> <em>and</em> the Web server, may as well be a flat dictionary key. It is only by <em>convention</em> that it represents some location on the server's file system. If you can get into this pipeline early enough, you can make the <dfn>Request-<abbr>URI</abbr></dfn> represent whatever you want.</p>
    <p>Put another way: <code>/path/hierarchies/are/not/necessary</code>. The only thing that matters&#x2014;to the server&#x2014;is that the <dfn>Request-<abbr>URI</abbr></dfn> unambiguously picks out a resource. The <dfn>slug</dfn> is easy enough: just do a sensible transformation of the title. Throw away the idea of <q>sections</q> and plunk everything in the root. If anything threatens a collision, <em>that's</em> when you start adding <code>/path/segments</code>. And when it does, just put in a redirect.</p>
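    <p>To make the flat-dictionary idea concrete, here is a minimal sketch in Python. It is not the code that runs this site: the <code>FlatSite</code> class, its method names, and the slug rules are all assumptions made for illustration. It derives a slug from a title, stores every path as a plain dictionary key, and leaves a redirect behind whenever a resource moves.</p>
<pre style="font-size: 80%">
```python
from __future__ import annotations

import re
import unicodedata


def slugify(title: str) -> str:
    """Derive a slug from a title: fold to ASCII, lowercase, hyphenate."""
    folded = unicodedata.normalize("NFKD", title)
    ascii_only = folded.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")


class FlatSite:
    """Hypothetical resolver that treats each Request-URI path as a flat key."""

    def __init__(self) -> None:
        self.resources: dict[str, str] = {}  # path -> document identifier
        self.redirects: dict[str, str] = {}  # retired path -> newer path

    def publish(self, title: str, doc_id: str) -> str:
        """Mint a root-level path for a document, refusing collisions."""
        path = "/" + slugify(title)
        if path in self.resources or path in self.redirects:
            raise ValueError(f"slug collision on {path}")
        self.resources[path] = doc_id
        return path

    def rename(self, old_path: str, new_path: str) -> None:
        """Move a resource and keep the old address alive as a redirect."""
        self.resources[new_path] = self.resources.pop(old_path)
        self.redirects[old_path] = new_path

    def resolve(self, path: str) -> tuple[int, str]:
        """Return (200, document id), (301, current path) or (404, '')."""
        if path in self.resources:
            return 200, self.resources[path]
        target, hops = path, set()
        while target in self.redirects and target not in hops:
            hops.add(target)
            target = self.redirects[target]
        if target in self.resources:
            return 301, target
        return 404, ""
```
</pre>
    <p>In this sketch a collision simply raises an error, which is the cue to pick a longer path by hand and call <code>rename</code> so that the old address redirects rather than breaking.</p>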
    <aside role="note" id="En8_GFu3vA8yWPxjcjHbHJ">
      <p>I imagine namespace collisions would happen more often on some sites than others. In over a decade and hundreds of addresses it's only happened a couple times on mine.</p>
    </aside>
    </section>
    <section id="E97cb0aKz1A4x5q09BzWfI">
      <h2>The Semantic Connection</h2>
    <p>Around this time is also when I was really ramping up my work with <abbr>RDF</abbr>, the lingua franca of the <dfn>Semantic Web</dfn>. What you find very quickly when you start working with <abbr>RDF</abbr> is that it is <em>ravenously</em> hungry for <abbr>URIs</abbr>. Combined with the notion that <abbr>HTTP</abbr>(<abbr>S</abbr>) <abbr>URIs</abbr> ought to <em>point somewhere</em>, this quickly escalates into a microcontent curation nightmare.</p>
    <aside role="note" id="ElcIIOBOS7iFZ3CvUNneYJ">
      <p>If you thought deciding on path segment slugs was bad for ordinary websites, try writing some <abbr>RDF</abbr>. Conflating the problem of deciding on an informational structure, with what to call all the pieces, is the patting your head and rubbing your belly of information architecture.</p>
    </aside>
    <p>Of course, <abbr>RDF</abbr> doesn't specify what <em>kind</em> of <abbr>URI</abbr> can go in its elements, and there are far more species in the world than just <code>http:</code>. Take, for example, an identifier like:</p>
    <pre style="font-size: 80%">urn:uuid:e8f61587-bb56-4e5c-b7dd-2954b76a84b9</pre>
    <p>The <abbr>UUID</abbr>: <a href="https://tools.ietf.org/html/rfc4122" rel="dct:references">Standard</a>, spat out of a random number generator, <span class="parenthesis" title="i.e., enough entropy">big enough</span> to enumerate every atom in the universe, and nobody is going to confuse it for the address of a Web page. Unless it <em>is</em> the address of a Web page, in which case you transform it like so:</p>
    <pre style="font-size: 80%">https://doriantaylor.com/e8f61587-bb56-4e5c-b7dd-2954b76a84b9</pre>
    <p>Once you come up with a clever title, you can derive it into a <dfn>slug</dfn>, and provided it's unique, that can be the new address. If you expose the <abbr>UUID</abbr> to the public for any reason, you can just redirect that too.</p>
    <pre style="font-size: 80%">https://doriantaylor.com/e8f61587-bb56-4e5c-b7dd-2954b76a84b9
    -&gt; https://doriantaylor.com/content-management-meta-system</pre>
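    <p>The transformation in both directions can be sketched with nothing but the Python standard library. The function names and the one-entry <code>REDIRECTS</code> table below are assumptions for illustration; only the addresses themselves come from the examples above.</p>
<pre style="font-size: 80%">
```python
import uuid
from urllib.parse import urljoin, urlsplit

BASE = "https://doriantaylor.com/"  # site root taken from the examples above

# Hypothetical redirect table: UUID address -> current slug address.
REDIRECTS = {
    BASE + "e8f61587-bb56-4e5c-b7dd-2954b76a84b9": BASE + "content-management-meta-system",
}


def urn_to_url(urn, base=BASE):
    """Turn a urn:uuid: identifier into an address under the site root."""
    parsed = uuid.UUID(urn)  # tolerates the urn:uuid: prefix, rejects anything malformed
    return urljoin(base, str(parsed))


def url_to_urn(url):
    """Recover the URN from an address whose path is a bare UUID."""
    return uuid.UUID(urlsplit(url).path.lstrip("/")).urn


def canonical(url):
    """Follow a redirect if one is recorded, otherwise return the address as-is."""
    return REDIRECTS.get(url, url)
```
</pre>
    <p>Because <code>uuid.UUID</code> raises <code>ValueError</code> on anything that is not a well-formed <abbr>UUID</abbr>, the same call doubles as a check that a given path really is one of these identifiers.</p>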
    <p>So now it's about 2010 and I have a version-controlled folder on my computer that contains an ocean of files that look like <code>e8f61587-bb56-4e5c-b7dd-2954b76a84b9.xml</code>. The majority are missives I started writing and promptly forgot existed, only to start writing anew. Wouldn't it be great to have a little program that just <em>generates</em> a content inventory so I can get this under control?</p>
    <aside role="note" id="Ea8usJsF46rNzNlj--OIML">
      <p>I actually did better than that: I plumbed the version control for change dates and file naming histories, I mined the documents themselves for titles, keywords and descriptions, aggregated a list of books by <abbr>ISBN</abbr> and the pages that link to them, and of course canonicalized all the links between documents. I had successfully bifurcated the document content from its metadata.</p>
    </aside>
    <p>The program that did the mining is about a thousand lines of Python, which just zips through the designated folder and concomitant versioning database, constructs a graph, and serializes it to a file. At this stage I had been using all third-party <abbr>RDF</abbr> vocabularies to represent this metadata: <a href="http://bibliontology.com/" rel="dct:references">The Bibliographic Ontology</a> to represent the various types of documents, and <a href="http://www.dublincore.org/specifications/dublin-core/dcmi-terms/" rel="dct:references">Dublin Core</a> for many of the relations between them. To represent people and organizations, such as authors and publishers, I used <a href="http://xmlns.com/foaf/spec/" rel="dct:references"><abbr>FOAF</abbr></a>.</p>
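    <p>The general shape of such a miner, heavily simplified, might look like the following sketch. It is an illustration rather than the original program: it ignores the version control entirely, types every file as a <code>bibo:Article</code>, extracts only the title, and serializes to <abbr>JSON-LD</abbr>, all of which are assumptions made to keep the example short and dependency-free.</p>
<pre style="font-size: 80%">
```python
import json
import uuid
import xml.etree.ElementTree as ET
from pathlib import Path

# Prefixes for the two third-party vocabularies used in this sketch.
CONTEXT = {
    "bibo": "http://purl.org/ontology/bibo/",
    "dct": "http://purl.org/dc/terms/",
}
XHTML_TITLE = ".//{http://www.w3.org/1999/xhtml}title"


def describe(path):
    """Build one JSON-LD node for a UUID-named XHTML file."""
    node = {"@id": uuid.UUID(path.stem).urn, "@type": "bibo:Article"}
    title = ET.parse(str(path)).getroot().find(XHTML_TITLE)
    if title is not None and title.text:
        node["dct:title"] = title.text.strip()
    return node


def mine(folder):
    """Scan a folder and serialize what was found as a JSON-LD graph."""
    nodes = []
    for path in sorted(Path(folder).glob("*.xml")):
        try:
            nodes.append(describe(path))
        except (ValueError, ET.ParseError):
            continue  # skip files that are not UUID-named or not well-formed
    return json.dumps({"@context": CONTEXT, "@graph": nodes}, indent=2)
```
</pre>
    <p>Calling <code>mine</code> on the content folder returns a single string that can be written to a file; relations such as authorship would be added to each node in the same way as the title.</p>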
    <aside role="note" id="E4CrNHtTJuUSKHoMj7LIXK">
      <p><abbr>RDF</abbr> vocabularies behave very similarly to open-source modules, except they're for <em>data</em> instead of code. Furthermore, they exhibit inheritance behaviour just like object-oriented programming: By asserting a derived term on a resource, you are <em>implying</em> all the <em>other</em> semantics that term inherits, so if a consumer of your data doesn't understand your specific <em>asserted</em> term, it is likely able to glean meaningful utility from a more <em>generic</em> term the specific term implies: like how names of people and titles of documents are both specific kinds of <em>labels</em> for their respective entities.</p>
    </aside>
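    <p>The inheritance behaviour described in the note can be shown with a toy example. The <code>SUBPROPERTY_OF</code> table below is an illustrative assumption, not a faithful copy of any published schema, and the function names are made up for this sketch. The point is only that a consumer who understands nothing but the generic <em>label</em> can still get something out of data that asserts more specific terms.</p>
<pre style="font-size: 80%">
```python
# Illustrative schema (an assumption for this sketch): specific term -> generic term.
SUBPROPERTY_OF = {
    "foaf:name": "rdfs:label",
    "dct:title": "rdfs:label",
}


def generalizations(prop):
    """Yield the property itself, then every more generic property it implies."""
    seen = set()
    while prop is not None and prop not in seen:
        seen.add(prop)
        yield prop
        prop = SUBPROPERTY_OF.get(prop)


def entail(triples):
    """Return the asserted triples plus those implied by property inheritance."""
    inferred = set()
    for subject, predicate, obj in triples:
        for generic in generalizations(predicate):
            inferred.add((subject, generic, obj))
    return inferred


def labels(triples, subject):
    """What a consumer that only understands rdfs:label can still find."""
    return sorted(o for s, p, o in entail(triples) if s == subject and p == "rdfs:label")
```
</pre>
    <p>Given one asserted <code>foaf:name</code> and one asserted <code>dct:title</code>, <code>entail</code> returns four statements: the two originals, plus a generic label for each.</p>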
    <p>It didn't take long for needs to emerge that weren't expressed by these third-party vocabularies. For one, I wanted to be able to ascribe editorial destinies to these documents that were clear, machine-readable, database-selectable entities:</p>
    <ul>
      <li><em>Finish</em> because there are plenty that need that,</li>
      <li><em>Review</em>, advisable periodically,</li>
      <li><em>Revise</em>: it's mostly salvageable but needs some fixing,</li>
      <li><em>Rewrite</em>: say the same thing but completely differently,</li>
      <li><em>Split</em> into two or more documents because this one is too long,</li>
      <li><em>Merge</em> into a target document that tracks closely to this one,</li>
      <li><em>Retire</em> this piece because it's bad or no longer relevant.</li>
    </ul>
    <p>This was the impetus for <a href="https://vocab.methodandstructure.com/content-inventory#">writing my own content inventory vocabulary</a>, which I started around the cusp of 2012.</p>
    <aside role="note" id="EIuNmYXnjfkhflFsuUKFnI">
      <p><q><em>Why not just use a spreadsheet</em>?</q> Well, it turns out that this medium has a lot of nice properties you just can't achieve with a spreadsheet, even if you are a black belt at spreadsheets. The litany of benefits of representing metadata in <abbr>RDF</abbr> is another discussion entirely. I suppose the data could be <em>projected</em> into a spreadsheet, as long as the projection preserved enough information to get any subsequent changes made <em>inside</em> the spreadsheet back <em>out</em> again.</p>
    </aside>
    </section>
    <section id="EHlI37jVz-m73vrIbZ3ZFL">
      <h2>Expanding the Vocabulary</h2>
      <figure id="EumPNImG0YsnYYIxNG5FKK">
        <object style="display: block; margin: auto" type="image/svg+xml" data="file/inventory-visualized"/>
        <figcaption>
          <p>Here is an old test render of some published documents on my site organized in reverse chronological order. The documents are visualized as their bounding boxes, taken by dividing the number of characters by the paragraph width (33 em), times a weighted average of each character's tendency to fill the width of the em square. The result is a good approximation of the actual geometry of each rendered document, juxtaposed against one another.</p>
        </figcaption>
      </figure>
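      <p>One way to read the estimate in the caption, as a rough sketch: the 33 em width comes from the caption, while the per-character widths, the default width, and the line height are placeholder assumptions, since the real weights depend on the typeface.</p>
<pre style="font-size: 80%">
```python
import math

PARAGRAPH_WIDTH_EM = 33  # paragraph width given in the caption

# Hypothetical advance widths, as fractions of the em square.
CHAR_WIDTH_EM = {" ": 0.25, "i": 0.25, "l": 0.25, "m": 0.8, "w": 0.75}
DEFAULT_WIDTH_EM = 0.5  # assumed width for any character not listed


def average_width(text):
    """Weighted average of how much of the em square each character fills."""
    if not text:
        return 0.0
    return sum(CHAR_WIDTH_EM.get(c, DEFAULT_WIDTH_EM) for c in text) / len(text)


def estimated_lines(text):
    """Character count times average width, divided by the paragraph width."""
    return math.ceil(len(text) * average_width(text) / PARAGRAPH_WIDTH_EM)


def bounding_box(text, line_height_em=1.5):
    """Approximate (width, height) of the rendered text, both in ems."""
    return PARAGRAPH_WIDTH_EM, estimated_lines(text) * line_height_em
```
</pre>
      <p>The estimate ignores word wrapping, headings, and margins, so it is only good for comparing documents against one another, which is all the visualization needs.</p>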
    <p>Since I was always driving toward short documents with lots of links, I wanted an easy way to pick documents out of the inventory which most need rehabilitation. This entailed some way of measuring them. Raw word count is, in my opinion, inadequate&#x2014;I want a sense of the <em>anatomy</em> of the document as well, without having to look at it. Of course, at the time, <abbr>HTML</abbr> had no unambiguous concept of a <q>chapter</q> or <q>section</q>, so it occurred to me to count <q>blocks</q>: paragraphs, lists, tables, block quotes, <code>&lt;div&gt;</code>s, etc., and ratios like sentences and words per block. I hacked in a little statistics gatherer to my inventory generator program and amended my vocabulary to provide for descriptive statistics for each document, which could power <dfn>sparklines</dfn> that would tell me the <q>shape</q> of each at a glance.</p>
    <figure id="EV_DbOn2MqrTdNn0v99zpL">
      <img src="file/box-whisker-test;invert;scale=480,300" alt=""/>
      <figcaption>
        <p>This is an early test for a stylized box-whisker diagram of a bunch of documents ordered by descending document length. Each vertical bar represents descriptive statistics for words per block: Dark points represent medians, and the interquartile range is around the median in the darker purple, while the extrema are grey. The lighter purple bars are the &#xB1;standard deviation&#x2014;clipped at zero&#x2014;and the mean is in turquoise. <a href="file/box-whisker-test" rel="dct:hasPart dct:references">You can see the original test here</a>, which was on a black backdrop with the palette inverted.</p>
      </figcaption>
    </figure>
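    <p>The block counting and the statistics behind the diagram could be sketched as follows. The list of elements treated as blocks and the choice to count only innermost blocks are assumptions of this sketch, not a description of the original statistics gatherer. It needs Python 3.8 or later for <code>statistics.quantiles</code>, and at least two blocks per document.</p>
<pre style="font-size: 80%">
```python
import statistics
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
# Assumed set of block-level elements for this sketch.
BLOCKS = {XHTML + name for name in
          ("p", "li", "blockquote", "pre", "td", "th", "dd", "dt", "div")}


def words_per_block(root):
    """Word counts for each innermost block element, in document order."""
    counts = []
    for element in root.iter():
        if element.tag not in BLOCKS:
            continue
        if any(inner.tag in BLOCKS for inner in element.iter() if inner is not element):
            continue  # a container; its inner blocks are counted instead
        counts.append(len("".join(element.itertext()).split()))
    return counts


def summarize(counts):
    """Descriptive statistics of the kind a box-whisker sparkline needs."""
    q1, median, q3 = statistics.quantiles(counts, n=4)
    mean = statistics.mean(counts)
    spread = statistics.stdev(counts)
    return {
        "blocks": len(counts),
        "min": min(counts),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(counts),
        "mean": mean,
        "sd_low": max(0.0, mean - spread),  # clipped at zero
        "sd_high": mean + spread,
    }
```
</pre>
    <p>Each dictionary returned by <code>summarize</code> corresponds to one vertical bar in a diagram like the one above: extrema, interquartile range, median, mean, and the standard deviation band.</p>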
      <p>I also consternated over what to do about carving up the corpus. Barring  two or three collections I started before instituting the policy, I was adamant about not creating any disjoint <em>sections</em>. It was important that every resource could exist in more than one category at once. Besides, the inverse of a category is just a predicate, and a resource can have arbitrarily many of those.</p>
      <figure id="EbT4F9M-1yppRwpDSvLmbK">
        <div class="iframe">
          <iframe src="https://www.youtube.com/embed/eV84dXJUvY8" frameborder="0" allowfullscreen="" rel="dct:hasPart"/>
        </div>
          
        <figcaption>
          <p><a href="ia-summit-2017" title="Intentionally Intensional Information Architecture" rel="dct:references">I gave this talk</a> at the 2017 Information Architecture Summit about why it might be a good thing to organize information primarily by semantic relation, from which conventional categories could be derived.</p>
        </figcaption>
      </figure>
      <p>A crude type of predicate is the <dfn>tag</dfn>, but a tag is just a string of text. At the very least, there are logistical problems with coalescing minor variants of said strings that were all intended to mean the same thing. Beyond that, there is no way to ascribe a general conceptual domain for what kind of thing a tag is supposed to <em>be</em>, and no well-defined way to relate tags to one another.</p>
      <p><a href="https://www.w3.org/TR/skos-primer/" rel="dct:references">Enter <abbr>SKOS</abbr></a>: a way to represent <em>concepts</em> as distinct, identifiable entities, from each of which every conceivable label dangles like a Christmas ornament, and garlanded in semantic relations. In contrast to a puddle of text strings, a <abbr>SKOS</abbr> concept scheme is a fully-featured taxonomical structure.</p>
      <p>Next came the task of connecting the concepts to the documents. Dublin Core provides a <code>subject</code> relation, which is a good start, but only really useful for conveying what a document is <em>about</em>. A document can be about a concept and not mention it, while it can <em>mention</em> a concept and not strictly be <em>about</em> it. Thus, I added the following relations to my vocabulary:</p>
      <ul>
        <li><em>Mentions</em>: the document explicitly invokes the concept by name,</li>
        <li><em>Introduces</em>: in addition to mentioning the concept, the document defines, describes, or otherwise explains it for an audience who may not already know what it is,</li>
        <li><em>Assumes</em>: the document may or may not explicitly mention the concept, but it is written as if the audience is already familiar with it.</li>
      </ul>
      <aside role="note" id="Eagp8q9V4VTFdt8ZEBaBhJ">
        <p>Inspiration for the last of these came from an <code>educationLevel</code> relation in Dublin Core, which I thought was a cool idea but <em>way</em> too chunky for my purposes.</p>
      </aside>
      <p>As I will expound in a moment, I was keenly interested in sparing my audience the jargon, notwithstanding the content that actually treated the jargon-y topics first hand. I figured these relations would form the raw material for performing operations to that end, or at the very least for hinting at what documents treated the right content but for the wrong audience, and needed to be brought to heel.</p>
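      <p>One such operation might look like the sketch below. This is speculative: the <code>Document</code> structure, the function name, and the rule itself are assumptions about how the three relations <em>could</em> be used, not something already implemented. It flags the concepts a document leans on without explaining, relative to what a given audience is assumed to know.</p>
<pre style="font-size: 80%">
```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A document and the concepts it relates to (names here are illustrative)."""
    title: str
    mentions: set = field(default_factory=set)
    introduces: set = field(default_factory=set)
    assumes: set = field(default_factory=set)


def unexplained_jargon(doc, audience_knows):
    """Concepts the document leans on that this audience would not know.

    A concept counts as jargon here if the document assumes it, or mentions
    it without introducing it, and the audience is not already familiar with it.
    """
    leaned_on = doc.assumes | (doc.mentions - doc.introduces)
    return leaned_on - audience_knows
```
</pre>
      <p>A document that returns a non-empty set for its intended audience is a candidate for revision, or for a link to whichever document introduces the missing concept.</p>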
      <aside role="note" id="EYDaK2ICAAMGxdOqcN6u3J">
        <p>Attaching the concepts to the documents is an irreducibly laborious task that merits some kind of at <em>least</em> semi-automation. Even my site's modest <var>~700</var> pages are almost too much to do by hand, as evidenced by the fact that I haven't made the time to do it yet.</p>
      </aside>
    </section>
    <section id="E4DKobh-2daCtYfbpt7sBL">
      <h2>Constructing an Audience</h2>
      <p>In my professional life I consider myself to be something of a liminal character, straddling the boundary between those with technical proclivities and those who actively eschew them. I had been vacillating for some time about partitioning my content into two sections, the <a href="./" rel="dct:references" title="Make Things. Make Sense."><dfn>recto</dfn></a> for the bulk of humanity, and the <a href="verso/" rel="dct:references" title="Verso"><dfn>verso</dfn></a> for the techies&#x2014;a move I ultimately made in <time datetime="2017-11-14">late 2017</time>. But I don't want the split to be too <em>stark</em>: I am one person writing for two audiences, and the works naturally <em>mingle</em>&#x2014;they interact with one another. If I wanted them to be truly separate, I'd put them on different websites. What I want is subtler control&#x2014;a way to signal to people what side of the fence they're currently on, and when they're about to cross over.</p>
      <p>This cleavage plane manifested initially in the feeds, which, in true lazy fashion, I reingest for indexes on the home and Verso pages. Up until very recently&#x2014;<em>like <time datetime="2019-04-10">last week</time></em>&#x2014;I wrote them by hand. In order to make them amenable to being generated, I had to design some way for an algorithm to reliably pick which article went where.</p>
      <p>The current partition is simple enough: any article that talks about computers&#x2014;and moreover how to <em>do</em> things with computers&#x2014;goes into one bucket, and in the other bucket goes everything else. One could imagine though, eventually, a kind of onion-skin gradient of obscurity: the concepts at the centre are things everybody understands, and from the centre radiate little archipelagoes of specialist knowledge. Thankfully, a structure like this is precisely what a system like <abbr>SKOS</abbr> is designed to represent.</p>
      <p>If you think about it, an audience is a conceptual entity in its own right, denoting a group of people who share the same values and understand the same concepts. I added an <code>Audience</code> class to my content inventory vocabulary, which inherits properties from both <abbr>SKOS</abbr> <code>Concept</code> and the Dublin Core <code>AgentClass</code>, making it compatible with the <code>audience</code> relation of the same. To solve my partitioning problem, I created a <code>non-audience</code> relation to complement the <code>audience</code> relation from Dublin Core, and this gave me the expressivity I needed to compute the partition. Roughly:</p>
      <pre style="font-size: 80%">If the index's non-audience matches the document's audience,
  and the document has no other audiences, discard it from this index.</pre>
  <p>This small addition kept me from having to explicitly tag every document with an audience&#x2014;a set of concepts I haven't finished yet&#x2014;and the main index with every conceivable audience. Though I wouldn't have to do that, exactly: since my <code>Audience</code> class inherits from <abbr>SKOS</abbr>, it gets the full set of hierarchical semantic relations, which I use to derive, for example, whether or not a <dfn>Python programmer</dfn> is a <dfn>Programmer</dfn>, and therefore that content directed at Python programmers belongs in the techie feed.</p>
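  <p>Here is a minimal sketch of that rule together with the hierarchical lookup. The concept names and the <code>BROADER</code> table are hypothetical stand-ins for a real <abbr>SKOS</abbr> concept scheme, and treating a narrower audience as a <q>match</q> for its broader one is my reading of the rule for the purposes of this example.</p>
<pre style="font-size: 80%">
```python
# Hypothetical concept scheme: each audience maps to its broader audiences.
BROADER = {
    "python-programmer": {"programmer"},
    "programmer": {"techie"},
    "designer": set(),
}


def ancestors(concept):
    """The concept itself plus everything reachable via 'has broader'."""
    found, queue = set(), [concept]
    while queue:
        current = queue.pop()
        if current not in found:
            found.add(current)
            queue.extend(BROADER.get(current, ()))
    return found


def belongs(doc_audiences, index_non_audiences):
    """Keep a document unless every one of its audiences is a non-audience."""
    if not doc_audiences:
        return True  # untagged documents stay in the index
    excluded = [a for a in doc_audiences
                if not ancestors(a).isdisjoint(index_non_audiences)]
    return len(excluded) != len(doc_audiences)
```
</pre>
  <p>With <code>techie</code> as the main index's non-audience, a document aimed only at Python programmers is discarded, one aimed at both Python programmers and designers is kept, and a document with no audience at all is kept by default.</p>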
  <figure id="EK37p31CUNcAMinJqc0zmL">
    <img src="file/2019-04-17-concept-scheme" alt="" rel="dct:hasPart"/>
    <figcaption>
      <p>Here is the hairball of concepts and audiences I mined from my site a few years ago. Purple objects are plain concepts, blue are audience classes, and lime green are audiences which are also organizational roles. Orange lines indicate a <em>has broader</em> relation, where the arrowhead denotes the broader term. Green lines denote a symmetric relation. The faint lines merely connect the objects to the large green entity in the middle that stands for the concept scheme. Note that while <abbr>SKOS</abbr> can express subordination and superordination, its structure is not strictly hierarchical. Working with it is actually more set-theoretic.</p>
    </figcaption>
  </figure>
  <p>What's <em>really</em> exciting, though, is the notion of using the corpus and its attendant concepts to <em>construct</em> the set of audiences. I currently have about fifty concepts and a dozen audiences, which I just dashed off informed by nothing but a little introspection. If it weren't a mere personal website created with play labour, I would probably base this structure on some ethnographic research in an attempt to close the gap between who I've already written for, who I <em>want</em> to be writing for, and who <em>actually</em> reads my work.</p>
</section>
<section id="EGBv2kgnfx16npMDPwqrrJ">
  <h2>Coda</h2>
  <p>This odyssey took over a decade to get to this state&#x2014;in part because it isn't anything close to my main gig, but also in true Gibsonian fashion, the future isn't evenly distributed.</p>
  <p>I wrote the original tooling in Python because the version control system I use for my website is <em>also</em> written in Python: it was therefore easy to hook directly into it and pump out the metadata. It turns out, however, that the software needed to <em>consume</em> all this wonderful data, and make effective use of it, is considerably more sophisticated than that which is needed to <em>produce</em> it. The key piece that performs all the highly convenient and time-saving inference generation is missing from the Python toolkit, and making one from scratch is about three notches above my paygrade.</p>
  <aside role="note" id="ENYhGmRbOg31RkAJyHQQOI">
    <p>It is perhaps worth noting that while I suppose it <em>could</em>, none of what I have discussed so far involves anything resembling artificial intelligence. Indeed, the programming tasks attendant to this space are actually pretty mundane. It's just that the component I'm talking about here, called a <dfn>reasoner</dfn>, is brain-bendingly <em>abstract</em>, and a decent one demands not only a lot of <em>time</em>, but world-class computational linguistics chops to implement. It's not surprising there aren't that many of them to choose from.</p>
  </aside>
  <p>As such, the path of least resistance was nothing short of a complete retooling. While the proximate code could be rewritten in just a few days, there are invariably a bunch of gaps and missing dependencies when moving from one platform to another. Not something I could afford without even a de facto sponsor. To be sure, the amount of time this project has spent idle versus in motion is easily a hundred to one.</p>
  <p>It was only last year, in <time>2018</time>, that I got the first opportunity in five years to review it. A client, as a byproduct of my project with them, had me looking at Ruby, which happens to have the missing piece! A not-very-good one albeit, but at least it <em>works</em>. Moreover, I managed to achieve a good chunk of the retooling effort just through ordinary project offgassing. As such, I have a prototype coalescing for <a href="https://github.com/doriantaylor/rb-rdf-sak" rel="dct:references">a Swiss-Army knife of sorts</a>, to do all the basic operations of the original Python code, and then some.</p>
  <section id="EZE4HoAqtYkREVa3s9wj6K">
    <h3>Coda for the Coda</h3>
    <p>I piloted this technique in a couple of other places aside from my own site, including an attempt to overhaul the website of the Information Architecture Institute. Even with professionals involved, it was a hard sell, and despite the pitfalls and caveats I've mentioned, I'm not entirely sure why.</p>
    <p>I want to reiterate that I wrote this content inventory vocabulary with the idea that it would be an <dfn>exchange format</dfn>: Some tool would generate this data, and potentially some other tool would consume it. Heck, it could even get woven straight into a content management system. It could facilitate the migration from one content management system to another. The data could be repurposed for entirely other ends. Endless tools and infrastructure could be built on top of it.</p>
    <p>The tools that I wrote for my own site, especially the most recent one, <em>may</em> be able to be used for other websites, but they're frankly kinda disposable. The important thing, to me at least, is the general technique embodied in this and other metadata vocabularies.</p>
    <p>In a way I'm surprised, because the technique is extraordinarily powerful; nothing even really comes close to touching it. I'm also <em>not</em> surprised, because it's also really hard. Twenty years in, the Semantic Web is still missing key elements to make it, if not <em>easy</em> to use, at least worth the pain. But it's <em>demand</em> that drives the building of better tools, and the chamfering of their sharp corners.</p>
    <p>I hope what I have shared today ignites some interest.</p>
  </section>
</section>
  </body>
</html>
