<?xml version="1.0"?>
<?xml-stylesheet href="/transform" type="text/xsl"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:bibo="http://purl.org/ontology/bibo/" xmlns:bs="http://purl.org/ontology/bibo/status/" xmlns:ci="https://vocab.methodandstructure.com/content-inventory#" xmlns:dct="http://purl.org/dc/terms/" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:xhv="http://www.w3.org/1999/xhtml/vocab#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" lang="en" prefix="bibo: http://purl.org/ontology/bibo/ bs: http://purl.org/ontology/bibo/status/ ci: https://vocab.methodandstructure.com/content-inventory# dct: http://purl.org/dc/terms/ foaf: http://xmlns.com/foaf/0.1/ rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns# xhv: http://www.w3.org/1999/xhtml/vocab# xsd: http://www.w3.org/2001/XMLSchema#" vocab="http://www.w3.org/1999/xhtml/vocab#" xml:lang="en">
  <head>
    <title property="dct:title">Content Robo-Inventory</title>
    <base href="https://doriantaylor.com/content-robo-inventory"/>
    <link href="document-stats#EqbeiyU46vEHgWpwZl80cL" rev="ci:document"/>
    <link href="elsewhere" rel="alternate bookmark" title="Elsewhere"/>
    <link href="this-site" rel="alternate index" title="This Site"/>
    <link href="http://purl.org/ontology/bibo/status/published" rel="bibo:status"/>
    <link href="" rel="ci:canonical" title="Content Robo-Inventory"/>
    <link href="lexicon/#EqIUfKvI93wG3TQRQDwoVJ" rel="dct:audience" title="Information Architect"/>
    <link href="lexicon/#ErEcH3z-9vn29IFbZ0GdpI" rel="dct:audience" title="Content Strategist"/>
    <link href="person/dorian-taylor#me" rel="dct:creator" title="Dorian Taylor"/>
    <link href="2011-07-01-iai-site-map-full" rel="dct:hasPart"/>
    <link href="person/dorian-taylor" rel="meta" title="Who I Am"/>
    <link about="./" href="3f36c30c-6096-454a-8a22-c062100ae41f" rel="alternate" type="application/atom+xml"/>
    <link about="./" href="f07f5044-01bc-472d-9079-9b07771b731c" rel="alternate" type="application/atom+xml"/>
    <link about="./" href="this-site" rel="alternate"/>
    <link about="./" href="elsewhere" rel="alternate"/>
    <link about="./" href="e341ca62-0387-4cea-b69a-cdabc7656871" rel="alternate" type="application/atom+xml"/>
    <link about="verso/" href="3f36c30c-6096-454a-8a22-c062100ae41f" rel="alternate" type="application/atom+xml"/>
    <link about="verso/" href="this-site" rel="alternate"/>
    <link about="verso/" href="elsewhere" rel="alternate"/>
    <meta content="content-robo-inventory" datatype="xsd:token" property="ci:canonical-slug"/>
    <meta content="Here is an update concerning my work doing automated semantic content inventories, using the IA Institute website as a guinea pig." name="description" property="dct:abstract"/>
    <meta content="2011-06-29T20:49:32+00:00" datatype="xsd:dateTime" property="dct:created"/>
    <meta content="content-robo-inventory" property="dct:identifier"/>
    <meta content="2011-06-29T20:49:04+00:00" datatype="xsd:dateTime" property="dct:issued"/>
    <meta content="2011-06-29T20:49:55+00:00" datatype="xsd:dateTime" property="dct:modified"/>
    <meta content="2011-07-03T20:24:55+00:00" datatype="xsd:dateTime" property="dct:modified"/>
    <meta content="2022-05-31T04:18:52+00:00" datatype="xsd:dateTime" property="dct:modified"/>
    <meta content="2022-05-31T15:10:50+00:00" datatype="xsd:dateTime" property="dct:modified"/>
    <meta about="person/dorian-taylor#me" content="Dorian Taylor" name="author" property="foaf:name"/>
    <meta content="summary" name="twitter:card"/>
    <meta content="@doriantaylor" name="twitter:site"/>
    <meta content="Content Robo-Inventory" name="twitter:title"/>
    <meta content="Here is an update concerning my work doing automated semantic content inventories, using the IA Institute website as a guinea pig." name="twitter:description"/>
    <object>
      <nav>
        <ul>
          <li>
            <a href="2-up-content-audit" rev="dct:references" typeof="bibo:Article">
              <span property="dct:title">2-Up Content Audit</span>
            </a>
          </li>
          <li>
            <a href="my-work-at-the-ia-institute-an-anthology" rev="dct:references" typeof="bibo:Article">
              <span property="dct:title">My Work at the IA Institute: An Anthology</span>
            </a>
          </li>
          <li>
            <a href="navigation-by-shibboleth" rev="dct:references" typeof="bibo:Article">
              <span property="dct:title">Navigation by Shibboleth</span>
            </a>
          </li>
          <li>
            <a href="serendipitous-questionnaire" rev="dct:references" typeof="bibo:Article">
              <span property="dct:title">Serendipitous Questionnaire</span>
            </a>
          </li>
          <li>
            <a href="the-symbol-management-problem" rev="dct:references" typeof="bibo:Article">
              <span property="dct:title">The Symbol Management Problem</span>
            </a>
          </li>
          <li>
            <a href="document-stats#EqbeiyU46vEHgWpwZl80cL" rev="ci:document" typeof="qb:Observation">
              <span>urn:uuid:a9b7a2c9-4e3a-4bc4-b1e0-5a9c1997cd1c</span>
            </a>
          </li>
        </ul>
      </nav>
    </object>
  </head>
  <body about="" id="EWlhglLC4u33suPu5FWCdK" typeof="bibo:Article">
    <section id="EyylZ25zYwC6K_fUX87AHK">
      <figure id="E57bud1r8Y_IxiB4-o3nrK"><a href="file/2011-07-01-iai-site-map-full.pdf" title="Information Architecture Institute - Site Map Rendering" rel="dct:references"><img class="figure" width="540" height="400" style="display: block; margin: auto" src="2011-07-01-iai-site-map-full;desaturate;scale=540,400" alt=""/></a></figure>
      <p>I have been teasing certain content strategists since early 2010 about a method for performing automated content inventories plus instrumentation that improves the efficiency and sophistication of subsequent audits. Instrumented <em>how</em>, you might be inclined to ask? I'll tell you, but only after harping for a minute.</p>
    </section>
    <section id="EbuNcREkCPnIzfWMoYxROI">
      <h2>Define the Damn Thing&#x2122;</h2>
      <p>It's this kind of tool that makes any attempt at a clear-cut distinction between information architecture, content strategy, business intelligence, <acronym title="Search Engine Optimization">SEO</acronym> and probably four or five other nascent disciplines start to look a bit silly. Among other reasons, it's because it <a href="http://www.w3.org/RDF/" title="RDF - Semantic Web standards" rel="dct:references">makes use of a data structure</a> which, after working with it for the past five years or so, I'm pretty confident is the natural home for every single one of these concerns. Moreover, the data has to be sourced from several different organizational silos in order to be brought together in one place and be made useful. Who should own that problem? Who even has the authority?</p>
    </section>
    <section id="EKnrLCBF6U2l3PQ-i7OCxJ">
      <h2>Well, So Far I Do</h2>
      <p>I took an interest in this idea because I wanted to get a handle on <a href="./" title="Make things. Make sense." rel="dct:references">my own site</a>, which is a scintillating paragon of neglected content &#x2014; so much so that only about a fifth of it is even <em>visible</em>, of which maybe half is current. The rest is squirreled away in various stages of incompleteness. I wanted a meaningful way of putting the whole corpus front and centre so that I could quickly expose and prioritize the material that needs attention. I had a nagging feeling, however, that there was something insufficiently <em>real-world</em> about my own site, which, while admittedly a feat of extreme laziness, is at least informed by working with the web virtually every day for half a lifetime. As such, the content inventory engine I wrote just didn't feel <em>right</em> until I could turn it toward a corpus that was a bit more representative of what I was likely to find in the wild.</p>
      <p><a href="http://iainstitute.org/en/about/people/board_of_directors_biographies.php" title="Board of Directors Biographies - Information Architecture Institute" rel="dct:references">Good thing you folks elected me</a> to the <acronym title="Information Architecture">IA</acronym> Institute board! <a href="http://iainstitute.org/" title="The Information Architecture Institute" rel="dct:references">Its site is perfect</a>. Just what I need to finish my masterpiece, which <a href="its-only-fitting-that-the-cobblers-children-get-shoes" title="It's Only Fitting that the Cobbler's Children Get Shoes" rel="dct:references">I started finishing</a> in November 2010, and continued at a pace consistent with a volunteer directorship of a non-profit organization*. Well, I'm happy to say that I'm at it again. Let's consider some of the data points I'm interested in.</p>
      <aside role="note" id="Eobi6YKKWqR_2o4QYIV-ZL">
        <p>* If there is any corporate interest in accelerating this work, you know where to find me.</p>
      </aside>
    </section>
    <section id="EZBl8eZpnGM0fqeoxXOvZL">
      <h2>Stuff I'd Like to See in <em>My</em> Content Inventory</h2>
      <p>You know, because I inventory like a boss.</p>
      <section id="E4yPwroFPilywcz-18-TVL">
        <h3>Data About Individual Resources</h3>
        <ul>
          <li>The canonical <acronym title="Uniform Resource Identifier">URI</acronym> of the resource</li>
          <li>Any other <acronym title="Uniform Resource Identifier">URI</acronym>s the resource has or might have had in the past</li>
          <li>Who wrote the damn thing or has otherwise touched it</li>
          <li>When it was written</li>
          <li>How many times it's been revised and when each revision happened</li>
          <li>How much traffic it gets relative to the rest of the corpus</li>
          <li>How long it is in words, paragraphs and sections</li>
          <li>How long that is relative to the rest of the corpus</li>
        </ul>
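        <p>As a rough illustration of the list above, here is a minimal Python sketch of one inventory record and of the "relative to the rest of the corpus" comparisons. It is not the engine described in this article: the <code>ResourceRecord</code> class, its field names and the <code>relative_to_corpus</code> helper are all hypothetical, and dividing by the corpus mean is just one assumed way of expressing "relative".</p>
        <pre>
```python
from dataclasses import dataclass, field
from datetime import datetime
from statistics import mean
from typing import List, Optional


@dataclass
class ResourceRecord:
    """One row of the inventory. Every field name here is hypothetical."""
    canonical_uri: str
    alternate_uris: List[str] = field(default_factory=list)  # current and past URIs
    contributors: List[str] = field(default_factory=list)    # who wrote or touched it
    created: Optional[datetime] = None
    revisions: List[datetime] = field(default_factory=list)  # one entry per revision
    hits: int = 0        # raw traffic count
    words: int = 0
    paragraphs: int = 0
    sections: int = 0


def relative_to_corpus(records, attribute):
    """Map each canonical URI to its value divided by the corpus mean.

    Assumption: "relative to the rest of the corpus" is read as a ratio
    against the mean; a percentile rank would be an equally valid reading.
    """
    average = mean(getattr(r, attribute) for r in records)
    if average == 0:
        return {r.canonical_uri: 0.0 for r in records}
    return {r.canonical_uri: getattr(r, attribute) / average for r in records}
```
</pre>
        <p>A ratio of 1.0 means a resource sits exactly at the corpus average for that attribute; the same helper works for <code>hits</code> as for <code>words</code>.</p>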
      </section>
      <section id="Es_8jdkrG5cxy6m0mu6JfL">
        <h3>Actual Metadata of Individual Resources</h3>
        <ul>
          <li>Title</li>
          <li>Short title, for use in links with constrained real estate</li>
          <li>Abstract or description or whatever you feel like calling it</li>
          <li>Intended audience as a <a href="http://xmlns.com/foaf/spec/#term_Agent" title="FOAF Vocabulary Specification" rel="dct:references"><acronym title="Friend of a Friend">FOAF</acronym> agent</a> or <a href="http://dublincore.org/documents/dcmi-terms/#classes-AgentClass" title="DCMI Metadata Terms" rel="dct:references">Dublin Core AgentClass</a></li>
          <li>Subject as some kind of resource, at least a <a href="http://www.w3.org/TR/skos-reference/skos.html#Concept" title="SKOS Simple Knowledge Organization System Namespace Document - HTML Variant, 18 August 2009 Recommendation Edition" rel="dct:references"><acronym title="Simple Knowledge Organization System">SKOS</acronym> concept</a>; not to be confused with&#x2026;</li>
          <li>Relevant concepts, also <acronym title="Simple Knowledge Organization System">SKOS</acronym> concepts, from which keywords, tags, whatever can be derived</li>
        </ul>
      </section>
      <section id="EPUV1sf9pd5QLp4a3m6U0J">
        <h3>Information About Links</h3>
        <ul>
          <li>Links from this resource to other places within the site</li>
          <li>Links from this resource to other sites</li>
          <li>Links which are hidden, e.g. with the <code>&lt;link&gt;</code> element</li>
          <li>Links which are part of the navigation or other chrome</li>
          <li>Links which are part of a widget or other ancillary content</li>
          <li>Links which are part of the <em>actual</em> content</li>
          <li>Links which reference a form</li>
          <li>Media assets which are embedded, including but not limited to images</li>
          <li>Scripts, stylesheets and other utilities referenced in the resource</li>
          <li>Inbound links from <em>within</em> the site</li>
          <li>Inbound links from <em>other</em> sites</li>
          <li>A <em>huge angry beacon</em> if the resource is an orphan, i.e. it has no inbound links from within the site</li>
          <li>Any well-trodden paths through the site this resource may belong to</li>
        </ul>
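        <p>The link bookkeeping above, and the orphan beacon in particular, can be sketched in a few lines of Python. This assumes a crawl has already produced a mapping from each page's URI to the hrefs found on it. The function names and that mapping are hypothetical, and "internal" is taken to mean "same host", which a site spread over several hostnames would need to relax.</p>
        <pre>
```python
from urllib.parse import urldefrag, urljoin, urlparse


def is_internal(site_root, link):
    """True when the link resolves to the same host as the site root."""
    resolved = urljoin(site_root, link)
    return urlparse(resolved).netloc == urlparse(site_root).netloc


def find_orphans(site_root, outbound):
    """Return crawled pages that no other page on the site links to.

    `outbound` is a hypothetical structure: it maps each crawled page URI
    to the list of raw hrefs found on that page.
    """
    linked = set()
    for page, hrefs in outbound.items():
        for href in hrefs:
            # Resolve relative links against the page and drop any fragment.
            target = urldefrag(urljoin(page, href))[0]
            # A page linking to itself does not rescue it from orphanhood.
            if target != page and is_internal(site_root, target):
                linked.add(target)
    return sorted(page for page in outbound if page not in linked)
```
</pre>
        <p>Note that an entry point such as the home page will be flagged too if nothing links back to it, so the output is a list of candidates to review rather than a verdict.</p>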
      </section>
      <section id="EUnxCtg-4RK1AinH2UnDzI">
        <h3>Now, What To Do About It?</h3>
        <ul>
          <li>Status, augmenting existing <a href="http://bibotools.googlecode.com/svn/bibo-ontology/trunk/doc/classes/DocumentStatus___-772098841.html" title="Class: bibo:DocumentStatus" rel="dct:references">bibo:DocumentStatus</a> with: <var>empty</var>, <var>incomplete</var>, <var>incorrect</var>, <var>obsolete</var>, <var>retired</var> and <var>orphan</var>, which I already mentioned</li>
          <li>Action, as in what to do about it and who to pin it on: <var>keep</var>, <var>split</var>, <var>merge</var>, <var>update metadata</var>, <var>proofread</var>, <var>revise</var>, <var>rewrite</var> and <var>retire</var></li>
          <li>Any helpful ad-hoc annotations or bookmarks through a mechanism like <a href="http://www.w3.org/2001/Annotea/" title="Annotea project" rel="dct:references">Annotea</a></li>
        </ul>
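        <p>The status and action vocabularies above could be captured as enumerations, with a small record tying each resource to a verdict and to the person it is pinned on. A hedged Python sketch follows. The class and function names are hypothetical, and treating <var>keep</var> as "nothing to do" is an assumption rather than something stated above.</p>
        <pre>
```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """The extra statuses proposed on top of bibo:DocumentStatus."""
    EMPTY = "empty"
    INCOMPLETE = "incomplete"
    INCORRECT = "incorrect"
    OBSOLETE = "obsolete"
    RETIRED = "retired"
    ORPHAN = "orphan"


class Action(Enum):
    """What to do about a resource."""
    KEEP = "keep"
    SPLIT = "split"
    MERGE = "merge"
    UPDATE_METADATA = "update metadata"
    PROOFREAD = "proofread"
    REVISE = "revise"
    REWRITE = "rewrite"
    RETIRE = "retire"


@dataclass
class Verdict:
    """Hypothetical record: one audited resource and who owns the fix."""
    uri: str
    status: Status
    action: Action
    assignee: str


def work_queue(verdicts):
    """Group everything that needs doing by the person it is pinned on."""
    queue = defaultdict(list)
    for verdict in verdicts:
        # Assumption: "keep" means the resource needs no further work.
        if verdict.action is not Action.KEEP:
            queue[verdict.assignee].append((verdict.uri, verdict.action.value))
    return dict(queue)
```
</pre>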
      </section>
    </section>
    <section id="Ec1QD53NzWXuRQnqwF_sxI">
      <h2>Wait, You Forgot Sections!</h2>
      <p><a href="file/nope" title="NOPE" rel="dct:references">Nope</a>. I'm pretty convinced that the idea of prescribed sections on the web is a diversion from what the web excels at. The structure of the <acronym title="Resource Description Framework">RDF</acronym> data model is such that it represents connections between resources based on what they <em>mean</em>, both in the type of connection and the content of the resources themselves. These connections accrue over time and represent associations that ultimately <em>people</em> have found meaningful. When we pile all this information together, along with certain elements above, we can <a href="http://en.wikipedia.org/wiki/Force-based_algorithms_%28graph_drawing%29" title="Force-based algorithms (graph drawing) &#x2014; Wikipedia" rel="dct:references">tease the model apart</a> along its natural fissures. Try as we might, our best tools for hand-crafting sets of resources equate to biased heuristics. We would do ourselves a service to take advantage of the data. Besides, any existing section landings will still be present in the scan; it's just a question of how useful they will be.</p>
    </section>
    <section id="EsXp4RO42gh2iPxUGuGOeK">
      <h2>Anyhoo&#x2026;</h2>
      <p>My super-ghetto prototype website-to-<acronym title="Resource Description Framework">RDF</acronym> crawler <del>is chugging away on <a href="http://iainstitute.org/" title="The Information Architecture Institute" rel="dct:references">iainstitute.org</a> as I write. It is slow, because I am too lazy to make it multithreaded. When it exits, I'm gonna make another <a href="its-only-fitting-that-the-cobblers-children-get-shoes" title="It's Only Fitting that the Cobbler's Children Get Shoes" rel="dct:references">pretty picture like this</a>, and then I'm going</del> <ins>is now finished and the results are at the top of the page. Now</ins> to get to Actual Work&#x2122;.</p>
      <p>Peace out.</p>
    </section>
  </body>
</html>
