<?xml version="1.0"?>
<?xml-stylesheet href="/transform" type="text/xsl"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:bibo="http://purl.org/ontology/bibo/" xmlns:bs="http://purl.org/ontology/bibo/status/" xmlns:ci="https://vocab.methodandstructure.com/content-inventory#" xmlns:dct="http://purl.org/dc/terms/" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:xhv="http://www.w3.org/1999/xhtml/vocab#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" lang="en" prefix="bibo: http://purl.org/ontology/bibo/ bs: http://purl.org/ontology/bibo/status/ ci: https://vocab.methodandstructure.com/content-inventory# dct: http://purl.org/dc/terms/ foaf: http://xmlns.com/foaf/0.1/ rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns# xhv: http://www.w3.org/1999/xhtml/vocab# xsd: http://www.w3.org/2001/XMLSchema#" vocab="http://www.w3.org/1999/xhtml/vocab#" xml:lang="en">
  <head>
    <title property="dct:title">Visualizing Paths Through the Web</title>
    <base href="https://doriantaylor.com/visualizing-paths-through-the-web"/>
    <link href="document-stats#E-rrKlrgY88ZOeiwMQ7xEI" rev="ci:document"/>
    <link href="elsewhere" rel="alternate bookmark" title="Elsewhere"/>
    <link href="this-site" rel="alternate index" title="This Site"/>
    <link href="http://purl.org/ontology/bibo/status/published" rel="bibo:status"/>
    <link href="" rel="ci:canonical" title="Visualizing Paths Through the Web"/>
    <link href="lexicon/#EqIUfKvI93wG3TQRQDwoVJ" rel="dct:audience" title="Information Architect"/>
    <link href="lexicon/#ErEcH3z-9vn29IFbZ0GdpI" rel="dct:audience" title="Content Strategist"/>
    <link href="person/dorian-taylor#me" rel="dct:creator" title="Dorian Taylor"/>
    <link href="site-path" rel="dct:hasPart"/>
    <link href="what-i-do" rel="dct:references" title="What I Do"/>
    <link href="lexicon/#E1A8DBAFHvuhCgTUPIAVlJ" rel="dct:subject" title="Data Visualization"/>
    <link about="./" href="3f36c30c-6096-454a-8a22-c062100ae41f" rel="alternate" type="application/atom+xml"/>
    <link about="./" href="f07f5044-01bc-472d-9079-9b07771b731c" rel="alternate" type="application/atom+xml"/>
    <link about="./" href="this-site" rel="alternate"/>
    <link about="./" href="elsewhere" rel="alternate"/>
    <link about="./" href="e341ca62-0387-4cea-b69a-cdabc7656871" rel="alternate" type="application/atom+xml"/>
    <link about="verso/" href="3f36c30c-6096-454a-8a22-c062100ae41f" rel="alternate" type="application/atom+xml"/>
    <link about="verso/" href="this-site" rel="alternate"/>
    <link about="verso/" href="elsewhere" rel="alternate"/>
    <meta content="visualizing-paths-through-the-web" datatype="xsd:token" property="ci:canonical-slug"/>
    <meta content="Here is another installment on data-driven content strategy. This time I demonstrate a technique for looking at the paths readers take through a site, specifically this one." name="description" property="dct:abstract"/>
    <meta content="2011-07-17T18:07:30+00:00" datatype="xsd:dateTime" property="dct:created"/>
    <meta content="visualizing-paths-through-the-web" property="dct:identifier"/>
    <meta content="2011-07-17T18:06:45+00:00" datatype="xsd:dateTime" property="dct:issued"/>
    <meta content="2011-07-17T18:09:15+00:00" datatype="xsd:dateTime" property="dct:modified"/>
    <meta content="2011-07-18T16:14:12+00:00" datatype="xsd:dateTime" property="dct:modified"/>
    <meta content="2019-04-18T01:04:55+00:00" datatype="xsd:dateTime" property="dct:modified"/>
    <meta content="2022-05-31T04:18:52+00:00" datatype="xsd:dateTime" property="dct:modified"/>
    <meta content="2022-05-31T15:10:50+00:00" datatype="xsd:dateTime" property="dct:modified"/>
    <meta about="person/dorian-taylor#me" content="Dorian Taylor" name="author" property="foaf:name"/>
    <meta content="summary_large_image" name="twitter:card"/>
    <meta content="@doriantaylor" name="twitter:site"/>
    <meta content="Visualizing Paths Through the Web" name="twitter:title"/>
    <meta content="Here is another installment on data-driven content strategy. This time I demonstrate a technique for looking at the paths readers take through a site, specifically this one." name="twitter:description"/>
    <meta content="https://doriantaylor.com/site-path;desaturate;scale=505,390" name="twitter:image"/>
    <object>
      <nav>
        <ul>
          <li>
            <a href="2-up-content-audit" rev="dct:references" typeof="bibo:Article">
              <span property="dct:title">2-Up Content Audit</span>
            </a>
          </li>
          <li>
            <a href="ia-institute-web-committee-organizational-structure" rev="dct:references" typeof="bibo:Report">
              <span property="dct:title">IAI Web Committee Organizational Structure</span>
            </a>
          </li>
          <li>
            <a href="my-work-at-the-ia-institute-an-anthology" rev="dct:references" typeof="bibo:Article">
              <span property="dct:title">My Work at the IA Institute: An Anthology</span>
            </a>
          </li>
          <li>
            <a href="serendipitous-questionnaire" rev="dct:references" typeof="bibo:Article">
              <span property="dct:title">Serendipitous Questionnaire</span>
            </a>
          </li>
          <li>
            <a href="document-stats#E-rrKlrgY88ZOeiwMQ7xEI" rev="ci:document" typeof="qb:Observation">
              <span>urn:uuid:fabaca96-b818-4f3c-864e-7a2c0c43bc44</span>
            </a>
          </li>
        </ul>
      </nav>
    </object>
  </head>
  <body about="" id="ER9LePXQaO1YiDXK1PqtpL" typeof="bibo:Article">
    <section id="E-SgX9Lli16hwbj9vI4cHJ">
      <p>When auditing content for the <abbr title="World-Wide Web">Web</abbr>, it's important to remember that although many of us still <em>write</em> <abbr title="World-Wide Web">Web</abbr> content as isolated documents, they are very rarely <em>read</em> that way. A reader can easily encounter writing that is inconsistent or confusing from one page to the next. In order to fully appreciate the story we're telling our audience, we should look at it in context, like this:</p>
      <figure id="EAGJEvBVPgnRhu0BRBXPvL">
        <a href="file/2011-07-16-site-path.pdf" rel="dct:references"><img style="display: block; margin: auto" src="site-path;desaturate;scale=505,390" alt="Graph of the most frequently travelled paths through this site" rel="foaf:depiction"/></a>
      </figure>
      <p><a type="application/pdf" href="file/2011-07-16-site-path.pdf" rel="dct:references">This graph</a> is a rendering of the most frequently trodden paths through my own site. Even before zooming in, we can glean significant information about the content and the relationships within it.</p>
    </section>
    <section id="EtPyqS1E8h3p-dkJyYkLmL">
      <h2>How I Did It</h2>
      <p><abbr title="World-Wide Web">Web</abbr> browsers still courteously supply us with the location of the referring resource, if there is one, in the <code>Referer</code> header of each new request. Most servers can record this header in their access logs; the widely used "combined" log format includes it. It's a straightforward task to turn the log into a list of referrer-referent connections, each weighted by the number of hits that go between them.</p>
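      <p>As a rough illustration of that step, here is a minimal Python sketch. It assumes an access log in the "combined" format and counts only <code>GET</code> requests answered with 200 or 304. Referrers on this site become page-to-page edges, while outside referrers are collapsed to their host name so that paths <em>into</em> the site still show up. The function name <code>referrer_edges</code> and these filtering choices are hypothetical, not a description of the tooling behind the graph above.</p>

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# One line in the (assumed) "combined" log format:
# host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
# Captured groups: method, path, status, referer.
COMBINED = re.compile(r'^\S+ \S+ \S+ \[[^\]]*\] "(\S+) (\S+)[^"]*" (\d{3}) \S+ "([^"]*)"')


def referrer_edges(lines, site_host):
    """Count hits per (referrer, referent) pair. Hypothetical helper."""
    edges = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # not a combined-format line
        method, path, status, referer = m.groups()
        if method != "GET" or status not in ("200", "304"):
            continue  # keep only successful page views
        ref = urlsplit(referer)
        if ref.hostname == site_host:
            source = ref.path or "/"  # internal click-through
        elif ref.hostname:
            source = ref.hostname  # external referrer, collapsed to its host
        else:
            continue  # no referrer ("-"), e.g. a typed or bookmarked URL
        target = urlsplit(path).path or "/"
        edges[(source, target)] += 1
    return edges
```

      <p>Calling <code>referrer_edges(open("access.log"), "doriantaylor.com")</code> (the file name is illustrative) would return a <code>Counter</code> keyed by <code>(source, target)</code>, whose values are the edge weights.</p>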
      <p>The weights indicate how much traffic flows between an ordered pair of pages, that is, how often somebody looks at one specific page followed by another specific page. We can then import this list into graph visualization software, <a href="http://gephi.org/" title="Gephi, an open source graph visualization and manipulation software" rel="dct:references">such as Gephi</a>, which will show us the trails people take into and through the site.</p>
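      <p>Getting that weighted list into Gephi can be as simple as writing a CSV edge list. The sketch below assumes the pair counts are held in a dictionary mapping <code>(source, target)</code> to hits, and uses <code>Source</code>, <code>Target</code> and <code>Weight</code> as column headings, which Gephi's spreadsheet import recognizes for an edges table. The function name and the <code>min_weight</code> cut-off, a knob for keeping only the more frequently trodden paths, are hypothetical.</p>

```python
import csv
import io


def edges_to_gephi_csv(edges, min_weight=1):
    """Render {(source, target): hits} as an edge-list CSV string.

    Hypothetical helper: pairs seen fewer than min_weight times are
    dropped so that only the well-trodden paths remain.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    # Heaviest edges first, which makes the file easy to eyeball.
    for (source, target), hits in sorted(edges.items(), key=lambda kv: -kv[1]):
        if hits >= min_weight:
            writer.writerow([source, target, hits])
    return out.getvalue()
```

      <p>Writing the returned string to a <code>.csv</code> file gives something Gephi can load as an edges table; raising <code>min_weight</code> thins out the one-off hops that otherwise clutter the graph.</p>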
      <p>In addition to the weighted lines, I used Gephi's built-in <a href="http://en.wikipedia.org/wiki/PageRank" title="PageRank &#x2014; Wikipedia" rel="dct:references">PageRank</a> analysis to find the highest-ranked pages, mapping each page's score to the size of the dot that represents it.</p>
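      <p>Gephi computes PageRank for you, but the idea is compact enough to sketch. The function below is a generic weighted PageRank by power iteration, not Gephi's implementation. Each page passes its score along its outbound edges in proportion to their weights, pages with no outbound edges share their score evenly with every page, and the damping factor of 0.85 is the conventional default rather than a setting taken from this analysis. The function name and parameters are hypothetical.</p>

```python
from collections import defaultdict


def weighted_pagerank(edges, damping=0.85, tol=1e-9, max_iter=200):
    """PageRank by power iteration over {(source, target): weight}."""
    out_weight = defaultdict(float)  # total outbound weight per node
    nodes = set()
    for (src, dst), w in edges.items():
        out_weight[src] += w
        nodes.update((src, dst))
    if not nodes:
        return {}
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(max_iter):
        # Score held by dangling nodes (no outbound edges) is spread evenly.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        base = (1 - damping) / n + damping * dangling / n
        new = dict.fromkeys(nodes, base)
        for (src, dst), w in edges.items():
            # Each node passes on its score in proportion to edge weight.
            new[dst] += damping * rank[src] * w / out_weight[src]
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if tol >= delta:  # converged
            break
    return rank
```

      <p>The resulting scores sum to 1, so they can be compared directly or scaled to whatever dot sizes the visualization needs.</p>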
      <aside role="note" id="EXbxcjnBVDT6GXRdDbtSeJ">
        <p>If I have one rant, it's that there doesn't seem to be a way at the moment to get Gephi to produce <a type="application/pdf" href="http://partners.adobe.com/public/developer/en/acrobat/sdk/pdf/pdf_creation_apis_and_specs/pdfmarkReference.pdf" rel="dct:references">PDFMark</a> links in its output. That would make the output much easier to examine, as not all <acronym title="Portable Document Format">PDF</acronym> viewers automatically turn <acronym title="Uniform Resource Identifier">URI</acronym>s in the text into clickable links.</p>
      </aside>
    </section>
    <section id="EQfwcJOKFWH2XZh9cyywYI">
      <h2>What I Learned</h2>
      <p>To sum it up in one sentence: it looks like I have considerable work ahead of me. Specific remarks include:</p>
      <dl>
        <dt>People appear disproportionately keen to learn about me</dt>
        <dd>They want to know <a href="person/dorian-taylor" title="Dorian Taylor" rel="dct:references xhv:meta">who I am</a>, <a href="hello-internet" title="What I Do">what I do</a> and <a href="projects/" title="Current Projects" rel="dct:references">what I'm working on</a>. Examining this particular path was my main motivator for doing this work, as those pages are obsolete, awkward and truncated, respectively. I didn't even wait for these results before replacing my <em>what I do</em> page; the new version, while a little long, is considerably more accessible than <a href="i-manufacture-language" title="I Manufacture Language" rel="dct:references">its predecessor</a>.</dd>
        <dt><a href="policy/http-url-path-parameter-syntax" title="HTTP URL Path Parameter Syntax" rel="dct:references">That piece on <acronym title="Uniform Resource Identifier">URI</acronym> path parameters</a> is way too popular</dt>
        <dd>This is perhaps the most glaring evidence that I never intended for this site to be public. Rather, I didn't <em>mind</em> that it was publicly accessible, but I had no interest in maintaining the tacit service guarantee associated with putting anything on the <abbr title="World-Wide Web">Web</abbr>. This early work was an attempt to capture what I know about the <abbr title="World-Wide Web">Web</abbr> in the true style of hypertext, though I eventually found it too time-consuming to manage and just reverted to writing essays. It appears, however, that I should at least give it and its neighbours a second chance.</dd>
        <dt>There isn't a lot of traffic between the essays themselves</dt>
        <dd>Even though those documents are studded with cross-references, a reader's next step is overwhelmingly home, or to one of <em>who I am</em>, <em>what I do</em> and <em>what I'm working on</em>. I had suspected as much. If people insist on reading my site, I'd prefer that they got better exposure to related ideas. Solving that entails bringing metadata under management, a big chore I've been avoiding but am running out of excuses to put off.</dd>
        <dt>There also seems to be a lot of love for <a href="lexicon/constraint" title="Constraint" rel="dct:references">constraints and affordances</a></dt>
        <dd>What hypertext handles really well is <em>parenthesis</em>. In its purest form, hypertext has an inimitable capacity to square away all definitions, remarks and digressions, leaving the text free to focus on a single, brief main message. When we write in this way, however, the overhead of managing all these digressions <em>explodes</em>. Ironically, <acronym title="Hypertext Markup Language">HTML</acronym> is a monumentally awkward way to manage hypertext, in part because we write the links in referring pages before we write the pages they refer to. Never mind having to choose a <acronym title="Uniform Resource Identifier">URI</acronym> for the document before writing it; what about the content itself? <acronym title="Hypertext Markup Language">HTML</acronym> is biased toward hurriedly putting <em>something</em> up at a given location, even if it isn't really very good. The page on constraints is one of those casualties.</dd>
      </dl>
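      <p>An observation like "a reader's next step is overwhelmingly home" can be checked directly against the weighted edge list. Here is a minimal sketch, assuming the same mapping of <code>(source, target)</code> pairs to hits and a hand-picked set of hub pages (the home page and the about pages, say), that computes what fraction of each page's outbound hits lands on a hub. The function name <code>hub_share</code> is hypothetical.</p>

```python
from collections import defaultdict


def hub_share(edges, hubs):
    """For each source page, the fraction of its outbound hits that
    land on one of the (hand-picked, hypothetical) hub pages."""
    total = defaultdict(int)   # all outbound hits per source page
    to_hub = defaultdict(int)  # outbound hits that go to a hub
    for (src, dst), hits in edges.items():
        total[src] += hits
        if dst in hubs:
            to_hub[src] += hits
    return {src: to_hub[src] / total[src] for src in total}
```

      <p>For example, <code>hub_share(edges, {"/", "/projects/"})</code> returns one ratio per page; essays whose ratio sits near 1 are the ones whose cross-references readers are not following.</p>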
    </section>
    <section id="EJPg9xnxGgb-2JZMAwN0UJ">
      <h2>Most Importantly&#x2026;</h2>
      <p>If you put something on the <abbr title="World-Wide Web">Web</abbr>, somebody will eventually come along and read it. Managing <abbr title="World-Wide Web">Web</abbr> content is considerably different from managing content for print, and even more alien than we normally appreciate. The sheer complexity generated by being able to link arbitrarily from one idea to another has implications for the way we present ourselves online that are hard to see without the proper instrumentation.</p>
      <p>I wrote the majority of the content on my site as notes to myself. Now that it appears that I have a modest following, I should probably put in the necessary elbow grease to treat you, my readers, with some courtesy. As with any situation in which we must generate buy-in&#x2014;especially if the ones to be convinced are ourselves&#x2014;it's handy to be able to point to some data.</p>
    </section>
  </body>
</html>
