Navigation by Shibboleth

I began this site in May 2008 to explain, to myself, a complex and esoteric topic. I wanted to use the properties inherent to hypertext, and by extension the Web, to help tell the story. I was really interested in Nelson's original motivation behind hypertext, which included providing people with multiple paths through a body of work, helping them explore and comprehend the content in the sequence that made the most sense to them. If you found yourself on a page where you didn't understand what I was talking about, you could click on the offending part and get some background information.

Part of what attracts me to hypertext is that it affords—nay, begs for—parenthesis and digression. Whereas an essay or monograph is all about getting to the point, we can use hypertext to perorate on some interesting side detail, then stow it away behind a link. As a writing experience it's totally organic, and I don't tax any reader who isn't interested. Only problem is it completely obliterates the notion of done. So I eventually shelved my Zeno-esque treatise until I could develop some tools that would get it under control. Moreover, writing decent hypertext with print-era tools is straight-up hard. So in the meantime, while I figure out what to do about some sort of authoring solution, I've just reverted to writing essays, like the one you see here.

Oh, and people started reading them too, which was never really part of the plan.

Where the Title Comes In

A shibboleth is a way of telling if somebody either knows or doesn't know something, even if they don't know they know (or don't know) it. Spies used to use the concept to out each other, for instance by tricking their opponents into saying certain words to expose a hidden accent. Hang on to that idea for a moment.

As I mentioned, I routinely deal in complex and esoteric topics. Of course, these topics often tie in with accessible, everyday issues, because I discuss them in order to use them. Moreover, to some people, they are everyday issues. And now that I have a sizeable corpus, I'm seeing a few faint patterns in what I write. Roughly:

The first two of those would likely attract the same people, and possibly the third and fourth as well. But the technical articles could be split into at least three distinct disciplines that typically only talk to each other when they absolutely have to. I've literally spent years mulling over a way to carve up this site so that it piques people's interests in the right places and lets them safely ignore the rest, unless I introduce it to them serendipitously sometime in the future.

How the hell do I organize this site?

The inverse-chronological colly on the front page is exactly what I didn't want to end up with. I have tried my damnedest to keep everything on this site as temporally neutral as I can make it. I even intentionally leave the dates off the documents. Temporality only matters if you've already read everything and you want to see what's new or changed, like if you've subscribed to a feed. Which is exactly what that is on the front page. I just transclude it with XSLT which I run in your browser. Remember: lazy.

I remember one of the first times I visited Wikipedia. My first impression was: where the hell are the sections? I felt completely unable to get a handle on even the magnitude of the content. Consider an encyclopedia in codex format: you can actually see that the listings under Q and X are thinner than the ones under R and E. You can see the extent of the thing on the shelf. Wikipedia divulges none of this. It's like a Gabriel's Horn: finite volume, but infinite surface area.

In this set of diagrams, Alexander is talking about an abstract design problem. Each dot represents a requirement. Each line represents how two requirements inform one another. The circles represent the optimal decomposition, since they only cut two lines. That striped sluglike shape represents an arbitrary heading (Alexander is an architect, so he uses examples like *acoustics* or *neighbourhood*). The problem of dividing up websites into least-overlapping sections is almost identical.

Plus, I agree completely with Christopher Alexander that the optimal hierarchical decomposition pattern of any complex system is found by isolating the subsystems with the least information moving between one another, and that this pattern is almost never congruent with any decompositions we can consciously prescribe.

That, and I totally got over the broken record of About Us/Our Crap/Our Other Crap/Our Links well over a decade ago.

Ever wonder why done is so ill-defined?

I was considering some kind of tag-cloudish construct, but as far as I'm concerned, once you get past the magic number of seven or so items, they grow asymptotically useless. And they still don't tell you how much stuff there is on the site to read.

A book hints at its required commitment with its thickness. A novel is also a decidedly different shape of commitment than an anthology, comic book, newspaper or coffee table book, implied by the shape of each artifact. The Web exhibits nothing like this. Even its base unit, the page, is arbitrarily long.

Even if I did come up with a way to communicate to readers how much effort they're on the hook for, I would still want to engender the sense of completion a person gets when they finish a chapter, and know that they can rest that much closer to the end. It's important for people to be able to unload all that state information they build up in their heads. A colloquial term for that might be closure. But a digraph, such as a website, has no concept of an end—or a start. It just has entry points, paths and overall coverage.

The Can-of-Worms Coefficient

So we're after two things: clusters of documents (i.e. sections) and viable entry points (i.e. cover pages). After all, what we conventionally understand as a section on the Web is just another page with links that fan out to its components.

To achieve this effect, we're back to Alexander's problem: find a line which cuts the site in two pieces while cutting across the lowest number of links. Repeat the process on the pieces, recursively, until you have nothing left but cliques (clusters where every page is connected to every other page, and thus no optimal cut) or single pages. Voilà: your hierarchy. The site's structure will be the most intuitive, natural-feeling pattern in the world, and there is no way you would have come up with it otherwise.
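The recursive cutting described above can be sketched in a few lines of Python. This is only an illustration under some loud assumptions: links are treated as undirected, every link counts the same, and the cut is found with the Stoer-Wagner minimum-cut algorithm, which is my stand-in rather than anything Alexander prescribes. The names `undirected`, `min_cut` and `partition` are made up for this example.

```python
from itertools import combinations


def undirected(links):
    """Turn (source, target) link pairs into a page set and an undirected edge set."""
    edges = {frozenset(pair) for pair in links if pair[0] != pair[1]}
    pages = {page for edge in edges for page in edge}
    return pages, edges


def min_cut(pages, edges):
    """Stoer-Wagner global minimum cut: returns (links_cut, pages_on_one_side)."""
    weight = {u: {} for u in pages}
    for edge in edges:
        u, v = tuple(edge)
        weight[u][v] = weight[v][u] = 1
    members = {u: {u} for u in pages}  # original pages merged into each node
    active = list(pages)
    best = (float("inf"), set())
    while len(active) > 1:
        # One phase: grow an ordering by always adding the most tightly
        # connected remaining node; the last node added defines a cut.
        order = [active[0]]
        pull = {v: weight[active[0]].get(v, 0) for v in active[1:]}
        while pull:
            nxt = max(pull, key=pull.get)
            cut_of_phase = pull.pop(nxt)
            order.append(nxt)
            for v in pull:
                pull[v] += weight[nxt].get(v, 0)
        s, t = order[-2], order[-1]
        if cut_of_phase < best[0]:
            best = (cut_of_phase, set(members[t]))
        # Merge t into s and carry on with one node fewer.
        members[s] |= members[t]
        for v, w in weight[t].items():
            if v != s:
                weight[s][v] = weight[s].get(v, 0) + w
                weight[v][s] = weight[v].get(s, 0) + w
            del weight[v][t]
        del weight[t]
        active.remove(t)
    return best


def is_clique(pages, edges):
    """True when every page links to every other page (nothing sensible to cut)."""
    return all(frozenset(pair) in edges for pair in combinations(pages, 2))


def partition(pages, edges):
    """Cut recursively until only cliques or single pages remain."""
    pages = sorted(pages)
    if len(pages) <= 1 or is_clique(pages, edges):
        return pages  # a leaf of the hierarchy
    _, side = min_cut(pages, edges)
    rest = set(pages) - side
    return [
        partition(side, {e for e in edges if e <= side}),
        partition(rest, {e for e in edges if e <= rest}),
    ]


# Hypothetical site: two tightly knit trios joined by a single link.
links = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
hierarchy = partition(*undirected(links))
```

One caveat with this sketch: a plain minimum cut is happy to slice a single lightly linked page off a large site, so a real attempt would probably want a criterion that also rewards balance between the two pieces.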

Oh, and there is no reason to keep the inevitable 1-2-4-8… nesting sequence. The number of pages in each subset will likely vary tremendously. Just pick a handful of appropriately-sized subgraphs from the lower rungs of the generated hierarchy.

While I was writing this, I thought about something cool: Indicate to the reader when following a link would take them across a partition, by decorating it somehow. Just like how I already decorate links to other sites, or links to PDFs and such. That way you know that when you're following a link, you're either finishing off a line of inquiry or peeling the lid off another one.
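A rough sketch of how that decoration might be decided, assuming the partitioning step has already produced a mapping from each page to its section. The mapping `section_of`, the page paths and the class names are all hypothetical, and this is only one way to wire it up.

```python
def link_class(source, target, section_of):
    """Pick a (hypothetical) CSS class for a link from source to target."""
    if target not in section_of:
        return "external"          # not one of our pages at all
    if section_of[source] == section_of[target]:
        return "same-section"      # finishing off a line of inquiry
    return "crosses-section"       # peeling the lid off another one


# Made-up pages and sections, purely for illustration.
section_of = {"/hypertext": 1, "/transclusion": 1, "/graph-cuts": 2}

same = link_class("/hypertext", "/transclusion", section_of)
crossing = link_class("/hypertext", "/graph-cuts", section_of)
offsite = link_class("/hypertext", "https://example.com/", section_of)
```

The resulting class could then be styled the same way links to other sites or to PDFs already are.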

Heading Up the Sections

With a candidate method selected for partitioning the site into sections, it's now time to turn my attention toward their headings. I originally considered trying to summarize whatever came out of the generated partitions, but my experience suggests that probably won't be very easy to articulate. Besides, as I add content, the partitions will inevitably move around. As such I'm planning to just go with the top N articles under each partition. But how to order them?

Temporality is out. New is often not actually that important, or even that interesting. You don't go to a site to see what's new, not since feeds were invented. That said, it might make sense to explicitly consider perishable content, so people catch it before it expires. Though I personally don't have a lot of that.

Popularity as a delimiter is also problematic, because of the Matthew effect. The stuff that's already popular gets more popular, simply because it's popular, and buries the rest.

I'm tempted to use a graph metric like centrality, which is essentially how important a page is. Except there are a bunch of different kinds of centrality, each slightly different, and picking the most appropriate one really pushes the limit of my understanding of graph theory. If it doesn't make sense to me, it probably won't make sense to most readers.

What I'm leaning toward is using degree to determine which page should be the entry point of each section. Specifically, the ratio of the inbound links to the outbound links of each page, where the lowest ratio wins. A low indegree means not a lot depends on that page, making it a good candidate for introductory material. A high outdegree implies lots of points to jump off from. These metrics, of course, should be taken only from the page's partitioned subgraph. Links coming in or out of the subgraph will pollute the result (besides, there won't be a lot of them if the partitioning algorithm did its job).
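Here is a minimal sketch of that entry-point pick, counting only links that start and end inside the section. The function name `entry_point`, the sample pages, and the tie-breaking rule (prefer the higher outdegree, then alphabetical order) are my own assumptions; a page with no outbound links is treated as the worst possible candidate.

```python
def entry_point(pages, links):
    """Return the page with the lowest indegree/outdegree ratio in a section."""
    pages = set(pages)
    indegree = dict.fromkeys(pages, 0)
    outdegree = dict.fromkeys(pages, 0)
    for source, target in links:
        # Ignore self-links and anything crossing the section boundary.
        if source in pages and target in pages and source != target:
            outdegree[source] += 1
            indegree[target] += 1

    def ratio(page):
        if outdegree[page] == 0:
            return float("inf")  # a dead end makes a poor cover page
        return indegree[page] / outdegree[page]

    return min(sorted(pages), key=lambda p: (ratio(p), -outdegree[p]))


# Hypothetical section of three pages, plus one link from outside it.
section = ["intro", "details", "tangent"]
links = [("intro", "details"), ("intro", "tangent"),
         ("details", "tangent"), ("tangent", "details"),
         ("elsewhere", "intro")]
cover = entry_point(section, links)
```

In this made-up section, "intro" has nothing inside the section pointing at it and two links pointing out, so it comes out on top; the link from "elsewhere" is ignored because it crosses the partition.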

I am so lazy you have no idea