Content Management Meta-System

Design Constraints

I had two constraints in mind when I got started this time:

Dense Hypermedia: I wanted to make the experience linky. I wanted the content of any one document to fit on a standard laptop screen, and I wanted it to link if it needed more space than that. The idea was to replicate old-school hypertext: whether the goal was didactic, argumentative, or narrative, I wanted the reader to be able to choose what they read next.
Cool URIs Don't Change: This was at least as much of a technical challenge as it was a stylistic one. The idea was that if you mint a Web URI—I'm talking about the actual string, not the document it points to—you lose control of it somewhat. Other people, companies, machines, services become aware of it, and they use it to return and fetch the resource—or at least some resource—identified by it. The state of URI preservation in 2008 was bad, and I wanted to see if I could do something about it. Put more generically, I wanted to have a website where no legitimate user would ever encounter a 404.

I bailed on the first constraint pretty early on. I found very quickly that writing hypertext is hard: the scope has a tendency to explode to what I estimate to be proportional to the square of the amount of writing initially expected. Publication would be constipated waiting for some parenthetical Pandora's box or other to be wrangled to satisfaction. I found this could be palliated somewhat by publishing subgraphs that only linked to each other or documents that had already been published, but the no-404 stipulation meant that I was on the hook for an increasingly unmanageable hairball until all paths through it had terminated. I wanted links in the documents to remind myself that there was something there to expand on, but I didn't want those links in the hands of users, and I certainly didn't want them in the databases of indexers either, at least not until there was something on the other end.

Without adequate instrumentation, writing dense hypertext turned out to be just too hard. Within a year I had reverted to writing essays.

The second constraint—unbreakable URIs—turned out to be easier to maintain. As a byproduct of my first tech job in 1999-2000, I had gotten familiar with mod_perl, which gives you full access to the guts of Apache without wasting your life writing C. Working at that level meant your app shipped as a unitary module, bypassing the clunkery of contemporaneous Web application development techniques like CGI scripts, or code-interpolated documents such as PHP. This taught me an important lesson: what is called the Request-URI, the combination of the /path and ?query, the part between the ://host and the #fragment, from the point of view of the standard and the Web server, may as well be a flat dictionary key. It is only by convention that it represent some location on the server's file system. If you can get into this pipeline early enough, you can make the Request-URI represent whatever you want.

Put another way: /path/hierarchies/are/not/necessary. The only thing that matters—to the server—is that the Request-URI unambiguously picks out a resource. The slug is easy enough: just do a sensible transformation of the title. Throw away the idea of sections and plunk everything in the root. If anything threatens a collision, that's when you start adding /path/segments. And when it does, just put in a redirect.

The Semantic Connection

Around this time is also when I was really ramping up my work with RDF, the lingua franca of the Semantic Web. What you find very quickly when you start working with RDF is that it is ravenously hungry for URIs. Combined with the notion that HTTP(S) URIs ought to point somewhere, this is quickly escalates into a microcontent curation nightmare.

Of course, RDF doesn't specify what kind of URI can go in its elements, and there are far more species in the world than just http:. Take, for example, an identifier like:

urn:uuid:e8f61587-bb56-4e5c-b7dd-2954b76a84b9

The UUID: Standard, spat out of a random number generator, big enough to enumerate every atom in the universe, and nobody is going to confuse it for the address of a Web page. Unless it is the address of a Web page, in which case you transform it like so:

https://doriantaylor.com/e8f61587-bb56-4e5c-b7dd-2954b76a84b9

Once you come up with a clever title, you can derive it into a slug, and provided it's unique, that can be the new address. If you expose the UUID to the public for any reason, you can just redirect that too.

https://doriantaylor.com/e8f61587-bb56-4e5c-b7dd-2954b76a84b9
    -> https://doriantaylor.com/content-management-meta-system

So now it's about 2010 and I have a version-controlled folder on my computer that contains an ocean of files that look like e8f61587-bb56-4e5c-b7dd-2954b76a84b9.xml. The majority are missives I started writing and promptly forgot existed, only to start writing anew. Wouldn't it be great to have a little program that just generates a content inventory so I can get this under control?

The program that did the mining is about a thousand lines of Python, which just zips through the designated folder and concomitant versioning database, constructs a graph, and serializes it to a file. At this stage I had been using all third-party RDF vocabularies to represent this metadata: The Bibliographic Ontology to represent the various types of documents, and Dublin Core for many of the relations between them. To represent people and organizations, such as authors and publishers, I used FOAF.

It didn't take long for needs to emerge that weren't expressed by these third-party vocabularies. For one, I wanted to be able to ascribe editorial destinies to these documents that were clear, machine-readable, database-selectable entities:

Finish because there are plenty that need that,
Review, advisable periodically,
Revise: it's mostly salvageable but needs some fixing,
Rewrite: say the same thing but completely differently,
Split into two or more documents because this one is too long,
Merge into a target document that tracks closely to this one,
Retire this piece because it's bad or no longer relevant.

This was the impetus for writing my own content inventory vocabulary, which I started around the cusp of 2012.

Design Constraints

The Semantic Connection

Expanding the Vocabulary