I have been teasing certain content strategists since early 2010 about a method for performing automated content inventories plus instrumentation that improves the efficiency and sophistication of subsequent audits. Instrumented how, you might be inclined to ask? I'll tell you, but only after harping for a minute.

Define the Damn Thing™

It's this kind of tool that makes any attempt at a clear-cut distinction between information architecture, content strategy, business intelligence, SEO and probably four or five other nascent disciplines start to look a bit silly. Among other reasons, that's because it makes use of a data structure that, after working with it for the past five years or so, I'm pretty confident is the natural home for every single one of these concerns. Moreover, the data has to be sourced from several different organizational silos before it can be brought together in one place and made useful. Who should own that problem? Who even has the authority?

Well, So Far I Do

I took an interest in this idea because I wanted to get a handle on my own site, which is a scintillating paragon of neglected content, so much so that only about a fifth of it is even visible, of which maybe half is current. The rest is squirreled away in various stages of incompleteness. I wanted a meaningful way of putting the whole corpus front and centre so that I could quickly expose and prioritize the material that needs attention. I had a nagging feeling, however, that there was something insufficiently real-world about my own site, which, while admittedly a feat of extreme laziness, is at least informed by working with the web virtually every day for half a lifetime. As such, the content inventory engine I wrote just didn't feel right until I could turn it toward a corpus that was a bit more representative of what I was likely to find in the wild.

Good thing you folks elected me to the IA Institute board! Its site is perfect. Just what I need to finish my masterpiece, which I started finishing in November 2010, and continued at a pace consistent with a volunteer directorship of a non-profit organization*. Well, I'm happy to say that I'm at it again. Let's consider some of the data points I'm interested in.

* If there is any corporate interest in accelerating this work, you know where to find me.

Stuff I'd Like to See in My Content Inventory

Data About Individual Resources

Actual Metadata of Individual Resources

Information About Links

Now, What To Do About It?

Wait, You Forgot Sections!

Nope. I'm pretty convinced that the idea of prescribed sections on the web is a diversion from what the web excels at. The structure of the RDF data model is such that it represents connections between resources based on what they mean, both in the type of connection and the content of the resources themselves. These connections accrue over time and represent associations that, ultimately, people have found meaningful. When we pile all this information together, along with certain elements above, we can tease the model apart along its natural fissures. Try as we might, our best tools for hand-crafting sets of resources amount to biased heuristics. We would do ourselves a service to take advantage of the data. Besides, any existing section landing pages will still be present in the scan; it's just a question of how useful they will be.
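To make "tease the model apart along its natural fissures" a little more concrete, here is a minimal sketch under stated assumptions. It is not my crawler's code, just an illustration: hyperlinks are stored as (subject, predicate, object) triples using a made-up `links_to` predicate, and the clustering technique is swapped in as plain Girvan-Newman edge-betweenness splitting, which repeatedly removes the link carrying the most shortest paths between pages until the link graph falls apart into groups. The function names are hypothetical.

```python
from collections import defaultdict, deque


def link_graph(triples):
    """Build an undirected adjacency map from hypothetical `links_to` triples."""
    graph = defaultdict(set)
    for subject, predicate, obj in triples:
        # Only hyperlinks matter for this step; other predicates
        # (titles, dates, owners) are ignored.
        if predicate == "links_to" and subject != obj:
            graph[subject].add(obj)
            graph[obj].add(subject)
    return dict(graph)


def edge_betweenness(graph):
    """Brandes-style edge betweenness for an unweighted, undirected graph."""
    scores = defaultdict(float)
    for source in graph:
        # BFS from source, counting shortest paths (sigma) and predecessors.
        dist = {source: 0}
        sigma = defaultdict(float)
        sigma[source] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to the source.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                scores[frozenset((v, w))] += share
                delta[v] += share
    # Each undirected path was counted once from each of its two ends.
    return {edge: s / 2.0 for edge, s in scores.items()}


def components(graph):
    """Connected components, discovered in sorted order of their first node."""
    seen, result = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(graph[node] - comp)
        seen |= comp
        result.append(comp)
    return result


def split_along_fissures(graph, target=2):
    """Remove highest-betweenness links until `target` groups remain."""
    g = {node: set(nbrs) for node, nbrs in graph.items()}
    while len(components(g)) < target:
        scores = edge_betweenness(g)
        if not scores:
            break  # no links left to cut
        # Deterministic tie-break: highest score, then endpoint names.
        worst = max(scores, key=lambda e: (scores[e], sorted(e)))
        v, w = tuple(worst)
        g[v].discard(w)
        g[w].discard(v)
    return components(g)


if __name__ == "__main__":
    # Hypothetical crawl results: two tightly linked clusters and one bridge.
    triples = [
        ("/", "links_to", "/about"),
        ("/about", "links_to", "/about/board"),
        ("/about/board", "links_to", "/"),
        ("/events", "links_to", "/events/2011"),
        ("/events/2011", "links_to", "/events/archive"),
        ("/events/archive", "links_to", "/events"),
        ("/", "links_to", "/events"),
        ("/", "title", "Home"),
    ]
    for group in split_along_fissures(link_graph(triples)):
        print(sorted(group))
```

The sketch recomputes betweenness after every cut, which keeps it simple but gets slow on a site of any real size. The more interesting questions are how many groups to ask for and what order they split off in, since that order is itself a picture of how the site hangs together.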


My super-ghetto prototype website-to-RDF crawler is chugging away on iainstitute.org as I write. It is slow, because I am too lazy to make it multithreaded. When it exits, I'm gonna make another pretty picture like this, and then I'm going… Update: the crawl is now finished and the results are at the top of the page. Now to get to Actual Work™.
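The crawler's code isn't shown here, so the following is only a guess at its core step, not the real thing: a minimal sketch, using Python's standard-library `html.parser` and `urllib.parse`, that pulls the `<title>` and the `<a href>` links out of one fetched page and emits them as N-Triples lines. The predicate URIs under `http://example.org/inventory#` are placeholders I've made up (a real inventory would more likely reuse an established vocabulary such as Dublin Core), and the sketch does no IRI escaping, so odd URLs would need extra care.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

# Placeholder vocabulary (hypothetical); swap in real predicates as needed.
NS = "http://example.org/inventory#"


class LinkAndTitleParser(HTMLParser):
    """Collect absolute http(s) link targets and the <title> text of a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative URLs and drop #fragments, so that
                # page.html#top and page.html count as one resource.
                absolute, _ = urldefrag(urljoin(self.base_url, href))
                if absolute.startswith(("http://", "https://")):
                    self.links.append(absolute)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def escape_literal(text):
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def page_to_ntriples(url, html):
    """Return N-Triples lines describing one crawled page."""
    parser = LinkAndTitleParser(url)
    parser.feed(html)
    parser.close()
    lines = []
    title = parser.title.strip()
    if title:
        lines.append(f'<{url}> <{NS}title> "{escape_literal(title)}" .')
    for target in dict.fromkeys(parser.links):  # de-duplicate, keep order
        lines.append(f"<{url}> <{NS}linksTo> <{target}> .")
    return lines
```

Run over every page the crawler visits, the concatenated output is a single N-Triples file that standard RDF tools can load. Stripping fragments and duplicates means each page-to-page link appears once per source page, which keeps later link analysis from double-counting.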

