I'm almost embarrassed at how large a repertoire of techniques I've accumulated over the years for converting data from one format to another. You could say that my entire professional experience effectively reduces to getting information from one place and one shape to a different place and/or a different shape. This experience is beginning to coalesce for me into a solid set of principles for solving the problems of distributed information systems.

The Ghetto CMS

Working with a client sometime in 2010, I was, among other things, helping figure out how to keep their modest but important website up to date. The heavily abridged and totally doctored conversation went something like this:

Okay, so you sit in front of Outlook all day, which among other things is where you already schedule your organization's events. Your constituents all live somewhere in between your accounting software and Outlook, both of which speak vCard. You have collections of photos which you have annotated by way of Excel spreadsheets, and you receive your major documentation project from its editors as a set of Word documents. To top it all off, you already more than know your way around programs like DreamWeaver, as would anybody who would take over your duties in the future.

The next thing I knew, I had written what can ultimately be considered an ultra-lightweight content management system. There's no SQL database, no Web-based management interface. All data is sourced from files in common formats; all edits happen through the server's file system.

Specifically, I wrote some code to turn iCal and vCard files into HTML on the fly. At least that's how it started, but then I kept going. Web pages added to a specified news folder automatically become part of an Atom feed. Folders full of photos get dynamically cropped, thumbnailed and placed into paginated galleries. Virtually all the micro-content on the site—structure, sequences, labels, et cetera—can be controlled with Excel. All of this material is wrapped in a post-processor that is responsible for two things: dereferencing transcluded resources from other parts of the site, and of course the template for the visual design, which, naturally, is also editable. That way, when my client does have to write some meager glue HTML—in his native web-editing app—most of the work is already taken care of for him.
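
To give a flavour of the glue involved, here is a minimal sketch (not the actual code; the file name and markup are invented for illustration) of what rendering an iCal file as HTML can look like in Python, using the third-party icalendar package:

    # A minimal sketch: render the events in an iCal file as an HTML
    # list. Assumes the third-party icalendar package; the file name
    # and markup are invented for illustration.
    from html import escape
    from icalendar import Calendar  # pip install icalendar

    with open('events.ics', 'rb') as fh:
        cal = Calendar.from_ical(fh.read())

    items = []
    for event in cal.walk('VEVENT'):
        start = event.decoded('DTSTART')         # date or datetime
        summary = str(event.get('SUMMARY', ''))  # event title
        items.append('<li><time datetime="%s">%s</time> %s</li>'
                     % (start.isoformat(), start.strftime('%d %B %Y'),
                        escape(summary)))

    print('<ul class="events">\n%s\n</ul>' % '\n'.join(items))

The point is the shape: a file the client already maintains goes in, a fragment of the site comes out.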

I classify this excursion as a successful user non-experience. That is, the experience is that there is nothing to experience. The man has a job to do and he wants to be able to get it done as quickly and painlessly as possible: He just dumps the files in and moves on to something else. He isn't the first to be cool with bulk operations being solved in this manner either; others have outright requested it. I will consider this more closely after the following interlude.

wheel.reInvent();

Of course, I am by no means implying that this idea is new. It seems plausible that the big-time CMS vendors would already do this in some capacity. I wouldn't know, though. You see, I did enterprise content management for just under four years, from early 2002 through 2005, but I've never once deployed a third-party CMS product. I don't even bother to keep track of them. My MO is to use commodity parts to fashion a snug-fitting system. The test of a commodity part is that you can swap it for somebody else's product without incurring significant injury. From what I've seen, that's a pretty hard condition to satisfy when you bring a content management system into play.

Besides, the dirty secret about programming for the Web is that it is dead easy to make just about anything happen, provided you know precisely what you want, which, in relative terms, doesn't occur all that often. Therefore if you do know what behaviour you want, there's scarcely an excuse not to implement it. At the very least, establishing a satisficing solution for the behaviour that you do want, on top of, you know, getting the job done, will enable you to judge if somebody else is doing a better job of providing it. But most importantly, it will enable you to spot behaviour that you don't want—that which would disqualify a third-party product—before the contract is signed.

</interlude>

I want to consider for a moment the archetypal environment of the people charged with the dismal, er, exciting task of updating websites. What statement can we make about them that will almost certainly be true?

They're doing the work on a PC.

Historically, the PC has two significant sources of input: that which you can sausage-finger into the keyboard, and that which you can load in through a floppy disk, otherwise known as the file. The former is useful for deft, delicate operations and the latter is good for bulk. The PC is the bastard offspring of some 1970s Haight-Ashbury love-in and was born without a network adapter. True story! It's the only way dialogue boxes could have conceivably been considered a good idea: it was literally a one-to-one conversation between you and the computer because nobody else was around. Don't think for a second that isolated upbringing doesn't have a long-term effect on behaviour.

What this moist little nugget of understanding means is that these people have been kitted out forever with the means to operate over mountains of content with considerable efficiency. The strategy is to bunch all the diverse, nimble little operations up in their local space, then shunt their results downstream in a single package. People worked this way just fine, long before the inception of the AJAX rich-text editor. Which makes me wonder: if you're gonna make a Web interface for manipulating data of some kind, why not make it operate over some structure for which there isn't already an extraordinarily mature desktop tool that does exactly the same job?

If you're going to make a Web-based interface for a task like entering text, consider it remedial. I don't care how snazzy HTML5 is. Whoever uses that functionality is almost certainly trying to do their job on their phone from down at the bar or something. And, if you insist on making one, for goodness sake fix the back button. I once saw a guy put his fist through a wall after clipping it and losing the magnum opus he had spent the previous hour naïvely hunting and pecking into his browser.

Christine from Accounting

An information system tends to work best when a given datum has precisely one (logical) authoritative home—when it has one owner and everybody else rents. And a lot of the time we can't really control what locale gets chosen. What this means is that the master copy of your company's payroll, for instance, to which every other record is subordinate, is not in your bazillion-dollar ERP system but rather in an Excel spreadsheet on the cluttered, LOLcat-emblazoned desktop of Christine from Accounting.

The question is not how to coax Christine into taking on the Sisyphean burden of reconciling her spreadsheet with the ERP monolith, but how to get her to share that information with the appropriate people without putting in one iota of extra work.
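
What that can look like in practice, sketched under assumptions (Python with the third-party openpyxl package; every path and detail is invented for illustration), is a scheduled script that reads her spreadsheet where it already lives and republishes it for downstream consumers:

    # A sketch of the conduit described above: read the spreadsheet in
    # place and republish it as CSV for other systems to consume.
    # Christine does nothing; a scheduler (cron, Task Scheduler) runs
    # this. Paths and layout are invented for illustration.
    import csv
    from openpyxl import load_workbook  # pip install openpyxl

    SOURCE = '//accounting-pc/share/payroll.xlsx'  # hypothetical share
    TARGET = '/srv/intranet/data/payroll.csv'      # hypothetical target

    wb = load_workbook(SOURCE, read_only=True, data_only=True)
    ws = wb.active  # the first worksheet

    with open(TARGET, 'w', newline='') as out:
        writer = csv.writer(out)
        for row in ws.iter_rows(values_only=True):
            writer.writerow(row)

The spreadsheet stays the authoritative home; everything else rents a copy.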

Settle Down, Beavis

I'm not suggesting that we should revert to storing our sensitive information in files on people's desktops, but rather that we keep the familiar semantics of the file system as an interface. At least until we can mind-control our robot butlers to do our work for us while we cruise around in our flying cars or whatever.

As I write this article, I am working with a client to create an intranet that visualizes HR data, which it gets periodically in an email from an upstream source. Along for the ride is some—but not all—relevant contact information of the people the data concerns. This information is most useful when it lives in the organization's Active Directory, which, being an LDAP server, also makes for a handy-dandy authorization database for just about everything, on top of serving out contact information to email programs (as well as the photocopier!). Finally, this information is augmented by paper forms circulated to employees which get collated into our old buddy, Excel. And that, sure as shit, isn't going anywhere.

So, the quantitative information lives in an SQL database, because once upon a time those did things like business intelligence, rather than act as storage mechanisms for blogs. The master copy of all the authentication, access control and contact information is stored in LDAP, which in turn grants rights to view the quantitative information on two levels: directly in the database itself, and through the Web app. Finally, there is a two-way conduit between the contact database and the Excel file which concentrates the paper records. So simple.
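
One leg of that plumbing, sketched under assumptions (Python with the third-party ldap3 package; the server, account, base DN and attributes are all invented for illustration), looks roughly like pulling contact records out of Active Directory over LDAP:

    # A sketch of one leg of the plumbing: pull contact records out of
    # Active Directory over LDAP. Server, account, base DN and
    # attributes are invented for illustration.
    from ldap3 import Server, Connection, ALL  # pip install ldap3

    server = Server('ldap://ad.example.com', get_info=ALL)
    conn = Connection(server, user='EXAMPLE\\svc-intranet',
                      password='hunter2',  # placeholder credential
                      auto_bind=True)

    conn.search('ou=Staff,dc=example,dc=com',
                '(&(objectClass=person)(mail=*))',
                attributes=['cn', 'mail', 'telephoneNumber'])

    for entry in conn.entries:
        print(entry.cn, entry.mail, entry.telephoneNumber)

The two-way conduit to the Excel file is more of the same: read one side, write the other.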

Give That Grotesque Body a Hug

This is a lesson ground into my bones from the multitude of lugubrious failures over half a lifetime of working with information systems: no matter how pristine, orderly and all-encompassing your design, there will always be that burr, that nagging exception that compromises its integrity. And then a few months down the road there will be another one.

The lesson is to embrace heterogeneity. It turns out that it isn't all that important to get every little burp and fart of business process under a single umbrella. What is important is that you can move information between locations and representations when you need to. Or, to paraphrase Wurman, the parts between the parts are more important than the parts.

Systems Run Better Downhill

…or so saith the de facto bible on the subject. A lot of these principles seem to have been figured out a long time ago. There's a paper by Jonathan Grudin from back in 1988, when I was still playing Ninja Turtles, which suggested that many systems designed to support collaborative work were in fact zero-sum: they abridged the work of some—usually the managers doing the purchasing—by generating extra work for others. Decades later, it appears not much has changed.

Information doesn't really flow, either. It sort of hops—arcs—from natural source to natural sink. Moving information in the opposite direction is an aberration. It's unreliable and mounds more expensive than reversing the natural polarity of that part of the system. And inverting that polarity is almost always a play on the psychology of human beings, to give them a reason to play along. The rest of the time, we can, and should, rely on the informational equivalent of gravity.

Resilience is the New Efficiency

I can't remember where I read the following, but it goes like this: An appliance is something you replace. A tool is something you sharpen. I'm not especially interested in making appliances, otherwise known as applications. They're too monolithic, too single-minded. They aren't made with their surroundings in mind; they don't recognize the arbitrarily many ways they could be applied in concert with other elements in their environment. This is because an application's application is preordained.

The infrastructural analogue to the appliance/application is the solution, which reduces to an assembly line: an all-encompassing, end-to-end process which generates a narrow range of products, for which the priority is cost-efficiency. An assembly line may be efficient, but a well-stocked workbench is a lot more robust.

But efficiency assumes that you already know what you're doing and you just need to do it bigger, faster and cheaper. I venture that in today's business climate, such clarity is rarely so self-evident. There is enormous value in cohering an idea, and then laying out the equipment to play around with it, enabling businesses to slow down without slowing down their business. That's really the kind of work I love to do: making tools, language and environments for organizations to tinker, introspect and explore.