Key Continuity for Kindergarteners

This idea has been bouncing around in my head since the summer of 2007, but has until now resisted being committed to writing. Writing it down is part of my new policy of clearing the intellectual decks to free up my brain for my other interests and responsibilities, while having something to show for it. This particular burr came out of a discussion I had with Ivan Krstić, erstwhile OLPC security chief, on a visit to his office. His problem was how to protect five-year-olds from having their private communications intercepted, which is a pretty challenging and equally important task.

The problem reduces to the following. How do you turn something that looks like this:

350dc702d7dd0962df82d32c1b1f5bb7 4aad81a2c9499318460868985b86c485

into something a normal person, let alone a child, is going to recognize? Or more specifically, how do you turn it into something that they can at least compare with a similar object to determine if it is different from the original? And understand and care why they should?

Huh?

That opaque lump of data is a cryptographic hash. It is the almost-certainly unique output of a function that consumes whatever data you throw at it and returns a fixed-size symbol. If you feed the function the same data a second time, it returns the same hash. Its purpose is to let you check that the data you receive is what you're expecting when you don't have access to the original or don't want to keep it around. It is also handy for telling if your data is being tampered with.
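To make those properties concrete, here is a minimal sketch using SHA-256 from Python's standard `hashlib` module. The helper name `fingerprint` and the sample messages are my own, chosen only for illustration.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hash of data as 64 hexadecimal digits."""
    return hashlib.sha256(data).hexdigest()

first = fingerprint(b"hello, world")
second = fingerprint(b"hello, world")
tampered = fingerprint(b"hello, world!")

print(first == second)    # same input, same hash
print(first == tampered)  # one extra byte, a different hash
print(len(first))         # the size is fixed no matter how much data went in
```

The input can be a few bytes or a few gigabytes; the output is always 256 bits, which is what makes it practical to compare.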

In secure communications, cryptographic hashes are used as fingerprints to verify that whoever you're talking to is who they say they are. With SSL, this verification is facilitated in part by a public key infrastructure, but there are plenty of other instances in which it is useful, if not necessary, to verify someone's identity yourself. Checking that the fingerprint has not changed from one session to the next is a concept called key continuity, and it's quite straightforward. When you start your secure communication and you see the same hash you saw last time, you're talking to the same key as before and nobody has messed with anything in the meantime.
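The check itself is small enough to sketch. What follows is an illustration under my own assumptions, not any particular protocol's implementation: the function names, the in-memory dictionary of remembered fingerprints, and the sample keys are all hypothetical.

```python
import hashlib

def fingerprint(public_key: bytes) -> str:
    """Hash a public key down to a short, comparable fingerprint."""
    return hashlib.sha256(public_key).hexdigest()

def check_continuity(known: dict, peer: str, public_key: bytes) -> str:
    """Compare a peer's fingerprint with the one remembered from last time."""
    seen = fingerprint(public_key)
    remembered = known.get(peer)
    if remembered is None:
        known[peer] = seen  # first contact: nothing to compare against yet
        return "first contact"
    return "same key" if seen == remembered else "KEY CHANGED"

known_peers = {}
print(check_continuity(known_peers, "teacher", b"teacher's real key"))
print(check_continuity(known_peers, "teacher", b"teacher's real key"))
print(check_continuity(known_peers, "teacher", b"impostor's key"))
```

Note that the first contact is the weak spot: with nothing remembered, the sketch simply trusts whatever it is shown.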

Visualizing Key Continuity

In order to successfully check for key continuity, a person must be able to tell that something is out of place from what she is expecting. More importantly, she needs to care about its implications. The latter is a seriously challenging user experience design problem, but it can't be tackled before solving the former. Luckily, we can apply the simple premise that the data from a hash can be turned into a picture, and the hash above can be made to look like this:

Each block represents two bits of the symbol, converting into four shades of grey. I chose a SHA-256 hash for the demonstration because it conveniently makes a square, but there is no reason the principle couldn't be applied to different hash lengths that yield different shapes. I left it black and white because contrast is the most important factor for visual discernment of form, but I could just as easily use colour. As we can see, a picture makes it much easier to tell at a glance whether two hashes are the same or different, and from there we can infer whatever that means.
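For the curious, the conversion can be sketched in a few lines. This is not the code that produced the picture; it is an approximation under stated assumptions. Two bits per block gives 128 blocks for a 256-bit hash, and I am assuming a layout of 8 rows by 16 columns, which comes out roughly square in a terminal because characters are about twice as tall as they are wide. The four ASCII glyphs standing in for shades of grey are an arbitrary choice.

```python
HASH = "350dc702d7dd0962df82d32c1b1f5bb74aad81a2c9499318460868985b86c485"
SHADES = "@#:."  # stand-ins for four greys, darkest (0) to lightest (3)

def cells(hex_digest: str) -> list:
    """Split a hex digest into 2-bit values, most significant bits first."""
    values = []
    for byte in bytes.fromhex(hex_digest):
        for shift in (6, 4, 2, 0):
            values.append((byte >> shift) & 0b11)
    return values

def render(hex_digest: str, columns: int = 16) -> str:
    """Lay the 2-bit values out as rows of shaded blocks."""
    values = cells(hex_digest)
    rows = [values[i:i + columns] for i in range(0, len(values), columns)]
    return "\n".join("".join(SHADES[v] for v in row) for row in rows)

print(render(HASH))
```

Swapping `SHADES` for real grey levels and writing pixels instead of characters would give an actual image; the mapping from bits to blocks stays the same.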

Where Would I Use This?

If you already have a copy of the hash you're expecting, it is much easier to get the computer to do the matching job for you, such as what happens under the hood when someone much nerdier than you logs into an SSH server, or more obliquely when you visit a secure Web site. Where this technique would prove useful is in those instances where you don't have a digital representation of the original hash handy, such as the first time you connect. It would be reasonable, for instance, to print the picture of the hash onto a piece of collateral such as a business card or manual, so the person could just hold it up next to the sample to verify it's the same.

Should We Even Be Worried?

Encrypted communication is useless without authentication, because what good is a private conversation if you can't be sure who you're having it with? That said, how do you react to what is, for the time being, such a rarely perceptible occurrence? What does it even look like?

Most of the time you would be presented with a completely different hash. More sophisticated attacks tend to take the form of generating a large number of bogus data objects until one produces a hash similar enough to the original that it goes unnoticed when a person only has her eyes to rely on. The attacker looks for hashes that match at the beginning and the end, on the assumption that people looking at the numerical form only bother to remember the first or last few digits and take the middle for granted. Putting the hash on a graphical canvas makes it much easier to spot discrepancies, especially around the middle. And by the way, getting more than a few digits to match tends to be prohibitively expensive for the attacker.
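A toy version of that brute-force search shows where the expense comes from. This is a sketch with hypothetical names (`forge_prefix`, the `bogus-N` inputs), and it only matches leading digits. Each additional hexadecimal digit the attacker wants to match multiplies the expected number of attempts by 16, whether the digit is at the beginning or the end.

```python
import hashlib
from itertools import count

TARGET = "350dc702d7dd0962df82d32c1b1f5bb74aad81a2c9499318460868985b86c485"

def forge_prefix(target_hex: str, digits: int):
    """Try bogus inputs until one hashes to the target's first `digits` hex digits."""
    prefix = target_hex[:digits]
    for attempts in count(1):
        candidate = f"bogus-{attempts}".encode()
        digest = hashlib.sha256(candidate).hexdigest()
        if digest.startswith(prefix):
            return candidate, digest, attempts

candidate, digest, attempts = forge_prefix(TARGET, 3)
print(digest, "found after", attempts, "attempts")

# Expected attempts to match a given number of hex digits in total.
for digits in (4, 8, 16, 64):
    print(digits, "digits:", 16 ** digits)
```

Three digits fall in a blink; sixteen is already around 1.8 × 10^19 expected attempts, which is why a representation that makes people look at the whole hash matters.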

Other Possibilities

The fundamental principle is to take an otherwise inscrutable yet distinctive blob of data and turn it into a meaningful symbol. The grid I generated is probably the most primitive depiction conceivable and is by no means fit on its own for mass consumption. There are, however, many more ways to achieve a similar result.

Chernoff Faces

The natural representation of an identity — essentially an interchangeable concept with a hash — is a face. In 1973, the statistician Herman Chernoff had the idea to use cartoon faces to represent statistical data in a way that could be quickly compared and contrasted with one another.

The problem I see with Chernoff faces is the amount of entropy they absorb, rather than reflect. The faces above, taken from R, reflect eight data points each. To use a single face to represent a SHA-256 hash would mean using 32 bits per data point, or a range with roughly 4.3 billion steps for each facial feature. There is no way you would be able to discern from memory anything about the generated faces beyond the top few most significant bits of each feature. That is, you could spot a difference between 100 million and a billion but not much smaller differences than that, and that's not good enough for key continuity. Perhaps instead it would make sense to adapt the principle of Chernoff faces to entire, three-dimensional, parametrically-generated creatures similar to Will Wright's Spore. Those would have a far greater surface area over which the entropy of the hash could be distributed, and they would be cuter, too.
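The arithmetic is easy to check. The sketch below splits the example hash into eight 32-bit values, one per facial feature, and then throws away everything but the top three bits of each. The figure of three bits is my own guess at "the top few", used only to show how little of the hash would survive.

```python
HASH = "350dc702d7dd0962df82d32c1b1f5bb74aad81a2c9499318460868985b86c485"
FEATURES = 8       # data points per face, as in the faces taken from R
VISIBLE_BITS = 3   # assumption: bits per feature a person could tell apart

digest = bytes.fromhex(HASH)
bits_per_feature = len(digest) * 8 // FEATURES  # 256 / 8 = 32
steps = 2 ** bits_per_feature                   # 4,294,967,296

chunk = len(digest) // FEATURES                 # 4 bytes per feature
values = [int.from_bytes(digest[i:i + chunk], "big")
          for i in range(0, len(digest), chunk)]

# What is left of each feature once the low-order bits are lost to the eye.
coarse = [v >> (bits_per_feature - VISIBLE_BITS) for v in values]

print(bits_per_feature, "bits per feature,", steps, "steps")
print(coarse)
print(FEATURES * VISIBLE_BITS, "of", len(digest) * 8, "bits survive")
```

Under that assumption a face carries 24 of the 256 bits, which is within reach of the brute-force matching described earlier.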

Glyph, Symbol or Cartoon

In lieu of expensive and bulky 3D creatures, it may make sense to adapt the idea of Chernoff faces to some other kind of line art. I recently finished a book by neuroscientist Mark Changizi that discusses, among other things, the entropy in and perception of written language. In it he effectively gauges the object-ness of writing and plots how likely certain shapes present in symbols are to appear in the real world. With that data it might be possible to make a parametric cartoon character that is in effect a super-Chernoff face, capable of reflecting enough entropy to be useful for key continuity.

For the grownups and other serious-business environments where cartoons are a no-no, it might be feasible to use the same principle to concoct a symbol not unlike the one inflicted upon typesetters by Prince, or perhaps adapt Edward Tufte's sparklines. In Beautiful Evidence, Tufte remarks that when prompted the human eye can notice discrepancies of a tenth of a millimetre. I don't have my copy handy at the time of this writing, but I recall his claim on the effective data density of sparklines to be quite impressive.

Names and Stories

Security researcher Dan Kaminsky suggests capitalizing on our inherent capacity for names as parts of narratives to achieve the same effect for exactly the purpose of key continuity. I even riffed on the notion some time ago with regard to naming projects. He elaborates on the technique in this video from 29:15 until about 47:00. He also suggests a catchy name for this particular line of inquiry: cryptomnemonics.
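As a rough illustration of the idea, and not a reconstruction of Kaminsky's scheme, each hexadecimal digit of the hash can pick a word from a list of sixteen, with three digits making one tiny sentence. The word lists and the function name `story` are entirely hypothetical.

```python
HASH = "350dc702d7dd0962df82d32c1b1f5bb74aad81a2c9499318460868985b86c485"

# Hypothetical word lists, sixteen entries each so one hex digit picks one word.
NAMES = ["Ada", "Ben", "Cleo", "Dov", "Esme", "Finn", "Gus", "Hana",
         "Ivo", "Jun", "Kit", "Lena", "Milo", "Nia", "Otto", "Pia"]
VERBS = ["paints", "chases", "hugs", "finds", "hides", "washes", "carries", "draws",
         "follows", "builds", "feeds", "tickles", "mends", "borrows", "polishes", "juggles"]
THINGS = ["kite", "drum", "boat", "lamp", "sock", "bell", "cake", "frog",
          "moon", "hat", "tree", "shoe", "kettle", "robot", "spoon", "cloud"]

def story(hex_digest: str, sentences: int = 4) -> str:
    """Turn the first 3 * sentences hex digits into short sentences."""
    lines = []
    for i in range(sentences):
        a, b, c = (int(d, 16) for d in hex_digest[3 * i:3 * i + 3])
        lines.append(f"{NAMES[a]} {VERBS[b]} the {THINGS[c]}.")
    return " ".join(lines)

print(story(HASH))
```

Four sentences cover only 48 bits of the hash. Much larger word lists would be needed to get through all 256 bits in a story short enough to remember.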

Tunes

On February 20, 2010, I attended a panel featuring geneticist Jim Rupert who had an interesting anecdote about putting a stream of genetic code into a music sequencer to generate a tune. He remarked that after a while it was possible to identify particular genetic sequences based on the tunes they generated and even detect mutations when a note was misplayed. In effect, geneticists are tackling exactly the same problem as security researchers.
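The same trick is easy to try on a hash. The mapping below is my own invention, not the one from Rupert's anecdote: each hexadecimal digit becomes one note on a C major scale starting at middle C (MIDI note 60), so the sixteen possible digits span a little over two octaves.

```python
HASH = "350dc702d7dd0962df82d32c1b1f5bb74aad81a2c9499318460868985b86c485"

MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets within one octave
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def digit_to_midi(digit: int) -> int:
    """Map a hex digit (0-15) to a MIDI note number on the C major scale."""
    octave, degree = divmod(digit, len(MAJOR_SCALE))
    return 60 + 12 * octave + MAJOR_SCALE[degree]

def midi_name(note: int) -> str:
    """Name a MIDI note number, e.g. 60 becomes C4."""
    return f"{NOTE_NAMES[note % 12]}{note // 12 - 1}"

def tune(hex_digest: str) -> list:
    """One note per hex digit, in order."""
    return [midi_name(digit_to_midi(int(d, 16))) for d in hex_digest]

melody = tune(HASH)
print(" ".join(melody[:16]))
```

Because every digit maps to a distinct note, changing a single digit of the hash changes exactly one note of the tune, which is the "misplayed note" a listener would have to catch.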

Synthesis

Recall that I started off talking about kids. Why not roll these ideas all together? Why not make a parametrically generated cartoon critter that had a name and hummed a tune? Every cryptographic identity would have its own unique creature associated with it. If you don't see the creature you're expecting, somebody somewhere is up to no good!