<?xml version="1.0"?>
<?xml-stylesheet href="/transform" type="text/xsl"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:bibo="http://purl.org/ontology/bibo/" xmlns:bs="http://purl.org/ontology/bibo/status/" xmlns:ci="https://vocab.methodandstructure.com/content-inventory#" xmlns:dct="http://purl.org/dc/terms/" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:xhv="http://www.w3.org/1999/xhtml/vocab#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" lang="en" prefix="bibo: http://purl.org/ontology/bibo/ bs: http://purl.org/ontology/bibo/status/ ci: https://vocab.methodandstructure.com/content-inventory# dct: http://purl.org/dc/terms/ foaf: http://xmlns.com/foaf/0.1/ rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns# xhv: http://www.w3.org/1999/xhtml/vocab# xsd: http://www.w3.org/2001/XMLSchema#" vocab="http://www.w3.org/1999/xhtml/vocab#" xml:lang="en">
  <head>
    <title property="dct:title">RDF-KV</title>
    <base href="https://doriantaylor.com/rdf-kv"/>
    <link href="document-stats#EaUIbIGZm-kX3NSALVqk_K" rev="ci:document"/>
    <link href="elsewhere" rel="alternate bookmark" title="Elsewhere"/>
    <link href="this-site" rel="alternate index" title="This Site"/>
    <link href="http://purl.org/ontology/bibo/status/draft" rel="bibo:status"/>
    <link href="http://purl.org/ontology/bibo/status/published" rel="bibo:status"/>
    <link href="" rel="ci:canonical" title="RDF-KV"/>
    <link href="lexicon/#EzqXIsriaILFcWjXdS7FbI" rel="dct:audience" title="Software Developer"/>
    <link href="person/dorian-taylor#me" rel="dct:creator" title="Dorian Taylor"/>
    <link href="person/dorian-taylor" rel="meta" title="Who I Am"/>
    <link about="./" href="3f36c30c-6096-454a-8a22-c062100ae41f" rel="alternate" type="application/atom+xml"/>
    <link about="./" href="f07f5044-01bc-472d-9079-9b07771b731c" rel="alternate" type="application/atom+xml"/>
    <link about="./" href="this-site" rel="alternate"/>
    <link about="./" href="elsewhere" rel="alternate"/>
    <link about="./" href="e341ca62-0387-4cea-b69a-cdabc7656871" rel="alternate" type="application/atom+xml"/>
    <link about="verso/" href="3f36c30c-6096-454a-8a22-c062100ae41f" rel="alternate" type="application/atom+xml"/>
    <link about="verso/" href="this-site" rel="alternate"/>
    <link about="verso/" href="elsewhere" rel="alternate"/>
    <meta content="rdf-kv" datatype="xsd:token" property="ci:canonical-slug"/>
    <meta content="This is a draft of a protocol I designed for embedding RDF statements in plain HTML forms, enabling quick-and-dirty Semantic Web applications." name="description" property="dct:abstract"/>
    <meta content="2013-09-17T03:48:17+00:00" datatype="xsd:dateTime" property="dct:created"/>
    <meta content="2013-10-08T19:31:38+00:00" datatype="xsd:dateTime" property="dct:modified"/>
    <meta content="2013-10-18T03:01:10+00:00" datatype="xsd:dateTime" property="dct:modified"/>
    <meta content="2013-10-20T23:57:57+00:00" datatype="xsd:dateTime" property="dct:modified"/>
    <meta content="2013-11-07T16:15:22+00:00" datatype="xsd:dateTime" property="dct:modified"/>
    <meta content="2013-11-10T18:01:27+00:00" datatype="xsd:dateTime" property="dct:modified"/>
    <meta content="2022-05-31T04:18:52+00:00" datatype="xsd:dateTime" property="dct:modified"/>
    <meta content="2022-05-31T15:10:50+00:00" datatype="xsd:dateTime" property="dct:modified"/>
    <meta about="person/dorian-taylor#me" content="Dorian Taylor" name="author" property="foaf:name"/>
    <meta content="summary" name="twitter:card"/>
    <meta content="@doriantaylor" name="twitter:site"/>
    <meta content="RDF-KV" name="twitter:title"/>
    <meta content="This is a draft of a protocol I designed for embedding RDF statements in plain HTML forms, enabling quick-and-dirty Semantic Web applications." name="twitter:description"/>
    <object>
      <nav>
        <ul>
          <li>
            <a href="//dorian.substack.com/p/the-nerden-of-dorking-paths" rev="dct:references" typeof="bibo:Article">
              <span property="dct:title">The Nerden of Dorking Paths</span>
            </a>
          </li>
          <li>
            <a href="the-symbol-management-problem" rev="dct:references" typeof="bibo:Article">
              <span property="dct:title">The Symbol Management Problem</span>
            </a>
          </li>
          <li>
            <a href="document-stats#EaUIbIGZm-kX3NSALVqk_K" rev="ci:document" typeof="qb:Observation">
              <span>urn:uuid:69421b20-6666-4fa4-a5f7-35200b56a93f</span>
            </a>
          </li>
        </ul>
      </nav>
    </object>
  </head>
  <body about="" id="E2HHOlJxMeRci67E-y2DsJ" typeof="bibo:Specification">
    <blockquote id="EJcsIhPc2wOMbSpyp3g5hI">
      <p>This is a <em>very early draft</em> (more like running notes) of a protocol for embedding RDF data into plain-jane, no-JavaScript, your-grandma's HTML.</p>
    </blockquote>
    <h2>Rationale</h2>
    <p>Just about any data you could express in a web form can also be expressed in RDF.</p>
    <p>As such, it makes sense to write web apps that natively speak RDF in order to take advantage of the vast library of vocabularies, inference, validation, etc.</p>
    <p>The only problem is that you'd then need some kind of JavaScript contraption to send RDF (Turtle, JSON-LD, whatever) to the server from any browser-based application, which is inherently brittle and adds extra complexity to debug.</p>
    <p>Solution: create a way to express RDF using boring old web forms, and then put a filter on the server that turns conforming <samp>application/x-www-form-urlencoded</samp> request content into RDF before processing.</p>
    <h2>Requirements</h2>
    <ul>
      <li>Must yield valid HTML <strong>4</strong> because 5 is no longer deterministic</li>
      <li>Grammar must be regular so it can be parsed easily with a regex</li>
      <li>Must be easy to type by hand, to facilitate slapdash prototypes</li>
      <li>Must assume input is already parsed, and not rely on the order of form inputs (unlike <a href="http://www.lsrn.org/semweb/rdfpost.html" rel="dct:references">RDF/POST</a>)</li>
      <li>Must not depend on inference, stored prefixes, etc., though another part of the implementation certainly could use them</li>
      <li>Must ignore input that doesn't match the protocol</li>
      <li>Should, however, raise an exception on malformed attempts to match the protocol</li>
      <li>Must "fail open", i.e., not do stupid or destructive stuff if malformed</li>
      <li>Must not be too chatty, i.e., be as succinct as possible and use the fewest bytes to express semantics</li>
    </ul>
    <h2>Basic Syntax</h2>
    <p>The <samp>form</samp> element's <samp>action</samp> URI is the subject, the <samp>input</samp> elements' <samp>name</samp>s are predicates, and their <samp>value</samp>s are the objects.</p>
    <pre style="font-size: 75%">&lt;form method="POST" action="http://example.com/my/resource"&gt;
  &lt;input type="text" name="http://purl.org/dc/terms/title"/&gt;
  &lt;button&gt;Set the Title&lt;/button&gt;
&lt;/form&gt;</pre>
    <p>will produce:</p>
    <pre style="font-size: 75%">&lt;http://example.com/my/resource&gt; dct:title "Whatever you wrote" .</pre>
    <p>If you aren't aware, the <acronym title="Document type definition">DTD</acronym> attribute type for <samp>name</samp> is and always has been <samp>CDATA</samp>, which means it can hold practically any string, including a full URI. That is enough for creating triples with plain literal values. Whether a value should instead be treated as the address of a resource could, in principle, be inferred from the <samp>rdfs:range</samp> of the predicate in whatever schema it belongs to.</p>
    <h3>Resources and Blank Node Identifiers</h3>
    <p>In lieu of such inference, however, we can supply the following:</p>
    <pre style="font-size: 75%">&lt;input name="http://www.w3.org/1999/02/22-rdf-syntax-ns#type :"/&gt;</pre>
    <p>The colon <samp>:</samp> at the end of the name signifies that the <samp>input</samp>'s value should be treated as a resource. If we want the <samp>input</samp>'s value to represent a blank node identifier, we use the underscore character <samp>_</samp> instead.</p>
    <h3>Literals, Languages and Data Types</h3>
    <p>Even though the default behaviour is to treat <samp>input</samp> <samp>value</samp>s as plain literals, there are language-tagged and typed literals to consider as well. We encode these by adapting Turtle's syntax:</p>
    <pre style="font-size: 75%">&lt;input name="http://purl.org/dc/terms/description @en"/&gt;
&lt;input name="http://purl.org/dc/terms/created
             ^http://www.w3.org/2001/XMLSchema#date"/&gt;</pre>
    <p>These two <samp>input</samp>s prescribe the language and datatype of their respective values. Note that a literal can have a language <em>or</em> a datatype, not both. For the sake of completeness, although it likely won't come up often in practice, the character that designates a plain literal is the apostrophe <samp>'</samp>.</p>
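<p>The designators amount to a small lookup from a suffix token to a kind of object term. Here is a minimal sketch that reuses hypothetical tagged-tuple terms. Datatype CURIEs such as <samp>xsd:date</samp> are passed through unexpanded.</p>

```python
def make_object(value, designator="'"):
    """Build an object term from a control's value and its designator.

    ':'        the value is a resource (IRI)
    '_'        the value is a blank node identifier
    "'"        the value is a plain literal (also the default)
    '@' + tag  the value is a literal with that language tag
    '^' + IRI  the value is a literal with that datatype
    """
    if designator == ":":
        return ("iri", value)
    if designator == "_":
        return ("bnode", value)
    if designator == "'":
        return ("literal", value)
    if designator.startswith("@") and len(designator) > 1:
        return ("lang-literal", value, designator[1:])
    if designator.startswith("^") and len(designator) > 1:
        return ("typed-literal", value, designator[1:])
    # A malformed attempt at the protocol: complain rather than guess
    raise ValueError("unrecognized designator: %r" % designator)
```

<p>A field carries at most one designator, so a single field can never give a literal both a language and a datatype.</p>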
    <h3>Subject</h3>
    <p>If you need to specify a different subject, simply prepend it to the predicate.</p>
    <pre style="font-size: 75%">&lt;form method="POST" action="http://example.com/my/resource"&gt;
  &lt;input type="text" name="http://example.com/other/resource
                           http://purl.org/dc/terms/title"/&gt;
  &lt;button&gt;Set the Title&lt;/button&gt;
&lt;/form&gt;</pre>
    <h3>Graph</h3>
    <p>If you need to specify a graph other than the default for an individual statement, put the graph's URI after the object designator.</p>
    <pre style="font-size: 75%">&lt;input name="http://purl.org/dc/terms/title ' http://example.com/my/graph"/&gt;</pre>
    <p>This is one of the rare situations that call for the <samp>'</samp> designator, since two bare terms would otherwise be read as a subject and a predicate. You can naturally omit it if you also specify a subject, because three bare terms are unambiguous:</p>
    <pre style="font-size: 75%">&lt;input name="http://example.com/other/resource
             http://purl.org/dc/terms/title
             http://example.com/my/graph"/&gt;</pre>
    <h3>Statement Reversal</h3>
    <p>We often want to treat the <samp>input</samp>'s value as a <em>subject</em> rather than an <em>object</em>. To accommodate this, put a bang <samp>!</samp> at the front of the field name:</p>
    <pre style="font-size: 75%">&lt;input name="! http://purl.org/dc/terms/creator"/&gt;</pre>
    <p>This changes the direction of the statement: the form's subject becomes the object, and the <samp>input</samp>'s value becomes the subject. Since RDF does not allow literals in the subject position, you can only specify URIs or blank nodes this way. If you want to use a reverse statement with a literal, use a placeholder.</p>
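<p>Here is a sketch of statement reversal. It assumes the value defaults to a resource when no designator is given, which the draft doesn't say. The function name and term tuples are illustrative only.</p>

```python
def reversed_statement(form_subject, predicate, value, designator=":"):
    """Build the triple for a field whose name starts with '!'.

    The control's value becomes the subject and the form's subject
    becomes the object. RDF forbids literals in subject position, so
    only ':' (IRI) and '_' (blank node) are accepted; defaulting to
    ':' is an assumption of this sketch.
    """
    if designator == ":":
        new_subject = ("iri", value)
    elif designator == "_":
        new_subject = ("bnode", value)
    else:
        raise ValueError("a reversed statement needs an IRI or blank node value")
    return (new_subject, ("iri", predicate), ("iri", form_subject))
```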
    <h3>Add/Subtract</h3>
    <p>The default behaviour is to merge the contents of the form into the relevant resources, but if you want to delete statements, prepend a <samp>-</samp>. <samp>+</samp> explicitly selects the default behaviour, so it is effectively a no-op.</p>
    <pre style="font-size: 75%">&lt;input name="- dct:title"/&gt;</pre>
    <p>Also consider <samp>=</samp> for "nuke all statements with this subject and predicate, and replace them with this value":</p>
    <pre style="font-size: 75%">&lt;input name="= dct:title"/&gt;</pre>
    <h2>Control Words</h2>
    <p>Because you will invariably want to change the global behaviour: <samp>$PREFIX</samp>, etc.</p>
    <dl>
      <dt>SUBJECT</dt>
      <dd>override the default subject (otherwise the form's <samp>action</samp>)</dd>
      <dt>GRAPH</dt>
      <dd>override the default graph</dd>
      <dt>PREFIX</dt>
      <dd>override namespace prefix declarations</dd>
      <dt>TARGET</dt>
      <dd>redirect to some other address (dunno if this is smart or dumb)</dd>
    </dl>
    <h2>Abbreviation</h2>
    <p>typing in full URIs sucks</p>
    <p>prefixes/CURIEs, duh</p>
    <p>ok, but consider the situation where a CURIE is entered but its prefix isn't registered (then you get garbage data)</p>
    <p>or the prefix collides with a URI scheme (like http) (then you get MORE garbage data)</p>
<p>so here's a question: do we do "variables" as in proper variables, or do we do "macros" as in dumb text substitution?</p>
<h2>OK, How About This BNF</h2>
<pre style="font-size: 75%">rdf-kv ::= partial-statement | declaration
declaration ::= '$' WS NCName (WS '$')?
macro ::= '$' NCName | '${' NCName '}'
term ::= IRI | CURIE | macro
partial-statement ::= (modifier WS)? (term (WS term)? (WS designator)? |
    term WS designator WS term |
    term WS term (WS designator)? WS term) (WS '$')?
</pre>
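<p>The BNF leaves <samp>modifier</samp>, <samp>designator</samp> and <samp>WS</samp> undefined. The Python sketch below fills them in from the prose above and parses a single control name. Every token pattern in it is an assumption. Modifiers are <samp>+</samp>, <samp>-</samp>, <samp>=</samp> or <samp>!</samp>. The sketch also allows <samp>!</samp> after one of the other three (e.g. <samp>-!</samp>), which the draft does not mention. Names containing neither <samp>:</samp> nor <samp>$</samp> are ignored, per the requirement to ignore non-conforming input, while malformed attempts raise <code>ValueError</code>.</p>

```python
import re

# Assumed token patterns; the BNF names these without defining them.
MODIFIER = re.compile(r"[+=-]!?|!")
DESIGNATOR = re.compile(r"[:_']|@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\S+")
TERM = re.compile(r"\S*[:$]\S*")          # IRI, CURIE, or macro-bearing token
NAME = re.compile(r"[A-Za-z_][\w.-]*")    # NCName-like macro name


def parse_key(key):
    """Parse one form-control name as a macro declaration or partial statement.

    Returns None for names that don't attempt the protocol at all and
    raises ValueError for malformed attempts.
    """
    if ":" not in key and "$" not in key:
        return None                       # not protocol input: ignore it
    tokens = key.split()

    # declaration ::= '$' WS NCName (WS '$')?
    if tokens[0] == "$":
        ok = len(tokens) in (2, 3) and NAME.fullmatch(tokens[1])
        if ok and tokens[2:] in ([], ["$"]):
            return {"declare": tokens[1], "expand": len(tokens) == 3}
        raise ValueError("malformed macro declaration: %r" % key)

    result = {"modifier": "", "subject": None, "predicate": None,
              "designator": None, "graph": None, "expand": False}
    if MODIFIER.fullmatch(tokens[0]):
        result["modifier"] = tokens.pop(0)
    if tokens and tokens[-1] == "$":      # trailing '$': expand the value too
        result["expand"] = True
        tokens.pop()

    # At most one designator; it separates [subject] predicate from graph
    marks = [i for i, t in enumerate(tokens) if DESIGNATOR.fullmatch(t)]
    if len(marks) > 1:
        raise ValueError("more than one designator: %r" % key)
    if marks:
        result["designator"] = tokens[marks[0]]
        head, tail = tokens[:marks[0]], tokens[marks[0] + 1:]
    elif len(tokens) == 3:                # subject predicate graph
        head, tail = tokens[:2], tokens[2:]
    else:
        head, tail = tokens, []

    if len(head) not in (1, 2) or len(tail) > 1:
        raise ValueError("malformed RDF-KV key: %r" % key)
    if not all(TERM.fullmatch(t) for t in head + tail):
        raise ValueError("expected an IRI, CURIE or macro: %r" % key)
    result["predicate"] = head[-1]
    if len(head) == 2:
        result["subject"] = head[0]
    if tail:
        result["graph"] = tail[0]
    return result
```

<p>Expanding CURIEs and macros inside the returned terms would be a separate step.</p>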
<h2>Validation</h2>
<p>Obviously you could use RDFS/OWL/XSD to validate form contents, but there is a problem with tracing invalid results back to the original inputs. Basically, you will need to keep track of the original form keys <em>and</em> values, bit-for-bit verbatim, in document order (the order of the keys is not so important, but being verbatim is). Then it should be a matter of devising a response body that contains enough information to stitch the offending form controls back together.</p>
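<p>Here is a sketch of the bookkeeping this requires, using Python's <code>urllib.parse.parse_qsl</code>. Every control is kept with its position, blank values included, and invalid UTF-8 is rejected rather than silently replaced. The function names are assumptions, as is the idea that each generated statement carries the index of its control.</p>

```python
from urllib.parse import parse_qsl, urlencode


def controls_with_provenance(body):
    """Decode a form body into (index, (name, value)) pairs, in the order sent.

    Blank values are kept and invalid UTF-8 raises instead of being
    replaced, so each pair is exactly what the form control held.
    Generated statements can then carry the index of their control.
    """
    pairs = parse_qsl(body, keep_blank_values=True, errors="strict")
    return list(enumerate(pairs))


def offending_controls(controls, bad_indices):
    """Map the indices of failed statements back to the original controls."""
    lookup = dict(controls)
    return [lookup[i] for i in sorted(set(bad_indices))]


# Usage: a date field that fails validation can be reported verbatim
body = urlencode([("dct:title", "A"), ("dct:created ^xsd:date", "not a date")])
print(offending_controls(controls_with_provenance(body), [1]))
```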
<aside role="note" id="EuyVpn13v-qSaAWMCae0UJ">
  <p>Note: have fun with tracing results from macro expansions.</p>
</aside>
<h2>About Those Macro Expansions</h2>

<p>The macros in RDF-KV are basic string-replacements &#xE0; la shell variables.</p>
<p>Note that the form designer should endeavour to hide this macro business from end users. It is for me, not for them.</p>
<p>We need this type of system in the first place because the menu of HTML (pre-5, and arguably 5 as well) form controls is pretty bleak for the kind of information real human beings actually need to manipulate (remember, the ultimate objective is to make this easier for people).</p>
<p>Take dates, for example, often separated into three <code>&lt;select&gt;</code> boxes for year, month and day, in lieu of a sane alternative. There has to be <em>some</em> mechanism for concatenating those values together, and if the product of this protocol is a set of RDF statements that require no further manipulation, those values have to be concatenated in transit.</p>
<pre style="font-size: 75%">&lt;select name="$ y"&gt;
  &lt;option value="2013"&gt;2013&lt;/option&gt;
  &lt;!-- and so on --&gt;
&lt;/select&gt;
&lt;select name="$ m"&gt;
  &lt;option value="01"&gt;January&lt;/option&gt;
  &lt;!-- et cetera --&gt;
&lt;/select&gt;
&lt;select name="$ d"&gt;
  &lt;option value="01"&gt;1&lt;/option&gt;
  &lt;!-- und so weiter --&gt;
&lt;/select&gt;
&lt;input type="hidden" name="dct:created ^xsd:date $" value="$y-$m-$d"/&gt;</pre>
<h3>Conditional Expansion</h3>
<p>You might have noticed the <samp>$</samp> terminating the statement template. It is an explicit signal to expand macros in the statement's value. Macro expansion is off by default for values (of either statements or macro declarations), in case the end user happens to type a <samp>$</samp> by accident. However, macro expansion is <em>always</em> on for the statement templates.</p>
<h3>Empty Values</h3>
<p>Empty values will have to mean something different for macros than for statement templates. Namely, you can't discard a macro declaration just because it's empty, since it might be used later on. <em>But</em>, if a macro declaration has multiple values, where one or more values are empty and at least one isn't, the empty ones should be discarded.</p>
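<p>Here are the empty-value rules as a sketch. Two details are assumptions: an all-empty declaration collapses to a single empty value, and empty values of ordinary statement templates produce no statements.</p>

```python
def macro_values(values):
    """Filter the submitted values of one macro declaration.

    If at least one value is non-empty, the empty ones are dropped.
    If they are all empty, the declaration survives with a single
    empty value, since it might still be referenced later.
    """
    non_empty = [v for v in values if v != ""]
    return non_empty if non_empty else [""]


def statement_values(values):
    """Empty values of an ordinary statement template produce nothing."""
    return [v for v in values if v != ""]
```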
<h3>Multiple Values</h3>
<p>This data travels as <samp>application/x-www-form-urlencoded</samp> web form content, in which the same key can appear more than once, so macros can be defined more than once. What does that mean for variable substitution? I'm thinking the behaviour that would ultimately be the least surprising would be the Cartesian product. (Not surprising in the logical sense, but potentially <em>very</em> surprising in the engineering sense!)</p>
<p>Look at it from the perspective of the user (in this case, the form designer), who is designing for <em>their</em> user, whom they want to spend as little effort filling out forms as can be gotten away with. Enter one value in a slot and use it in multiple places; enter two values and it should generate two statements.</p>
<p>The problem with the Cartesian product of N sets is that it can get really big, really fast.</p>
<p>I'm going to permit recursive expansion in macro declarations, because it would be lame if I didn't.</p>
<aside role="note" id="E6llXHSSfEYHMySFUmjtbJ">
  <p>Just be sure to account for cycles in the implementation!</p>
</aside>
<p>I'm not going to permit macro expansion <em>at all</em> in the names of the macros themselves, because that is just crayzo.</p>
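<p>Here is a sketch of recursive expansion with cycle detection. It assumes declarations arrive as a mapping from name to (values, expand flag), where the flag is the trailing <samp>$</samp>. Macro names are matched with an NCName-like pattern and are never themselves expanded. Multiple values multiply through a Cartesian product, and a cycle raises an error instead of recursing forever.</p>

```python
import itertools
import re

# $name or ${name}, with NCName-like names (an assumption)
MACRO = re.compile(r"\$(?:\{([A-Za-z_][\w.-]*)\}|([A-Za-z_][\w.-]*))")


def resolve_all(declarations):
    """Recursively expand macro declarations, refusing cycles.

    `declarations` maps each macro name to (values, expand_flag), where
    expand_flag is the trailing '$' on the declaration. Returns a dict
    of name to fully expanded values.
    """
    resolved = {}

    def resolve(name, stack):
        if name in resolved:
            return resolved[name]
        if name in stack:
            raise ValueError("macro cycle: " + " -> ".join(stack + [name]))
        values, expand_flag = declarations[name]
        if not expand_flag:               # values are taken literally
            resolved[name] = list(values)
            return resolved[name]
        out = []
        for value in values:
            refs = []                     # declared macros, in order of appearance
            for m in MACRO.finditer(value):
                ref = m.group(1) or m.group(2)
                if ref in declarations and ref not in refs:
                    refs.append(ref)
            axes = [resolve(r, stack + [name]) for r in refs]
            for combo in itertools.product(*axes):
                binding = dict(zip(refs, combo))
                out.append(MACRO.sub(
                    lambda match: binding.get(match.group(1) or match.group(2),
                                              match.group(0)),
                    value))
        resolved[name] = out
        return out

    for name in declarations:
        resolve(name, [])
    return resolved
```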
<p>For statement values, there is no need for recursion, but consider the interaction of having multiple, multiple-valued macros in the same value: Cartesian product.</p>
<pre style="font-size: 75%">&lt;!-- imagine both $first and $last each have 10 first/last names --&gt;

&lt;input type="hidden" name="dct:contributor $" value="$first $last"/&gt;

&lt;!-- you're looking at 100 (meaningless) statements getting generated --&gt;
</pre>
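<p>Here is a sketch of expanding one statement value, assuming macros have already been resolved to lists of values. Each distinct macro referenced contributes one factor to a Cartesian product via <code>itertools.product</code>. References to undeclared macros are left untouched, which is the option this draft currently leans toward. Note that <samp>-</samp> and <samp>.</samp> are legal NCName characters, so a bare <samp>$y-$m</samp> reads as a macro named <samp>y-</samp>. The braced form <samp>${y}-${m}</samp> avoids that.</p>

```python
import itertools
import re

# $name or ${name}, with NCName-like names (an assumption)
MACRO = re.compile(r"\$(?:\{([A-Za-z_][\w.-]*)\}|([A-Za-z_][\w.-]*))")


def expand(template, macros):
    """Expand macro references in `template` into a list of strings.

    `macros` maps a name to its (already resolved) list of values. Each
    distinct declared macro adds one factor to a Cartesian product;
    undeclared references are left exactly as written.
    """
    names = []
    for m in MACRO.finditer(template):
        name = m.group(1) or m.group(2)
        if name in macros and name not in names:
            names.append(name)
    results = []
    for combo in itertools.product(*(macros[n] for n in names)):
        binding = dict(zip(names, combo))
        results.append(MACRO.sub(
            lambda match: binding.get(match.group(1) or match.group(2),
                                      match.group(0)),
            template))
    return results


print(expand("$first $last", {"first": ["Ann", "Bob"], "last": ["Lee", "Kim"]}))
# ['Ann Lee', 'Ann Kim', 'Bob Lee', 'Bob Kim']
print(expand("$y-$m", {"y": ["2013"], "m": ["10"]}))
# ['$y-10']  ("$y-" is an undeclared macro named "y-")
```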
<p>The statement <em>templates</em> are where things get interesting. They would behave the same way as the values do, but they would multiply the number of statements produced even further. We're talking about a Cartesian product (statements) of a Cartesian product (values) of a Cartesian product (macros). That situation immediately brings to mind denial-of-service attacks, where a tiny message explodes into a crippling logic bomb. <em>But</em>, such a device would have characteristically low initial entropy, which is amenable to detection heuristics. Essentially, any enormous set of RDF statements generated this way is simply not going to be very interesting, and would therefore be immediately suspect. Its inflation can be shut down long before the set gets too big.</p>
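<p>The draft proposes an entropy-based heuristic. The sketch below uses a simpler stand-in instead: compute how many statements a submission would produce before expanding anything, and refuse it above a fixed budget. Multiplying key and value sizes gives an upper bound. The limit of 1000 is an arbitrary placeholder.</p>

```python
import math
import re

# $name or ${name}, with NCName-like names (an assumption)
MACRO = re.compile(r"\$(?:\{([A-Za-z_][\w.-]*)\}|([A-Za-z_][\w.-]*))")


def expansion_size(template, macros):
    """How many strings `template` would expand to, without expanding it."""
    names = {m.group(1) or m.group(2) for m in MACRO.finditer(template)}
    # Undeclared references are left alone, so they count as one
    return math.prod(len(macros[n]) for n in names if n in macros)


def check_budget(fields, macros, limit=1000):
    """Refuse a submission that would generate more than `limit` statements.

    `fields` is a list of (key_template, value_template) pairs. Every
    reference is counted, flagged for expansion or not, and key and
    value sizes are multiplied, so the total may over-count.
    """
    total = 0
    for key, value in fields:
        total += expansion_size(key, macros) * expansion_size(value, macros)
        if total > limit:
            raise ValueError("submission would exceed %d statements" % limit)
    return total
```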
<h3>Unbound Macros</h3>
<p>Suppose you reference a macro that was never declared. What happens?</p>
<dl>
  <dt>ignore it (leave the <samp>$symbol</samp> reference alone)</dt>
  <dd>pros: doesn't screw with input beyond defined macros, can use literal <samp>$</samp> characters; cons: will create garbage data if there is an error in the form.</dd>
  <dt>raise an error</dt>
  <dd>pros: informative to the form designer, who is really the intended user of macros; cons: might blow the end user up unexpectedly, plus you'll have to pull some crap to get literal <samp>$</samp> chars into the form values if you want 'em (e.g. by making a non-expanding macro that contains a dollar sign).</dd>
  <dt>replace it with the empty string</dt>
  <dd>pros: consistent with the way it works in Bourne etc shells; cons: fails silently and produces garbage data.</dd>
</dl>
<p>Currently leaning toward leaving it alone.</p>
<h3><em>Gimme</em>/"Environment" Variables</h3>
<p>The server should never trust the client.</p>
<p>This is an actual use case I'm interested in: I want to use this protocol to make a complex RDF structure, and I want the subjects to be UUID URNs (a technique I use religiously for canonical URIs and/or ones I haven't yet decided how to name). I want to make sure I pick UUIDs that don't collide with ones already in the database, lest I corrupt a bunch of existing data.</p>
<p>But, you say, if the UUIDs are generated in the standard fashion, the likelihood of that happening is infinitesimal. Indeed, that's how they were designed. Sure, but consider boneheaded scenarios where one is hard-coded, or left behind from some other process, and so on. Better yet: what if somebody is up to no good, knows the URI (UUID or not), and slips some harmful statements into the form? Fine. Throw an ACL on the target. But what about the error message (for the benign user whose form submission incomprehensibly doesn't work)? Best to just let the server generate any necessary new identifiers. Consider:</p>
<pre style="font-size: 75%">&lt;input type="hidden" name="$ new $" value="urn:uuid:$NEW_UUID"/&gt;</pre>
<p>I am well aware this kind of functionality can get out of hand.</p>
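<p>One way to keep this in check is for the server to supply a small, fixed set of macros itself and let them override anything the form declares. Here is a minimal sketch. The <samp>NEW_UUID</samp> name comes from the example above. Minting one UUID per request and checking it against existing subjects are assumptions.</p>

```python
import uuid


def server_macros(existing_subjects):
    """Macros the server supplies itself, never taken from the form.

    A fresh version-4 UUID is minted per request and re-drawn in the
    (astronomically unlikely) event that its urn:uuid: form already
    occurs among `existing_subjects`.
    """
    while True:
        candidate = str(uuid.uuid4())
        if "urn:uuid:" + candidate not in existing_subjects:
            return {"NEW_UUID": [candidate]}


def merge_macros(client_macros, server):
    """Combine form-declared macros with server ones; the server wins."""
    merged = dict(client_macros)
    merged.update(server)
    return merged
```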
<h2>Security Considerations</h2>
<p>Oh boy! You mean besides the ones already considered?</p>
<p>If it isn't already evident, this protocol should <em>only</em> be used to <code>POST</code> standard <samp>application/x-www-form-urlencoded</samp> HTML forms. It would make a complete mess of the query string if it were used with <code>GET</code>. Also, at the time of this writing, I have no idea how you would reconcile this protocol with file uploads.</p>
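<p>Even with that nasty FTP URL trick plugged, a plain HTML form on someone else's site can still <code>POST</code> across domains without any JavaScript; all it takes is a user clicking its submit button. So we are <em>not</em> safe there: cross-site request forgery is in scope, and a server accepting this protocol needs the usual defences, such as an anti-CSRF token in a hidden field.</p>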
<p>Pretty certain all subjects mentioned in the form, unless they're brand new, should be topologically connected somehow to the resource to which the form was POSTed. Certainly if you're going to be making any destructive changes. We can imagine this lending itself to an escalation attack where the attacker connects two unconnected resources together in order to mess with one of them in a later request. Actually we can imagine a lot of things, so it's probably best to get this thing working so we can figure out all the glorious ways we can break it.</p>
<p>The same goes for <samp>$SUBJECT</samp>, the override for the form's action URI.</p>
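<p>The connectivity rule can be sketched as a breadth-first search from the resource the form was posted to, following existing statements in both directions. Only IRIs and blank nodes count as nodes; otherwise two unrelated resources that merely share a literal value (say, the same title) would look connected. Treating unseen subjects as brand new is an assumption, and the term tuples are illustrative.</p>

```python
from collections import defaultdict, deque


def is_node(term):
    """Only IRIs and blank nodes count as graph nodes, not literals."""
    return term[0] in ("iri", "bnode")


def reachable(triples, start):
    """All nodes connected to `start`, following edges in either direction."""
    neighbours = defaultdict(set)
    for s, _, o in triples:
        if is_node(o):
            neighbours[s].add(o)
            neighbours[o].add(s)
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in neighbours[queue.popleft()] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen


def check_subjects(triples, target, subjects):
    """Refuse subjects that exist in the store but aren't connected to `target`.

    Subjects the store has never seen are treated as brand new.
    """
    known = set()
    for s, _, o in triples:
        known.add(s)
        if is_node(o):
            known.add(o)
    connected = reachable(triples, target)
    strays = [s for s in subjects if s in known and s not in connected]
    if strays:
        raise PermissionError("subjects not connected to target: %r" % strays)
```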
    <h2>Implementation</h2>
    <ul>
      <li><a href="http://search.cpan.org/~dorian/RDF-KV/lib/RDF/KV.pm" rel="dct:references">Perl, because I need that right now.</a></li>
      <li>Python, because lots of people (including me) use that.</li>
      <li>Some kind of JavaScript/JQuery adapter, because that would be useful for progressive enhancement.</li>
      <li>Others? FYPM.</li>
    </ul>
    <h2>Future Directions</h2>
    <ul>
    <li>Shorthand for rdf:List, Seq, Bag, Alt?</li>
    <li>some functionality for doing basic lookups in the existing graph?</li>
    </ul>
</body>
</html>
