Why We Care About Plagiarism

I was occupied over the last several weeks, as I chronicled elsewhere, and doubly so with travel over the holidays, so I missed most of the drama around the coordinated right-wing assault against the presidents of Penn and Harvard (and MIT, but she seems to be hanging on), who were effectively shepherded, during a US congressional hearing, into voicing insufficiently full-throated support for Israel's ongoing prosecution of its war in Gaza. This performance was enough to take down Penn's president Liz Magill, but not Claudine Gay, the president of Harvard. As a result, the ratfuckers switched tack and went after her scholarship.

Unsatisfied with the outcome of the congressional hearing, the would-be character assassins, figureheaded by one Christopher Rufo, went digging through Dr. Gay's publications, looking for material to support a narrative that she (a Black woman, if you had somehow managed to get through the last month without encountering that fact) held her position for reasons other than merit. The most substantial evidence they found was a paragraph in her PhD dissertation which was lifted from a paper by a professor and colleague in her doctoral program (and which inverts the operative verb) without proper attribution.

An entire paragraph which is mostly word-for-word identical to a paragraph in another person's work, without quotation marks around it, is unequivocally bad optics—especially when the offending document is the doctoral dissertation of the leader of the most august academic institution on the continent. It is also a glaring hypocrisy: an affront to every student who has ever had their knuckles rapped—or worse—for doing the same. If you look at the actual content of the text which one of the authors of the original avowed was technically plagiarized (a detail Gay's opponents gleefully latched onto, but not before the customary excision from its context) you will notice it is a piece of pure expositional boilerplate. It has the tone and rhetorical weight of an instruction manual for a dishwasher—Stephen Voss, a coauthor of the original paper, basically said as much.

Here is where I enter with my motivation for entertaining this subject. About twelve years ago, I wrote on my personal website what I consider to be a B-minus meditation on how a preoccupation with efficiency was a suboptimal frame for the quaternary sector, because results tended to be stochastic and fat-tailed, meaning that both the value and the timing of outcomes were random. Since there is no meaningful way to make a process like that more efficient, perhaps we, as information-sector professionals, should stop obsessing over efficiency, and focus on other values instead. Three months later, I got messages from no fewer than three different people, each unaware of the others, notifying me that a version of my article, with names and details changed, had just been published in an industry magazine by an aspiring thought leader.

My partner at the time was an instructor at a private college that mainly focused on university prep for international students, meaning that she dealt mainly with kids from all over the world who were somewhere between high-school age and freshman year. It was her Sisyphean task to drill into her students the importance of quoting their sources, because, as she would tell them (I paraphrase), I'm interested in your ideas, and if you don't quote your source text, I can't tell which ideas are yours. More to the point, though, not doing so contravened the school's academic code (as it presumably does at every school) and would result in disciplinary action. She was the one who took it upon herself to reach out to this publication to tell them that they had published a plagiarized article, and their response was (something on the order of) while there are similarities in both the structure and overarching argument, the article they published can't be plagiarism, because it doesn't reproduce any passages verbatim.

This is why I felt impelled to write about this event: this would-be thought leader (at least, convincingly enough to those who brought the matter to my attention) used my article as scaffolding to appear clever and worldly in a trade publication (while diluting the message into a platitude fit for a motivational poster), while Claudine Gay recycled a few lines of text that amount to no discernible argument—not even so much as a witty turn of phrase—at all. One of these acts is an inexcusable crime of plagiarism, while the other is ostensibly fair game.

I want to challenge the notion that sequences of words are the only things that can be plagiarized. Rather, there are semantic constructs all up and down the scope of a work that are amenable to copying. Consider—mainly because my brain is insisting on using this as an example and can't think of any others—Akira Kurosawa's film Ran, which is more or less Shakespeare's King Lear transposed to feudal Japan. Kurosawa did not try to claim otherwise (though interestingly, he learned about and read the play only after he had started, and drove his project in its direction), but imagine if he did. Shakespeare? Never read him. Total coincidence. Who would buy that?

Coincidences happen, and it's not impossible that two individuals can come up with the same informational structure with no prior contact. (This, after all, is what Kauffman's Adjacent Possible is about.) I'm going to assert, though, that the explanatory power of coincidence drops off the more overlapping detail there is. Words, for instance, are concrete symbolic objects arranged in a definite topological structure, and there are a lot of them, so the chance that you're going to match more than a few in a row in any two documents at random is going to be pretty slim, and slimmer the more words you add to the sequence. If you get rid of filler words and abstract out to what the WordNet people call synsets (which are what they sound like: synonyms grouped under an identifier), you could probably match longer strings that are still defensible as coincidental. This is because the way language works is that you have to use roughly the same words in roughly the same order if you want to communicate roughly the same things, so this isn't that surprising. Indeed, if you're trying to write something like a legal or scientific document, there's going to be a sort of paint-into-a-corner effect where you have to use certain syntactical constructs to say what you're trying to say, because your hyperspecialized language won't have any synonyms.
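For the programmers in the audience, here is a rough sketch of the word-matching idea, and of what abstracting away filler words and synonyms does to it. Everything in it is made up for illustration: the FILLER and SYNONYMS tables are tiny hand-rolled stand-ins (a serious attempt would use a proper stopword list and actual WordNet synsets), and the two sentences are invented. It makes no claim about how any real plagiarism checker works.

```python
import re

# Hypothetical, deliberately tiny stand-ins for a stopword list and for
# WordNet synsets. Each synonym maps to one shared identifier.
FILLER = {"a", "an", "the", "of", "to", "is", "in", "that", "and"}
SYNONYMS = {"big": "large", "huge": "large", "quick": "fast", "rapid": "fast"}


def tokens(text, normalize=False):
    """Lowercase words; optionally drop filler and collapse synonyms."""
    words = re.findall(r"[a-z']+", text.lower())
    if not normalize:
        return words
    return [SYNONYMS.get(w, w) for w in words if w not in FILLER]


def longest_shared_run(a, b):
    """Length of the longest run of consecutive tokens found in both lists."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the run that ended at the previous token of each list.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def shared_ngrams(a, b, n):
    """Set of n-word sequences that appear in both token lists."""
    def grams(t):
        return {tuple(t[i:i + n]) for i in range(len(t) - n + 1)}
    return grams(a) & grams(b)


original = "The quick fox jumped over a huge fence in the garden"
reworded = "A rapid fox jumped over the big fence in a garden"

# Verbatim comparison: only "fox jumped over" survives the rewording.
verbatim_run = longest_shared_run(tokens(original), tokens(reworded))  # 3

# With filler dropped and synonyms collapsed, the whole sentence lines up.
abstracted_run = longest_shared_run(
    tokens(original, normalize=True), tokens(reworded, normalize=True)
)  # 7

print(verbatim_run, abstracted_run)
print(shared_ngrams(tokens(original), tokens(reworded), 3))
```

The point of the toy is the gap between the two numbers: swapping out a handful of words drags the verbatim match down to three in a row, while the abstracted version still matches end to end. How long a run has to get before coincidence stops being a plausible explanation is a judgment call that the code does not make for you.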

The plagiarism question arises when a work contains features found in an earlier work arranged in a configuration too implausible to be a coincidence. Word sequences are an easy target because everybody agrees on what a word is. If you wanted to claim plagiarism of some other construct, you'd need to analyze the situation using some kind of surrogate model. Then, of course, you can have an argument about whether the model is valid, or, if you're a dodgy trade publication, you can just decline to hear the evidence.

I'm not terribly interested in the psychology of plagiarism—not being a fan of remote psychoanalysis in general—but it should be pretty uncontroversial to surmise that a plagiarist necessarily recognizes some instrumentality in the original work, and figures that nobody will notice they copied it (or otherwise that copying is legitimate). As for not proactively citing the source, maybe they don't believe the original author deserves the credit; maybe they're concerned their take will look weaker in comparison. Or, perhaps they judge that the content itself is unworthy of a cite. It really is quite remarkable just how dull the passages Claudine Gay is accused of lifting are. It's almost as if she pasted them in with the intent of changing them into her own language—which she did, here and there—but ultimately decided she couldn't be bothered.

While it's poor scholastic hygiene, terrible optics, and a galling hypocrisy, what the plagiarized material is not—at least to these admittedly untrained eyeballs—on account of its stultifying blahness, is evidence that Claudine Gay doesn't know what she's talking about. Nowhere does it try to be smart, or clever, or interesting; it's the kind of boring boilerplate one would nowadays consign to ChatGPT. Contrast this with one Neri Oxman.

In a somewhat predictable twist, because hedge fund billionaire✱ Bill Ackman, who was agitating for the removal of Gay (and threatening to rescind a large donation to Harvard's endowment over the matter), happens to be married to Oxman (an MIT Media Lab/TED/Edge/Epstein alum), her academic record found itself facing scrutiny. And what a find! Wikipedia! Her own students! Paragraph after paragraph unceremoniously cut and pasted! Here is somebody who has very much made a career as a showperson, and unlike Claudine Gay, the quantity (and critically, the content) of the plagiarism makes me immediately wonder aloud if putting on a slick performance is all Oxman knows how to do.

This, I propose, is one leg of why we really care about plagiarism. If you produce some sort of idea, or insight, or synthesis, or explanation, or argument, or story, that's a sign of your brain working; you understanding something. If you copy somebody else's, where's the evidence for you? How do we know you're competent? (Again, what makes the Gay plagiarism so remarkable is that it contains no discernible insights.) The other leg one might be inclined to describe in terms of intellectual property, but I think it's subtler than that. When you copy somebody's work (again, idea, insight, etc.) without attribution, you're diverting attention and accolades to yourself that would otherwise have gone to them. So in addition to harming the true progenitor and defrauding your audience about your own capabilities, accomplishments, and potential, you're also depriving them of access to the genuine article.

Coda: GamerGate II

Taking the actual, usually fairly innocuous conduct of people you don't like and having a public conniption about ethics is unambiguously the playbook of GamerGate. The difference is that while the original GamerGate was quasi-organic, this time around it's premeditated. Rufo himself announced in advance he would be doing this, and then crowed about his victory over Gay in the Wall Street Journal and said to expect more of the same. Institutional leaders have had a decade to prepare for this brand of onslaught, but they don't appear to have done much of it. I bet they're scrambling now.

Again, the GamerGate strategy is to take something that you actually did, no matter how banal, and make a huge public fuss about it. These people will make a parking ticket sound like a double homicide. All they care about is scoring the hit; they aren't bothered if you don't deserve it, and have no compunction about ruining your career or your life. You also can't hit them back in any meaningful way, because they're all avowed scumbags—it's what they call mudwrestling a pig.

The way you protect yourself from gamergaters is not to have a spotless record—because they will keep digging until they find even the faintest blemish—but rather to make the spots not matter. Something Harvard potentially could have done to at least partially neutralize these attacks—I'm spitballing here—is stage a technically-plagiarism jubilee—that is, a one-time amnesty for people who have copied boring bits of academic boilerplate without attribution: technically plagiarism, but not of the species that defrauds the audience or usurps any great idea or effort. As a mutual on the socialnets said, it was a teaching moment, and Harvard failed.