Should you trust plagiarism detection software?
In my view, no: we should never treat an automated plagiarism report as definitive evidence, whether positive (as proof of plagiarism) or negative (as proof of innocence). These tools are useful for rapidly screening texts to raise red flags, but once a suspicion is raised, only old-fashioned manual checking can determine whether a text is original.
In this post I’ll explain why — but first, a little backstory.
Five months ago, I argued that certain materials published by a new British ‘research ethics organization’, called PIE, contained similarities to other, uncited sources. For more on PIE, see these posts.
Shortly after I posted, PIE put up a “Disclaimer”. In the past week they’ve gone on the defensive again with a blog post which, while not naming me, is clearly aimed in my direction. (The comment thread is quite entertaining.)
This is where those plagiarism detectors come in. In their “Disclaimer”, echoed in the blog post, PIE report that all of their text is rated as original by two automated plagiarism checkers: Grammarly and iThenticate.
I have no doubt that that’s true, but it doesn’t impress me much. Most of these detectors rely on spotting strings of text that are identical between two sources. So they can pick up naked copying and pasting, but they can be fooled quite easily.
All a hypothetical plagiarist needs to do to evade such software is to make sure that no run of more than, say, three or four consecutive words is identical to their source. They can copy and paste, so long as they change the word order a bit, add or remove some filler words like ‘the’, ‘and’, ‘but’, and replace a few words with synonyms. I call this text laundering.
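For the programmers in the audience, here is a minimal sketch of the kind of exact-match check I have in mind. To be clear, this is not how Grammarly or iThenticate actually work (their methods are proprietary, and I don't know them); it is a toy that compares runs of consecutive words, and the function names and example sentences are made up for illustration.

```python
import re

def word_ngrams(text, n):
    """Return the set of runs of n consecutive words in text (lowercased)."""
    words = re.findall(r"[a-z0-9'-]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_fraction(candidate, source, n=5):
    """Fraction of the candidate's n-word runs that also occur in the source."""
    cand = word_ngrams(candidate, n)
    if not cand:
        return 0.0
    return len(cand & word_ngrams(source, n)) / len(cand)

# Hypothetical example sentences, not taken from any real document.
source = "The quick brown fox jumps over the lazy dog near the quiet river bank"
copied = "The quick brown fox jumps over the lazy dog near the quiet river bank"
laundered = "A quick brown fox leaps over the lazy hound by the quiet bank of the river"

print(shared_fraction(copied, source))        # straight copy and paste
print(shared_fraction(laundered, source))     # synonyms and reordering, 5-word runs
print(shared_fraction(laundered, source, 3))  # same text, 3-word runs
```

With a five-word window, the straight copy matches completely while the laundered sentence shares no run at all with its source. Shrinking the window to three words picks up some overlap again, but a window that short would presumably also flag plenty of innocent text, which is the trade-off any exact-match approach has to live with.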
To show how easily text could hypothetically be laundered, I took some of PIE’s own text (from here):
PIE Original: You are invited to join the Publication Integrity and Ethics (herein referred to as PIE) as one of its founding members. PIE, a not-for profit organisation, offers free membership to all interested individuals. Please join us and become part of this exciting new movement in the world of publishing ethics; it is the professional home for authors, reviewers, editorial board members and editors-in-chief.
Now let’s copy, paste, wash and rinse …
Neuroskeptic: You are invited to join Publication Integrity and Ethics (herein referred to as PIE) and become one of its founding members. PIE, a not-for profit organisation, offers interested individuals free membership. Please join this exciting new movement in the publishing ethics world; PIE is the professional home for reviewers, editorial board members, authors, and editors-in-chief.
If that’s not plagiarism, I don’t know what is. But Grammarly’s verdict? “The text in this document is original.”
Importantly, Grammarly does ring the plagiarism alarm if you enter PIE’s original text. This proves that PIE’s website is part of Grammarly’s database of sources. So the software should have detected my ‘plagiarism’. But it didn’t. This is why I always take these tools with a pinch of salt, and why I’m not impressed by PIE’s Disclaimer (although please note – I have never accused PIE of ‘plagiarism’. They introduced that word into this discussion, not I. I just talk about similarities.)
There are other lessons to learn from this saga. Consider, for instance, that a few days ago, PIE released a new bit of their disclaimer, “Examined Documents”. They now say that:
Our authors have examined several documents at the time of writing the contents of the Publication Integrity & Ethics [PIE] website and its guidelines. Hence, it is natural that we include the list of these documents as our references. Please see the list.
It is indeed ‘natural’ for authors to reference their sources, but it seems that for the first few months of the site’s existence, they didn’t do so. Which I guess made them … unnatural?
Anyway, the list vindicates what I said in my very first PIE post: I said that some of PIE’s content was similar to the Australian Press Council’s newspaper guidelines, and publisher Elsevier’s editorial policies — and they now reference both of those sources. If they’d only done that from the start, I wouldn’t have written my post.
The lesson here? Acknowledge your sources from the start. Because the longer you leave it, the worse your eventual climbdown will look.
But something is conspicuous by its absence from PIE’s reference list: any mention of the Committee on Publication Ethics (COPE) guidelines. Yet as I said previously, several areas of PIE’s work appear similar to COPE’s.
Consider the PIE Peer Reviewer Guidelines. The last bit of PIE’s document (parts 7.1-8.5) consists of 16 points. In my estimation, the same ideas all appear in a section of COPE’s Guidelines for Peer Reviewers. The wording differs somewhat (though in many cases, only slightly), but the content is essentially the same.
Crucially, the 16 ideas appear in exactly the same order in both documents — despite the fact that the COPE document also contains additional statements with no PIE equivalent, interspersed among the ones that are similar. If we designate the PIE statements in order as A-P, we find that the COPE equivalents also appear in the order A-P.
How odd. Perhaps it’s just a coincidence. How much of a coincidence? Well, to calculate the number of possible ways to order a given number of items (permutations), we need the mathematical factorial function, written with an exclamation mark: there are X! ways to order X items. 16! is roughly 2.09 × 10^13, so there are about 21 trillion unique orderings of those 16 items.
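If you want to check the arithmetic, here is a short sketch. It makes one loud assumption: that every ordering of the 16 points is equally likely, which is a simplification, since guidelines tend to follow some logical sequence. The helper function and the stand-in list are hypothetical; the letters A to P play the role of the 16 statements, and "x" marks unrelated statements interspersed among them.

```python
import math

def appears_in_same_order(items, longer_sequence):
    """True if items occur within longer_sequence in the same relative order."""
    remaining = iter(longer_sequence)
    # Each membership test consumes the iterator up to the match,
    # so later items must be found after earlier ones.
    return all(item in remaining for item in items)

n_items = 16
orderings = math.factorial(n_items)  # number of ways to order 16 items
chance = 1 / orderings               # assumes all orderings are equally likely

# Hypothetical stand-in for the two documents: A..P with extras in between.
labels = [chr(ord("A") + i) for i in range(n_items)]
longer = []
for label in labels:
    longer.extend([label, "x"])

print(orderings)                              # 20922789888000
print(chance)
print(appears_in_same_order(labels, longer))  # True
```

The order check deliberately ignores the extra statements, which mirrors the situation here: the longer document can contain material with no equivalent in the shorter one, and the shared items can still line up in sequence.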
So it’s quite a big coincidence, then. Why might PIE not want to credit COPE? We can but speculate. Perhaps the fact that PIE seems to be in direct competition with COPE might be relevant: they’re both organizations with “Publication” and “Ethics” in the title, which offer a set of best-practice guidelines for academics and academic publishers. Although COPE has been around for 17 years, not 5 months.
It might be embarrassing to admit a debt to one’s rivals… but it’s more embarrassing not to admit it. So this is the final lesson here: there’s nothing wrong with being influenced by your predecessors, and no-one will care if you’re transparent about it.