Recent hand-wringing over failed replications in social psychology is largely pointless, because unsuccessful experiments have no meaningful scientific value. Because experiments can be undermined by a vast number of practical mistakes, the likeliest explanation for any failed replication will always be that the replicator bungled something along the way [...] Targets of failed replications are justifiably upset, particularly given the inadequate basis for replicators’ extraordinary claims.
He is thus taking aim at the whole of the recent 'replication movement' in science, especially in social psychology. I applaud his taking a stand on this, and I respect Mitchell's pugnacious style. I also agree with his 'recommendations for moving forward'. However, I think he badly misses the mark. Mitchell's piece strikes me as an aggressive defense of a naive position.
Mitchell starts by bravely giving some personal examples of how the simplest errors can wreck the most beautiful scientific plans:
I have, for instance, belatedly realized that a participant was earlier run in a similar pilot version of the experiment and already knew the hypotheses; I’ve inadvertently run analyses on a dozen copies of the same set of fMRI images instead of using different data for each subject; I have written analysis code that incorrectly calculated the time of stimulus onset; and on and on. I might be embarrassed by a full accounting of my errors, except for the fact that I’m in good company - every other scientist I know has experienced the same frequent failings...
This is all too true. But Mitchell then argues that any given null result might merely result from simple mistakes like this. As a result, null findings "have no meaningful evidentiary value", and should "as a rule not be published" at all. Whereas the replication movement sees a failure to find a significant effect as evidence that the effect being investigated is non-existent, Mitchell denies this, saying that we have no way of knowing if the null result is genuine or in error: "when an experiment fails, we can only wallow in uncertainty" about what it means. But if we do find an effect, it's a different story: "we can celebrate that the phenomenon survived these all-too-frequent shortcomings [experimenter errors]."
And here's the problem. Implicit in Mitchell's argument is the idea that experimenter error (or what I call 'silly mistakes') is a one-way street: errors can make positive results null, but not vice versa. Unfortunately, this is just not true.
Three years ago, I wrote about these kinds of mistakes and recounted my own personal cautionary tale. Mine was a spreadsheet error, one even sillier than the examples Mitchell gave. But in my case the silly mistake created a significant finding, rather than obscuring one. There are many documented cases of this happening and (scary thought) probably many others that we don't know about.
Yet the existence of these errors is the fatal spanner in the works of Mitchell's whole case. If positive results can be erroneous too, if errors are (as it were) a neutral force, neither the advocates nor the skeptics of a particular claim can cry 'experimenter error!' to silence their opponents. Mitchell skirts around this contradiction, admitting that positive results can be wrong but saying that
negative evidence can never triumph over positive evidence, [but] we can always bring additional positive evidence to bear on a question... I might assert that the observer is lying, or is herself deceived. I might identify faults in her method and explain how they lead to spurious conclusions.
But in saying this, Mitchell is simply sawing off the branch he stands on. He has just been arguing that we don't even need to try to provide any evidence to show that negative results are the product of error - we can just assume it. Now he says that only hard evidence should ever convince us that a positive result isn't true. For one thing, imagine the paradox this would create if two scientists held different hypotheses, and thus different criteria for 'positive' and 'negative'! Yet even assuming everyone had the same hypotheses, this would only make sense if we believed that errors (almost) always make for null results; in other words, that errors are well-behaved and predictable. They're not.
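To make the two-way street concrete, here is a minimal simulation sketch. It is not a reconstruction of the spreadsheet error mentioned above or of any of Mitchell's examples. It assumes a purely hypothetical silly mistake: every observation is accidentally pasted into the dataset several times before a two-sample t-test is run. The function names, group size and number of copies are illustrative assumptions. Both groups are drawn from the same distribution, so any 'significant' result is a false positive.

```python
import math
import random
import statistics

# Two-sided 5% critical values of Student's t for the two degrees of
# freedom that occur below (12 per group, and 48 per group after copying).
T_CRIT = {22: 2.074, 94: 1.986}


def pooled_t(a, b):
    """Return (t statistic, degrees of freedom) for an equal-variance t-test."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, df


def false_positive_rate(copies, n_per_group=12, n_sims=5000, seed=0):
    """Share of simulated null experiments that come out 'significant'.

    `copies` is the hypothetical error: how many times each observation
    ends up in the analysed data (1 means no error).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # No true effect: both groups come from the same normal distribution.
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        t, df = pooled_t(a * copies, b * copies)
        if abs(t) > T_CRIT[df]:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    print("clean data:   ", false_positive_rate(copies=1))
    print("4x duplicated:", false_positive_rate(copies=4))
```

With clean data the rate should sit near the nominal 5%. With the duplication error it should come out several times higher, because the copies shrink the standard error without adding any information. An error of this kind manufactures positive results rather than hiding them.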