Team of Rivals: Does Science Need "Adversarial Collaboration"?

When scientists disagree about something, the two sides of the argument often come to form separate communities, with scientists collaborating with others on their "team" while avoiding working with their "opponents". But is there a better way? A paper just published today presents the results of an experiment that was conducted as an 'adversarial collaboration'. This is where some researchers sit down with some members of the "other side" and agree upon a plan for a study to test the hypothesis in question.

In this case the hypothesis was that horizontal eye movements would boost the ability to remember words. Many, but not all, previous studies have reported an effect of horizontal eye movement on memory. There's also a body of theory to explain it, but some skeptics are not convinced. This paper has six authors, all of them Dutch psychologists: three (Matzke, van Rijn and Wagenmakers) were 'skeptics', and two (Nieuwenhuis and Slagter) were 'proponents' of the effect. The sixth author, van der Molen, pitched in as an adviser and an impartial referee, but he says the whole thing went so smoothly that he didn't need to arbitrate. The team agreed on a protocol, preregistered it (here), and then ran the study. Volunteers (students) were shown a list of words and later had to write down as many words as they could remember, with a pen and paper. Immediately before the recall phase, volunteers were randomly assigned to move their eyes side to side (horizontal), to move them up and down (vertical), or to do nothing (no movement). The latter two conditions were controls, expected to have no effect on recall. However, it turned out that horizontal eye movements offered no memory benefits. If anything, they made memory worse.

In the discussion section, the skeptics and the proponents both got to comment separately. The skeptics argue that these negative findings are trustworthy, and they suggest that previous positive results (i.e. of an effect of eye movement on memory) may result from p-hacking. The proponents counter this argument, using a p-curve analysis to argue that p-hacking can't account for all of the positive results. They conclude by saying:

Considering the empirical results and the p-curve analysis reported here, did the present adversarial collaboration resolve the disagreement between the skeptics and the proponents? No; the skeptics are probably no less skeptical, and we, the proponents, are not convinced by a single failure to replicate, especially given the results of the p-curve analysis. However, we have become more cautious about the conclusions that can be drawn from the studies reported so far, and will follow the further development of this field of research with a critical eye.

Both sides, however, praise the adversarial collaboration process, and recommend the method to others. It's not a new idea; there have been advocates of adversarial collaboration for some time, but this paper is one of the few examples of a completed adversarial study. But what did it achieve? In his summing-up, referee van der Molen says that

The adversarial collaboration could not settle the empirical debate conclusively: despite the highly diagnostic outcome of the experiment, the proponents are still convinced that the effect is real. In hindsight, this result was to be expected. A single experiment, even when pre-registered and conducted in the framework of an adversarial collaboration, may not provide sufficient evidence to overturn an opinion that was shaped over the course of many years.

It's unrealistic to expect any single paper to end a debate such as this one. We certainly shouldn't regard this collaboration as having failed just because it didn't produce unanimity. I wonder if future adversarial collaborations could encourage the participants to specify, publicly, at the outset, what kind of evidence would make them change their minds. The goal then would be to design a study that would produce enough evidence to satisfy these preregistered 'we admit defeat' conditions. Of course, even if the results did pass the threshold that they had previously stated would end the debate, researchers might still demand even more evidence. However, in this case, it would be obvious that they had moved the goalposts, because the original goalposts would be a matter of public record.

