A fascinating new site called FlexibleMeasures.com reveals the enormous variety of ways that psychologists have devised to analyse the data from the same experimental task.
The competitive reaction time task (CRTT) is widely used as a research tool to probe aggression. Participants in the task are given the chance to lash out at 'opponents' by subjecting them to annoying blasts of loud noise. Using the noise is interpreted as aggressive behaviour - but how exactly should this be quantified? German psychologist Malte Elson, of Ruhr University Bochum, created FlexibleMeasures.com to explore the many ways this question has been answered.
Some researchers define aggression as the average volume of the noise inflicted - louder noise is more aggressive. Others look at the duration of the noise, and still others consider the volume multiplied by the duration. And there are many more specific methodological choices on top of these. All told, FlexibleMeasures.com lists no fewer than 147 published strategies for analysing CRTT data. This is a lot, especially bearing in mind that there are only 120 published papers on the CRTT in the Flexible Measures database! There are more approaches than papers.
For instance, one strategy is called 'Volume x Duration, multiplied averages of all trials (25)'. This approach has only been used in a single paper. However, another paper by the same authors used a different approach, called 'Volume + Duration (sum), average of all trials (25), standardized'. On FlexibleMeasures.com, all these papers, strategies and authors can be visualized, giving rise to graphics like this one, which shows the diversity of measures in papers coming from one particular research group.
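To make the scoring choices described above concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the two participants, their trial data, the 1-10 scales and the four scoring functions are hypothetical, and they are not meant to reproduce any specific strategy listed on FlexibleMeasures.com (standardization, for example, is left out).

```python
from statistics import mean

# Hypothetical data: each trial is a (volume, duration) setting.
# Participant A favours loud, short blasts; B favours quiet, long ones.
participants = {
    "A": [(8, 1), (9, 1)],
    "B": [(3, 5), (2, 6)],
}

# Four illustrative ways to turn the same trials into one "aggression" score.
strategies = {
    "mean volume": lambda trials: mean(v for v, _ in trials),
    "mean duration": lambda trials: mean(d for _, d in trials),
    "mean volume x duration": lambda trials: mean(v * d for v, d in trials),
    "mean volume + duration": lambda trials: mean(v + d for v, d in trials),
}

# Score every participant under every strategy.
scores = {
    name: {pid: score(trials) for pid, trials in participants.items()}
    for name, score in strategies.items()
}

# Which participant counts as "more aggressive" depends on the strategy.
more_aggressive = {
    name: max(by_pid, key=by_pid.get) for name, by_pid in scores.items()
}

for name, pid in more_aggressive.items():
    print(f"{name}: {scores[name]} -> {pid} is more aggressive")
```

With these made-up numbers, A looks more aggressive by volume and by the sum, while B looks more aggressive by duration and by the product, even though the underlying behaviour is identical in each case.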
Why is this diversity a problem? Because it creates scope for p-hacking: trying different techniques on the same data until the results come out the way the researchers want. The sheer number of approaches raises the possibility that different researchers - or indeed the same researchers at different times - have resorted to creating new analytic approaches because they didn't like the results the existing ones gave. This problem is not limited to the CRTT. While the CRTT is currently the only task on FlexibleMeasures.com, the site and its database are set up to collate data on additional paradigms too. Malte writes that "hopefully the database will grow - collaborations are welcome!" So if you know of another worryingly flexible paradigm, get in touch with him. I asked Malte what inspired him to create the site:
I did my PhD on methodological inadequacies in research on effects of violent media on aggression, an area where the CRTT is particularly popular. Thus, I was already quite familiar with the flexible practices associated with this particular test in the aggression literature, and I thought a simple visualization of this flexibility might be helpful to aggression researchers, and also to reviewers and authors in related domains. The site shows that flexibility appears to be the norm, and not the exception, in laboratory research on aggression that relies on this test. Aggression researchers need to change their ways if they want to provide credible answers to societal questions of great relevance... My hope is that the example of the CRTT inspires other researchers to reach out to me and use FlexibleMeasures.com's infrastructure to systemize issues related to methodological flexibility, flexibility in measurement, and diverse methods of computation in their respective area.
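The p-hacking worry raised earlier can be illustrated with a small simulation. The sketch below is not based on any real CRTT dataset: the group sizes, the 1-10 volume and duration scales, the four scoring rules and the cut-off of 2.0 (roughly the two-sided 5% critical value for a t-test with about 58 degrees of freedom) are all assumptions chosen for illustration. Both groups are drawn from the same distribution, so every 'significant' difference is a false positive.

```python
import random
from statistics import mean, stdev

T_CRIT = 2.0  # assumed approximate two-sided 5% cut-off for ~58 df

# Four hypothetical scoring rules applied to a list of (volume, duration) trials.
SCORERS = [
    lambda t: sum(v for v, _ in t) / len(t),      # mean volume
    lambda t: sum(d for _, d in t) / len(t),      # mean duration
    lambda t: sum(v * d for v, d in t) / len(t),  # mean volume x duration
    lambda t: sum(v + d for v, d in t) / len(t),  # mean volume + duration
]

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se

def random_participant(rng, n_trials):
    """Trials with no real effect: settings are pure noise on a 1-10 scale."""
    return [(rng.randint(1, 10), rng.randint(1, 10)) for _ in range(n_trials)]

def false_positive_rates(n_studies=500, n_per_group=30, n_trials=25, seed=1):
    rng = random.Random(seed)
    first_hits = any_hits = 0
    for _ in range(n_studies):
        group_a = [random_participant(rng, n_trials) for _ in range(n_per_group)]
        group_b = [random_participant(rng, n_trials) for _ in range(n_per_group)]
        significant = [
            abs(welch_t([s(p) for p in group_a], [s(p) for p in group_b])) > T_CRIT
            for s in SCORERS
        ]
        first_hits += significant[0]     # one measure, chosen in advance
        any_hits += any(significant)     # report whichever measure "works"
    return first_hits / n_studies, any_hits / n_studies

single_rate, any_rate = false_positive_rates()
print(f"one pre-chosen measure: {single_rate:.3f}")
print(f"best of four measures:  {any_rate:.3f}")
```

Because the 'best of four' count includes every study in which the pre-chosen measure was significant, its false-positive rate can never be lower than the single-measure rate, and with 147 strategies to choose from rather than four, the room for inflation is correspondingly larger.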