A new paper reports that one of the most popular approaches to analyzing fMRI data is flawed. The article, available as a preprint on arXiv, is from Swedish neuroscientists Anders Eklund et al.
Neuroskeptic readers may recall that I've blogged about Eklund et al.'s work before, first in 2012 and again earlier this year. In those two previous studies, Eklund et al. showed that the standard parametric statistical method for detecting brain activations in fMRI data is prone to false positives. The new arXiv paper has the same message, but it goes beyond the earlier studies. Whereas Eklund et al. previously showed problems in single-subject fMRI analysis, they now reveal that the same issues affect group-based analyses of task-based fMRI. This is really scary, because almost all fMRI research uses group-based analysis.
Some people previously downplayed the importance of Eklund et al.'s work, saying that the false positive problem would only affect single-subject analyses, which are rare. They hoped that the errors would "cancel out" at the group level. While that seemed plausible, it has turned out to be false. Oh dear. The new scary finding is that "parametric software’s familywise error (FWE) rates for cluster-wise inference far exceed their nominal 5% level". In other words, the chance of getting at least one false positive result is high, much higher than the 5% level that is expected and considered acceptable.
Eklund et al.'s approach was to take resting-state fMRI data and analyze it as if it were part of a task-based experiment. Because there was no task, there should have been no activations detected. They considered hundreds of variant analyses, testing three of the most popular parametric fMRI software packages (FSL, SPM, and AFNI) and numerous different parameters for each one (such as the initial cluster-defining threshold, cluster extent, etc.). The vast majority of the tested parametric analyses produced too many false positive clusters. A major exception was the "FLAME1" algorithm from the FSL package, which, if anything, produced too few false positives, i.e. it is too conservative.
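To make the "at least one false positive" definition of the familywise error rate concrete, here is a minimal Python sketch. It assumes m independent tests of true null hypotheses, which is an idealization (real voxels are spatially correlated), and the function names are hypothetical. Bonferroni correction (dividing α by m) is used only as the simplest example of a correction; it is not the cluster-wise method the fMRI packages above rely on.

```python
import random


def fwe_closed_form(alpha: float, m: int) -> float:
    """P(at least one false positive) among m independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** m


def fwe_simulated(alpha: float, m: int, n_experiments: int = 2000, seed: int = 0) -> float:
    """Monte Carlo version: under H0 each p-value is Uniform(0, 1), so a
    'null experiment' counts as a familywise error if ANY of its m p-values < alpha."""
    rng = random.Random(seed)
    errors = sum(
        1 for _ in range(n_experiments)
        if any(rng.random() < alpha for _ in range(m))
    )
    return errors / n_experiments


# 100 tests with no real effect: uncorrected vs. Bonferroni-corrected alpha.
for label, a in [("uncorrected", 0.05), ("Bonferroni", 0.05 / 100)]:
    print(label, round(fwe_closed_form(a, 100), 3), fwe_simulated(a, 100))
```

With 100 independent null tests, the uncorrected FWE is about 0.994, while the corrected threshold brings it to about 0.049. Eklund et al.'s resting-state analyses estimate the same quantity empirically: the fraction of null group analyses that report at least one significant cluster.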
The cluster-defining threshold was also important: the false positive problem was especially bad with a threshold of p = 0.01. A threshold of p = 0.001 was much better, although still not perfect for most parametric tools, and when combined with FLAME1 it was extremely conservative.
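Cluster-wise inference works in two steps: voxels are first thresholded at the cluster-defining threshold (CDT), and then only clusters of at least some critical size (the cluster extent) are reported. The sketch below shows those two steps on a hypothetical one-dimensional z-map; the map values, the function names, and the `k_min` value are all illustrative assumptions. Real packages work in 3D and derive the critical extent from an estimate of image smoothness (random field theory in SPM and FSL, Monte Carlo simulation in AFNI) rather than taking it as a fixed number.

```python
from statistics import NormalDist


def z_threshold(p: float) -> float:
    """One-sided z value with upper-tail probability p (the cluster-defining threshold)."""
    return NormalDist().inv_cdf(1.0 - p)


def find_clusters(zmap: list[float], z_thr: float) -> list[tuple[int, int]]:
    """(start_index, size) of every run of adjacent voxels with z > z_thr (1D adjacency)."""
    clusters, start = [], None
    for i, z in enumerate(zmap + [float("-inf")]):  # sentinel closes a trailing run
        if z > z_thr and start is None:
            start = i
        elif z <= z_thr and start is not None:
            clusters.append((start, i - start))
            start = None
    return clusters


def surviving_clusters(zmap: list[float], p_cdt: float, k_min: int) -> list[tuple[int, int]]:
    """Step 1: threshold voxels at the CDT. Step 2: keep clusters with >= k_min voxels."""
    return [c for c in find_clusters(zmap, z_threshold(p_cdt)) if c[1] >= k_min]


zmap = [0.5, 2.5, 2.8, 1.0, 3.2, 3.5, 3.3, 0.2]  # hypothetical toy map
print(round(z_threshold(0.01), 3), round(z_threshold(0.001), 3))  # 2.326 3.09
print(find_clusters(zmap, z_threshold(0.01)))   # [(1, 2), (4, 3)]
print(find_clusters(zmap, z_threshold(0.001)))  # [(4, 3)]
```

On this toy map the more lenient CDT (p = 0.01, z ≈ 2.33) admits an extra cluster that the stricter one (p = 0.001, z ≈ 3.09) excludes, which is why the choice of CDT interacts with how large a cluster must be before it counts.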
So far I've been talking about cluster-based analyses. In voxel-based group analyses, Eklund et al. found that false positive rates were much lower; indeed, most voxel-wise approaches were too conservative. However, voxel-based analyses are rarely used on their own. Neuroscientists are more interested in finding clusters ("blobs"), so for most research the excess of false positive clusters is likely to be the bigger problem.
The false positive clusters were not evenly spread throughout the brain. Some areas were hot spots, with the posterior cingulate cortex being the region most prone to false positives for all of the software tools. Eklund et al. say that this is probably because the fMRI images have higher local smoothness in this region, which violates the assumptions of the parametric analysis models.
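Why would extra smoothness matter? The simulation below is a rough one-dimensional illustration, not Eklund et al.'s analysis. It assumes that boxcar (moving-average) smoothing of white noise is a reasonable stand-in for spatial smoothness, and every name and parameter in it is hypothetical. For each smoothness level it estimates the cluster size that pure noise reaches somewhere in the map only 5% of the time, i.e. the critical extent a correctly calibrated method would need.

```python
import random
from statistics import NormalDist


def smooth_noise(n: int, width: int, rng: random.Random) -> list[float]:
    """1D Gaussian white noise smoothed by a boxcar of `width` samples, rescaled to
    unit variance: the sum of `width` iid N(0, 1) values has variance `width`,
    so dividing the sum by sqrt(width) gives variance 1."""
    raw = [rng.gauss(0.0, 1.0) for _ in range(n + width - 1)]
    return [sum(raw[i:i + width]) / width ** 0.5 for i in range(n)]


def max_cluster_size(zmap: list[float], z_thr: float) -> int:
    """Length of the longest run of adjacent supra-threshold samples."""
    best = run = 0
    for z in zmap:
        run = run + 1 if z > z_thr else 0
        best = max(best, run)
    return best


def critical_extent(width: int, p_cdt: float = 0.01, n_vox: int = 400,
                    n_sims: int = 300, alpha: float = 0.05, seed: int = 1) -> int:
    """Smallest cluster size k that null maps of this smoothness reach
    (anywhere in the map) in at most a fraction `alpha` of simulations."""
    rng = random.Random(seed)
    z_thr = NormalDist().inv_cdf(1.0 - p_cdt)
    maxima = [max_cluster_size(smooth_noise(n_vox, width, rng), z_thr)
              for _ in range(n_sims)]
    k = 1
    while sum(m >= k for m in maxima) / n_sims > alpha:
        k += 1
    return k


for w in (1, 5, 10):
    print(f"smoothing width {w:2d}: critical cluster extent = {critical_extent(w)} voxels")
```

The critical extent grows with smoothness. Under this simplified picture, a method that calibrated a single extent threshold from a brain-wide smoothness estimate would let through more than 5% false positive clusters wherever the local smoothness is higher than that estimate.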
The authors conclude that much of the fMRI literature may be seriously compromised, which "calls into question the validity of countless published fMRI studies based on parametric cluster-wise inference." To really rub salt into the wound, they point out that all of their analyses were corrected for multiple comparisons, yet
"40% of a sample of 241 recent fMRI papers did not report correcting for multiple comparisons"
which would make the problem even worse.
Having said that, in many fMRI experiments the key question is not "is there any brain activity at all?" but "where exactly is the activity, and what modulates it?" In other words, most fMRI studies are not intended to test the null hypothesis that the brain is completely unresponsive (even if, statistically, this is part of the analysis process). For example, in many experiments the existence of a spurious blob in the posterior cingulate would not be considered important, because the focus was elsewhere. Other studies compare the magnitude of an activation across two groups, with the existence of the activation being already known. Not all fMRI studies are "blob fishing expeditions". But many are, so this is clearly a major problem.
What can we do about it? Eklund et al. say that the answer is non-parametric permutation analysis. They tested this approach and showed that it was the only one that gave the expected familywise false positive rate of 5%:
A non-parametric permutation test, for example, is based on a small number of assumptions, and has here been proven to yield more accurate results than parametric methods. The main drawback of a permutation test is the increase in computational complexity... but the increase in processing time is no longer a problem; an ordinary desktop computer can run a permutation test for neuroimaging data in less than a minute.
In other words, there is no excuse for not using them.
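For the simplest group design, a one-sample test of whether the group mean activation is above zero, the permutation idea can be sketched as follows. This is an illustrative sign-flipping test on a hypothetical 1D "brain", not the implementation Eklund et al. used; it assumes that under the null hypothesis each subject's errors are symmetric about zero, so flipping the sign of a whole subject map yields an equally likely dataset. Recording only the largest cluster in each permutation is what controls the familywise error rate. All names and parameters are assumptions for illustration; tools such as FSL's randomise implement this kind of test on real 3D data.

```python
import random


def one_sample_t(data: list[list[float]]) -> list[float]:
    """Voxel-wise one-sample t statistic; rows of `data` are subjects, columns voxels."""
    n = len(data)
    tmap = []
    for v in range(len(data[0])):
        vals = [row[v] for row in data]
        mean = sum(vals) / n
        sd = (sum((x - mean) ** 2 for x in vals) / (n - 1)) ** 0.5
        tmap.append(mean / (sd / n ** 0.5) if sd > 0 else 0.0)
    return tmap


def find_clusters(tmap: list[float], thr: float) -> list[tuple[int, int]]:
    """(start, size) of each run of adjacent voxels with t > thr (1D adjacency)."""
    clusters, start = [], None
    for i, t in enumerate(tmap + [float("-inf")]):  # sentinel closes a trailing run
        if t > thr and start is None:
            start = i
        elif t <= thr and start is not None:
            clusters.append((start, i - start))
            start = None
    return clusters


def cluster_permutation_test(data: list[list[float]], t_thr: float,
                             n_perm: int = 1000, seed: int = 0) -> list[tuple[int, int, float]]:
    """Sign-flipping permutation test on cluster size.

    Returns (start, size, p_fwe) for each observed cluster."""
    rng = random.Random(seed)
    null_max = []
    for _ in range(n_perm):
        # Under H0 each subject's map is as likely to appear with its sign
        # flipped, so a random flip per subject produces a null dataset.
        flips = [rng.choice((-1.0, 1.0)) for _ in data]
        flipped = [[f * x for x in row] for f, row in zip(flips, data)]
        sizes = [size for _, size in find_clusters(one_sample_t(flipped), t_thr)]
        # Keeping only the LARGEST null cluster per permutation gives FWE control.
        null_max.append(max(sizes, default=0))
    results = []
    for start, size in find_clusters(one_sample_t(data), t_thr):
        # The +1 terms count the observed (unflipped) data as one permutation.
        exceed = sum(1 for m in null_max if m >= size)
        results.append((start, size, (exceed + 1) / (n_perm + 1)))
    return results
```

As a usage example, simulating 12 hypothetical subjects with a true effect in voxels 20 to 29 and calling `cluster_permutation_test(data, t_thr=3.0, n_perm=500)` should give that cluster a small FWE-corrected p-value. Note that the t threshold of 3.0 is an arbitrary choice: because the null distribution of the maximum cluster size is built from the data themselves, no model of the image smoothness is needed.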
Anders Eklund, Thomas Nichols, & Hans Knutsson (2015). Can parametric statistical methods be trusted for fMRI based group studies? arXiv:1511.01863v1