A paper in PNAS got some attention on Twitter recently. It's called "Childhood trauma history is linked to abnormal brain connectivity in major depression", and in it the authors, Yu et al., report finding (as per the Significance Statement):

A dramatic primary association of brain resting-state network (RSN) connectivity abnormalities with a history of childhood trauma in major depressive disorder (MDD).

The authors go on to note that even though "the brain imaging took place decades after trauma occurrence, the scar of prior trauma was evident in functional dysconnectivity." Now, I think that this talk of dramatic scarring is overblown, but in this case there's also a wider issue with the use of a statistical method which easily lends itself to misleading interpretations — canonical correlation analysis (CCA).

First, we'll look at what Yu et al. did. In a sample of 189 unmedicated patients with depression, Yu et al. measured the resting-state functional connectivity of the brain using fMRI. They then analyzed this to give a total of 55 connection strengths for each individual. Each of these 55 measures reflects the functional coupling between two brain networks.

For each patient, Yu et al. also administered questionnaires measuring personality, depression and anxiety symptoms, and history of trauma. These measures were then compressed into 4 clinical clusters: (i) anxious misery, (ii) positive traits, (iii) physical and emotional neglect or abuse, and (iv) sexual abuse.

This is where the CCA comes in. CCA is a method for extracting statistical associations between two sets of variables. Here one set was the 55 brain connectivity measures, and the other was the 4 clinical clusters. Yu et al.'s CCA revealed a single, strong association (or 'mode of variation') between the two variable sets, with a correlation coefficient of r=0.68.

A correlation coefficient of 0.68 is very large for a study of a brain-behaviour relationship. Normally, this kind of result would certainly justify the term "dramatic association".

But the result isn't as impressive as it seems, because it's a CCA result. CCA is guaranteed to find the best possible correlation between two sets of variables, essentially by combining the variables (via a weighted sum) in whatever way maximizes the correlation coefficient. In other words, it is guaranteed to over-fit and over-estimate the association.
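To make the overfitting concrete, here's a minimal sketch in Python (using NumPy) that runs CCA on pure random noise. The dimensions (189 subjects, up to 55 'brain' variables, 4 'clinical' variables) mirror Yu et al.'s, but the data are simulated and the helper `first_canonical_corr` is my own illustrative function, not the authors' code. Because the sets of noise variables are nested, the in-sample correlation can only go up as more variables are added, even though nothing real links the two sets.

```python
import numpy as np

def first_canonical_corr(X, Y):
    """Largest in-sample canonical correlation between X and Y (rows = subjects)."""
    # Centre each column, then get an orthonormal basis for each variable set.
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    # The singular values of Qx'Qy are the canonical correlations.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(min(s[0], 1.0))

rng = np.random.default_rng(0)
n_subjects = 189
X = rng.standard_normal((n_subjects, 55))  # simulated "connectivity" noise
Y = rng.standard_normal((n_subjects, 4))   # simulated "clinical" noise

# Use the first 1, 10 and 55 columns of X: more variables, more room to overfit.
results = {p: first_canonical_corr(X[:, :p], Y) for p in (1, 10, 55)}
for p, r in results.items():
    print(f"{p:2d} noise variables vs 4 noise variables: r = {r:.2f}")
```

The true association here is zero by construction, so whatever correlation comes out is entirely the product of CCA choosing the most flattering weights.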

Yu et al. show this themselves. When they used a permutation procedure (which eliminates any true associations), the CCA still produced a mean correlation coefficient of r=0.55. In 5% of cases, the CCA was lucky enough to hit r=0.62 or higher. Remember that the 'true' correlation is zero in this case! CCA is able to magic up a strong correlation of 0.55 or higher out of thin air.
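For readers who want to see what a permutation null looks like in practice, here is a rough sketch. It is not Yu et al.'s pipeline: the data are simulated (with a weak association planted by me), and `first_canonical_corr` and `permutation_null` are hypothetical helper names. The idea is simply to shuffle the rows of one variable set, so that subjects no longer line up, and rerun the CCA many times.

```python
import numpy as np

def first_canonical_corr(X, Y):
    """Largest in-sample canonical correlation between X and Y."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(min(s[0], 1.0))

def permutation_null(X, Y, n_perm=200, seed=0):
    """CCA correlations after shuffling subjects in Y, which breaks any real link."""
    rng = np.random.default_rng(seed)
    null = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = Y[rng.permutation(len(Y))]
        null[i] = first_canonical_corr(X, shuffled)
    return null

# Simulated data with the same shape as the study (assumption: Gaussian noise).
rng = np.random.default_rng(1)
X = rng.standard_normal((189, 55))
Y = rng.standard_normal((189, 4))
Y[:, 0] += 0.5 * X[:, 0]  # plant a weak real association for illustration

observed = first_canonical_corr(X, Y)
null = permutation_null(X, Y)
threshold = float(np.quantile(null, 0.95))  # the "95% null" value
p_value = (1 + np.sum(null >= observed)) / (1 + len(null))

print(f"observed r = {observed:.2f}")
print(f"null mean r = {null.mean():.2f}, 95% threshold = {threshold:.2f}")
print(f"permutation p = {p_value:.3f}")
```

The useful comparison is between the observed value and the null mean, not between the observed value and zero.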

The observed correlation of r=0.68 is statistically significant, because it's higher than the 95th percentile of the null distribution (r=0.62), but it's not much higher. In other words, while there does seem to be some true relationship between the brain and behavior variables here, it is almost certainly much weaker than it appears.

(Yu et al. in their paper also carried out a comparison of depressed patients to healthy controls, which does not rely on CCA, and which I'm not discussing here.)

So what is the use of CCA, if it is guaranteed to overfit the data? Well, it can be useful so long as you have two (or more) independent datasets, allowing you to test the validity of the CCA model, derived from one dataset, in another. The CCA would be overfitted to the first dataset, but by testing it in the second dataset, we can know how much of the correlation is real.

Unfortunately, Yu et al. is not the only paper to adopt a single-sample CCA approach. A well-cited paper, Smith et al. (2015) in Nature Neuroscience, which Yu et al. refer to several times, did the same thing. (I blogged about it at the time, rather un-skeptically.)

Smith et al. compared brain functional connectivity to behaviour and lifestyle variables, and found a mode of CCA variation with a spectacularly strong correlation of r=0.8723. But the 95% significance threshold under the permuted null hypothesis turned out to be an almost-as-spectacular r=0.84! So, just as with Yu et al., the observed result was significant, but only slightly better than what CCA produces by chance alone.

In fact, Smith et al. went on to test the validity of the CCA by running CCA on 80% of the dataset (the 'training set') and testing it in the remaining left-out 20%. This is a kind of rough-and-ready approximation of using a second dataset. Smith et al. found that the correlation in the left-out data was r=0.25, a much more modest result, although still something.
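Here is a rough sketch of what such an 80/20 check involves, again on simulated noise rather than anyone's real data. The function names (`fit_first_mode`, `apply_first_mode`) are my own, and this is one simple way to do it, not necessarily how Smith et al. implemented theirs. The CCA weights are learned on the training subjects only, then applied unchanged to the left-out subjects.

```python
import numpy as np

def fit_first_mode(X, Y):
    """Fit CCA on training data; return means, first-mode weights, in-sample r."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(X - mx)
    Qy, Ry = np.linalg.qr(Y - my)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    a = np.linalg.solve(Rx, U[:, 0])  # weights for the X variables
    b = np.linalg.solve(Ry, Vt[0])    # weights for the Y variables
    return mx, my, a, b, float(s[0])

def apply_first_mode(X, Y, mx, my, a, b):
    """Correlation of the two weighted sums in data the model has not seen."""
    u = (X - mx) @ a
    v = (Y - my) @ b
    return float(np.corrcoef(u, v)[0, 1])

rng = np.random.default_rng(2)
n = 189
X = rng.standard_normal((n, 55))  # pure noise: the true correlation is zero
Y = rng.standard_normal((n, 4))

order = rng.permutation(n)
n_train = int(0.8 * n)
train, test = order[:n_train], order[n_train:]

mx, my, a, b, r_train = fit_first_mode(X[train], Y[train])
r_test = apply_first_mode(X[test], Y[test], mx, my, a, b)

print(f"training r = {r_train:.2f}")
print(f"left-out r = {r_test:.2f}")
```

The gap between the training correlation and the left-out correlation is a direct read-out of how much of the headline number was overfitting.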

I would say that this kind of train/test analysis should be a bare minimum in any neuroscience CCA paper. I suspect that if it were applied in Yu et al.'s case the correlation would be small.