Writing in STAT last week, Julie Rehmeyer discussed the release of raw data from the PACE study, a clinical trial which has long been controversial amongst the very population it studied: people with chronic fatigue syndrome/myalgic encephalomyelitis (CFS/ME).
Rehmeyer, a CFS/ME patient herself, reports:
Under court order, the [PACE] study’s authors for the first time released their raw data earlier this month. Patients and independent scientists collaborated to analyze it and posted their findings Wednesday on Virology Blog, a site hosted by Columbia microbiology professor Vincent Racaniello. The analysis shows that if you’re already getting standard medical care, your chances of being helped by the treatments are, at best, 10 percent. And your chances of recovery? Nearly nil.
The new findings are the result of a five-year battle that chronic fatigue syndrome patients — me among them — have waged to review the actual data underlying that $8 million study.
Earlier this month a British tribunal ruled that London's Queen Mary University (QMU) should comply with a 2014 Freedom of Information Act request and share the (anonymized) raw data from the PACE study. The PACE researchers and the university had long resisted this move, but following the ruling, QMU admitted defeat. The data is now available here.
There has been an enormous amount written about PACE. Here's my take: releasing the data was the right thing to do and should have been done from the start. But what does the data show? How well does it support what the PACE authors claimed? Is the study "bad science", as Rehmeyer puts it?
First off, I should say that in my analysis of the data I didn't find any 'red flag' evidence of data manipulation, such as duplicated participants. I did find eight possible typos (non-integer responses on integer-only scales), out of roughly 5,000 such datapoints.
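For readers who want to run this kind of sanity check themselves, here is a minimal sketch of what I mean by flagging non-integer responses on integer scales. The column names here are hypothetical; the released dataset uses its own variable names:

```python
import pandas as pd

# Hypothetical column names for illustration only -- the real PACE
# dataset has its own variable naming scheme.
INTEGER_SCALES = ["fatigue_baseline", "fatigue_52wk", "sf36_baseline", "sf36_52wk"]

def find_non_integer_responses(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return the rows where any nominally integer-valued scale holds a
    fractional value (a possible transcription typo)."""
    suspect = pd.Series(False, index=df.index)
    for col in columns:
        vals = pd.to_numeric(df[col], errors="coerce")
        # Flag values that are present but not whole numbers.
        suspect |= vals.notna() & (vals % 1 != 0)
    return df[suspect]
```

A check like this finds odd entries without touching the substantive results; duplicated participants could be screened for similarly, e.g. with `df.duplicated()` across the outcome columns.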
PACE was a study of over 600 CFS/ME patients randomized to one of four treatments: cognitive-behavioural therapy (CBT), graded exercise therapy (GET), adaptive pacing therapy (APT) and a control condition, standard medical care (SMC).
In the original 2011 Lancet paper reporting the results of PACE, the authors concluded that CBT and GET "moderately improve outcomes" over and above SMC, while APT does not.
In my analysis of the data I replicated the superiority of CBT and GET. For instance, comparing the pre-post change in scores on the SF36 rating scale and on the Chalder Fatigue Scale (Likert scored), both CBT and GET showed more improvement than the SMC group. These differences are clearly statistically significant (p < 0.001 to p = 0.008) and the effect sizes (d = 0.31 - 0.46) would all conventionally be described as between "small" and "medium" - relative to the statistical variability in the data.
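For anyone wanting to reproduce this kind of between-group comparison, the effect size here is the standard Cohen's d on the pre-post change scores. This is a generic sketch of that calculation, not the PACE analysis code itself:

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d for two independent groups: difference in means divided
    by the pooled standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)
```

Applied to, say, the CBT group's change scores versus the SMC group's, values of d around 0.3 to 0.5 are what the conventional "small" and "medium" labels refer to.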
How large is a small effect?
How big were the effects of CBT and GET in absolute terms? Let's look at the Chalder Fatigue Scale (Likert scored) symptom scale. The average baseline score in the PACE patients was 28.2. However, the healthy population mean score on this scale is 14.2, so the patients were suffering from some 14.0 points of 'disease specific' symptoms over the norm.
One year later, at the end of the trial, the CBT and GET groups had improved by a mean of 7.5 points, while the control group improved by 4.5 points. So the effect of treatment over the control was 3.0 points, or 21% of the baseline disease-specific symptoms.
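The arithmetic above can be laid out explicitly; these are simply the numbers from the preceding paragraphs, not a new analysis:

```python
baseline_mean = 28.2   # mean Chalder Fatigue (Likert) score at baseline in PACE
healthy_mean = 14.2    # approximate healthy-population mean on the same scale
disease_specific = baseline_mean - healthy_mean  # ~14.0 points of excess symptoms

cbt_get_improvement = 7.5   # mean one-year improvement, CBT and GET groups
smc_improvement = 4.5       # mean one-year improvement, control (SMC) group
treatment_effect = cbt_get_improvement - smc_improvement  # 3.0 points

fraction_of_symptoms = treatment_effect / disease_specific
print(round(100 * fraction_of_symptoms))  # prints 21 (percent)
```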
I think calling these treatment effects "moderate" is defensible. 21% of the symptoms is certainly not a large fraction, but nor is it a trivial one. I'd call it small-to-medium.
There's a caveat, though. The Chalder Fatigue Scale and most of the other PACE outcome measures were subjective, self-report scales. As I've said before, these have limitations; in this case they might well be susceptible to placebo effects. As Rehmeyer nicely puts it:
I imagined myself as a participant: I come in and I’m asked to rate my symptoms. Then, I’m repeatedly told over a year of treatment that I need to pay less attention to my symptoms. Then I’m asked to rate my symptoms again. Mightn’t I say they’re a bit better — even if I still feel terrible — in order to do what I’m told, please my therapist, and convince myself I haven’t wasted a year’s effort?
There was one more-or-less "objective" outcome measure in the released PACE dataset, namely 'meters walked' (in 6 minutes). Concerningly, CBT was no better than the control group on this outcome (p=0.807). GET did produce benefits but this is perhaps unsurprising because walking was one of the main exercises that formed part of that treatment, so whether GET had any 'generalized' effects over placebo is also uncertain.
Did anyone recover?
There's another important issue: recovery. So far I've talked about the degree of symptom improvement shown by patients in the trial. But what good is some improvement if you still have lots of symptoms left? PACE, like many trials, sought to examine the number of patients who not only improved, but 'recovered', by the end of the trial. To study recovery we need some criteria: how do we define a patient as 'recovered'?
In their original protocol, published in 2007, after the trial began recruiting, the PACE authors defined their recovery criteria. However, the researchers later modified the criteria, and the changes are neatly summarized in this analysis by Alem Matthees et al. (Matthees was the one who sent the Freedom of Information Act request.)
Changing a protocol is not a bad thing per se. If the change is transparent and it really is an improvement, who could object? But in this case it's hard to see the benefit. Essentially, the new criteria were looser, meaning that they deemed a higher proportion of patients to be 'recovered' than the originals - perhaps making the treatments in the PACE trial seem more impressive.
The revised criteria were used in a 2013 PACE paper which concluded that over 20% of CBT and GET patients recovered from CFS/ME. However, PACE critics have long suspected that according to the original criteria, very few patients recovered in any group. And indeed, the Matthees et al. analysis of the data confirms this: original-criteria recovery rates were about 5% overall (with no statistically significant group differences in recovery).
In my view the critics are right: the revised criteria are almost certainly too broad. For one thing, I noticed that some of the patients in the dataset already met many of the 'recovery' criteria at the start of the trial, which is clearly problematic. I don't think the revised criteria match the everyday meaning of the word 'recovery', i.e. the absence of all or virtually all symptoms.
That said, we should remember that defining 'recovery' is like drawing a line in the sand; any set of criteria is arbitrary. There's no reason to think that the original PACE criteria were perfect - they may have been too stringent.
To conclude, I don't think the PACE study is "bad science". As a study it seems solid: it had a large sample size, it was properly randomized, etc. The main flaw was the reliance on self-report outcome measures, although PACE is far from unique in that regard. The recovery criteria change was dubious, but this doesn't alter the conclusions of the main study: CBT and GET produced small-to-medium benefits (albeit perhaps placebo ones) in symptoms.
In Part 2 of this post I'll examine the question of whether any of the PACE therapies, especially GET, produced harm.