Here’s a thought experiment for you: If someone told you you had to drink just one kind of alcoholic beverage for the rest of your life, and you wanted that life to be long and healthy, what would you pick? Wine, right? After all, you’ve probably heard about the scientific studies showing that drinking wine is associated with better health in general, and a longer life span in particular. Give jocks their beer and lushes their hard liquor; the drink of robust, long-lived people is wine.
But you have probably not heard about another study, released during the media dead zone just after Christmas last year, that questioned wine’s reputed health effects. Researchers at Stanford University and the University of Texas at Austin examined a group of Americans aged 55 to 65 and compared their drinking habits with how they fared over the course of 20 years. The scientists found that moderate drinkers lived longer than abstainers, and that wine drinkers did indeed live longer on average than people who consumed other kinds of alcohol. But they also found that wine drinkers were less likely to smoke, to be male, and to be sedentary; all of these are factors associated with dying earlier.
The Stanford-Texas team concluded that drinking wine might be an indicator of a healthy lifestyle rather than the cause of that good health. If so, wine is the drink of the healthy, all right—the already healthy.
That finding highlights what is arguably science’s greatest enemy, the confounder. Science is at heart a reductionist process: Take a complicated system, identify various factors that affect the system, and measure the effect of each factor one at a time. Confounders are devilish hidden connections that make it more difficult to isolate the factors you want to measure, like the fact that wine drinkers tend also to be nonsmokers.
Researchers are continually trying to root out confounders and account for them in their data. Their most powerful tool in this job is the randomized controlled trial, a type of experiment in which researchers randomly assign participants to two or more groups and subject some of them to the intervention being studied, like a new drug or surgical procedure. Because the assignment is random, lurking factors such as smoking or fitness get spread roughly evenly across the groups instead of piling up in one of them.
New medical interventions must be proven safe and effective in a randomized controlled trial before the Food and Drug Administration (FDA) will approve their use. Though seen as the gold standard of medical research, such studies—even ones involving thousands of participants—may be too small to ferret out rare risk factors or side effects. And when it comes to claims about food, randomized trials may never be conducted at all. Few Bud or bourbon drinkers will switch to burgundy for 20 years just for the sake of research and a bit of cash.
In 2009 epidemiologists at Harvard Medical School developed a way for scientists to account for the invisible connections that were confounding their studies. The new approach uses an algorithm that automatically identifies and adjusts for confounders as well as or better than the most knowledgeable scientist can, says Jeremy Rassen, one of the algorithm’s creators.
Called the high-dimensional propensity score algorithm (hd-PS), it is a tool for improving not randomized clinical trials but broader observational studies, in which researchers watch a large pool of participants and look for correlations—like the fact that wine drinkers live longer than other drinkers. Observational studies are cheaper and easier than clinical trials. Unfortunately, the data they yield are rife with confounder problems, but researchers can improve the data by adjusting for suspected confounders and removing the bias they introduce. In the recent observational study on wine and longevity, for instance, after the researchers accounted for smoking, gender, and activity level, they found that beer and hard liquor were just as life-extending as wine.
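To see what that kind of adjustment looks like in practice, here is a minimal sketch in Python. It is only an illustration of the general technique, not the study’s actual analysis; the file name and the 0-or-1 columns (“died,” “wine,” “smoker,” “male,” “sedentary”) are hypothetical stand-ins for the researchers’ variables.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical dataset: one row per participant, with 0/1 indicator columns.
df = pd.read_csv("drinkers.csv")

# Crude model: does drinking wine (vs. other alcohol) predict dying during follow-up?
crude = smf.logit("died ~ wine", data=df).fit()

# Adjusted model: add the suspected confounders, so the wine coefficient now
# compares wine drinkers with other drinkers who share the same smoking status,
# sex, and activity level.
adjusted = smf.logit("died ~ wine + smoker + male + sedentary", data=df).fit()

print(crude.params["wine"], adjusted.params["wine"])
```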
And this is where hd-PS shines. While a shrewd researcher with decades of experience might adjust for a few dozen confounders, Rassen’s algorithm can easily identify 500 of them. To use hd-PS, a researcher downloads the program from the Harvard site, connects it to one of the data software packages widely used in epidemiology, and imports into the system a wide range of health information on each study subject, ranging from basics like blood pressure and age to more esoteric factors like whether the individual saw a doctor in the past six months.
Then hd-PS ingests all this information—“it’s a data-hungry algorithm,” Rassen says—and churns away on some heavy number-crunching. At the heart of the crunching is a process called propensity score matching. The algorithm sorts through all the variables in the data and isolates ones that seem to be risk factors for a particular health problem. It combines those risk factors into one summary score and compares two groups that have identical summary scores but one key difference. Beer drinkers and wine drinkers with the same summary scores, for instance, differ in their preferred drink but are otherwise at exactly the same risk level. The computer by itself accomplishes the key task of isolating each variable and measuring its effect.
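Propensity score matching itself can be sketched in a few lines of code. The snippet below is a toy Python illustration of the general idea, not the hd-PS program: it predicts who gets the “treatment” from the measured risk factors, uses that predicted probability as the summary score, and pairs each treated subject with the untreated subject whose score is closest. The data frame and column names are hypothetical.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def propensity_score_match(df, treatment, outcome, covariates):
    """Compare outcomes between treated and untreated subjects in a pandas
    DataFrame after matching them on their propensity (summary) scores."""
    # 1. Predict treatment status from the covariates; the predicted
    #    probability of being treated is the propensity score.
    model = LogisticRegression(max_iter=1000).fit(df[covariates], df[treatment])
    df = df.assign(pscore=model.predict_proba(df[covariates])[:, 1])

    treated = df[df[treatment] == 1]
    control = df[df[treatment] == 0]

    # 2. Pair each treated subject with the control whose score is closest,
    #    so the matched groups carry the same measured risk.
    nn = NearestNeighbors(n_neighbors=1).fit(control[["pscore"]])
    _, idx = nn.kneighbors(treated[["pscore"]])
    matched_control = control.iloc[idx.ravel()]

    # 3. Any remaining difference in outcomes is attributed to the treatment.
    return treated[outcome].mean() - matched_control[outcome].mean()
```

In the wine example, the treatment would be a flag for preferring wine over beer, the outcome a flag for dying during follow-up, and the covariates the hundreds of variables hd-PS pulls in automatically rather than the handful a researcher might pick by hand.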
In a paper published last January, Rassen and Sebastian Schneeweiss, another Harvard epidemiologist and cocreator of the algorithm, put hd-PS to a test to see if it could analyze complicated health data as well as human experts. They ran the findings from some previously published observational studies through hd-PS and confirmed that it drew conclusions similar to those reached the old-fashioned way, by expert scientists picking confounders one by one. In one run, they plugged in a heap of data comparing the safety of COX-2 inhibitors (a popular class of pain relievers, such as Celebrex) and nonselective nonsteroidal anti-inflammatory drugs, or ns-NSAIDs (like ibuprofen and naproxen), among an older group of patients in Pennsylvania.
The raw data showed that taking COX-2 inhibitors correlated with a 9 percent higher risk of gastrointestinal bleeds, a potentially life-threatening side effect—a surprise, since a raft of clinical studies had previously shown that those drugs were less likely to cause GI bleeds. In fact, that was one of the main reasons COX-2 inhibitors were created. The huge confounder here is that doctors prescribe those drugs especially to people at higher risk of gastrointestinal bleeds. When the raw, misleading data were plugged into hd-PS, the algorithm quickly teased out the confounders and computed that COX-2 inhibitors were associated with a 13 percent lower chance of GI bleeds compared with ns-NSAIDs, a figure close to the results seen in a randomized trial.
The performance of hd-PS “is really very encouraging,” Rassen says. “The algorithm is finding more things than the investigators themselves, which is what we’d expect.”
Many scientists are understandably skeptical about letting a computer program take over the analysis of their precious data. “The first reaction is: ‘This can’t possibly work. We have 50 years of history about how studies are done, and this is not how they are done,’ ” Rassen says. But when researchers take a closer look at hd-PS, they usually come around to recognizing its value—even if it wounds a few egos by finding a wider range of confounding variables than they would on their own.
The algorithm could prove particularly useful when researchers have enormous datasets and almost unending questions about what kinds of findings could be lurking therein. That is the challenge inherent in the Sentinel Initiative, being developed by the FDA to monitor pharmaceuticals after they reach the market. To get through the FDA’s demanding approval process, drug companies must demonstrate their product’s safety and efficacy in costly randomized clinical trials. But then the drugs go through a second, uncontrolled kind of trial when the public starts using them.
Sentinel will treat the consumer market like one giant observational study to make sure the drugs are as safe and effective as believed. Studies of drugs in the wild are in some ways better than clinical trials. For one thing, there are typically far more subjects taking the drugs, so post-approval studies can turn up problems that were too rare to show up before. Moreover, trials are often designed as best-case scenarios to show a drug’s potential; in patients’ hands, the drug may not work as well.
In the uncontrolled real world, riddled with invisible connections, the Harvard researchers’ algorithm could be the best tool for making sense of complicated data. Rassen has analyzed the information available from the public use of Vioxx, a COX-2 inhibitor that was pulled off the market in 2004 because it increased the risk of heart attack and stroke. He says his algorithm would “definitely” have detected problems early on.
Randomized clinical trials will undoubtedly remain the primary mechanism by which drugs are approved; they are just too powerful, and too entrenched. But hd-PS should serve up invaluable additional information about how to use drugs once they are on the market—and could save the lives of people taking drugs that turn out to have lethal side effects. Richard Platt of Harvard Medical School, leader of the ongoing Mini-Sentinel pilot program (a testing ground and warm-up for the larger Sentinel Initiative), hopes that hd-PS will accurately sniff out health concerns, or else put them to rest. “We plan to test it within the next few months,” he says.
If hd-PS does prove useful for post-approval drug review, will algorithms eventually take over researchers’ jobs? Will the scientists of the future be computers? Rassen is convinced that much of the investigator’s role is safe. “We think a lot about the design of research projects. I doubt that’s something a computer could ever do,” he says. Deciding what kind of experiment to perform and how to set it up is too particular to each situation to hand over to an algorithm.
Until computers can handle real-world complexity as well as the exquisite human mind, scientists will still be in demand. But they should probably get used to the idea that computers may already be better than people at finding needles of truth in enormous haystacks of data.