With Big Data and Predictive Analytics, Scientists Are Getting Smarter About Outbreaks

Devastating diseases can catch us unaware, but medical experts are developing digital tools to prevent future chaos.

By Mallory Locklear
Aug 20, 2018 12:00 AM
(Credit: Alison Mackey/Discover; Shutterstock elements: bacteria, Rost9; numbers, ArtHead; globe, Kasia; laptop, Peter Kotoff)



It’s 2014 in Liberia. The country’s largest hospital is already full to the brim and unable to admit new patients. Instead, the sick lie on the ground outside, writhing and crying in pain. They’re struck with severe bouts of vomiting and diarrhea, impaired kidney and liver function, perhaps even internal bleeding. Up to 90 percent of those sick will die. Their only hope of getting treatment is if someone else dies first, freeing up a bed.

The culprit behind the devastation? Ebola. By 2016, the outbreak ended with more than 11,000 reported deaths across West Africa. In the aftermath, experts underscored that, during the worst months, few were prepared for such catastrophe — neither the countries that suffered most, nor the international community at large. Though the World Health Organization eventually declared a Public Health Emergency of International Concern, it came late, and arguably, so did vital funding. And while few cases spread outside the continent, the resulting panic certainly did.

But what if there were an early warning system for the outbreak? Something that could have given health organizations a heads-up, allowing them to organize an effective response, contain the disease’s spread, save tens of thousands of lives and prevent an international crisis? Such a system may be in the not-too-distant future.

Non-profit, university and non-government groups across the globe are tackling this idea from various angles. From compiling information on potentially infectious agents to tracking real-time diagnoses in disease hot spots, epidemiologists — those who study the incidence and prevalence of disease — are getting us closer to a world with fewer surprise pandemics.

Pulling It Together

Pathogens, the microorganisms such as bacteria and viruses that cause disease, spread by hitching rides on hosts. Researchers have studied many of these pathogens (human immunodeficiency virus, for example). But the results of their work aren’t all stored in the same place; they’re scattered across journals and various databases. If experts sequence a pathogen’s DNA (which helps identify and track it), that data typically gets uploaded to a public database, but it’s not paired with any existing descriptions of the pathogen. Instead, interested researchers have to manually cross-reference with various journals.

This representation of EID2 data shows the link between pathogens and their human and domestic animal hosts. Each host is represented by a node; the bigger the node, the more pathogens found in that host. The lines between hosts indicate the number of pathogens that show up in both hosts; the thicker the line, the more pathogens the pair share. Colors simply indicate the type of host: human, rodents, other mammals and birds. (Credit: Maya Wardeh)

“We needed to find a better way of bringing the information together,” says Marie McIntyre, an epidemiologist at the Institute of Infection and Global Health at the University of Liverpool.

So McIntyre and her colleagues created a new database: the Enhanced Infectious Diseases Database, or EID2. It’s programmed to link publicly available information about all known pathogens of a given host, all hosts of a given pathogen and info about when and where that pathogen showed up. “It’s not about where the disease is occurring today,” McIntyre says. “It’s about where the disease is occurring and who the disease is occurring in.” That combination of information can help researchers look for long-term drivers of disease, such as how climate affects the spread of a pathogen.
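
The kind of linking described above can be pictured with a small sketch. This is not EID2's actual schema or code; the record layout, the function names and every value below are assumptions made up for illustration. It indexes the same records by host and by pathogen, looks up where and when a pathogen was reported, and counts the pathogens that two hosts share.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: (pathogen, host, country, year).
# Placeholder values only; none of this is real EID2 data.
RECORDS = [
    ("pathogen_A", "human", "Country X", 2010),
    ("pathogen_A", "rodent", "Country X", 2012),
    ("pathogen_B", "human", "Country Y", 2014),
    ("pathogen_B", "bird", "Country Y", 2015),
    ("pathogen_C", "rodent", "Country Z", 2016),
    ("pathogen_C", "human", "Country Z", 2017),
]


def build_indexes(records):
    """Index the records both ways: host -> pathogens and pathogen -> hosts."""
    by_host = defaultdict(set)
    by_pathogen = defaultdict(set)
    for pathogen, host, _country, _year in records:
        by_host[host].add(pathogen)
        by_pathogen[pathogen].add(host)
    return by_host, by_pathogen


def where_and_when(records, pathogen):
    """List the (country, year) pairs in which a pathogen was reported."""
    return sorted((country, year) for p, _host, country, year in records if p == pathogen)


def shared_pathogens(by_host):
    """Count the pathogens found in both hosts of every pair that shares any."""
    shared = {}
    for a, b in combinations(sorted(by_host), 2):
        common = by_host[a] & by_host[b]
        if common:
            shared[(a, b)] = len(common)
    return shared


by_host, by_pathogen = build_indexes(RECORDS)
edges = shared_pathogens(by_host)
sightings = where_and_when(RECORDS, "pathogen_A")
```

Keeping both indexes means either question (all pathogens of a host, or all hosts of a pathogen) is a single lookup, and the pair counts are the sort of numbers a host-to-host network diagram would draw as line thickness.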

By pooling information from various public databases, EID2 lets users see in one spot lots of data that’s usually scattered. Above is a look at sources of information on HIV overlaid on areas where the pathogen exists. (Credit: Map, Marie McIntyre; laptop, Kostov/Shutterstock)

Pros: By bringing knowledge together in one place, EID2 makes it easier to investigate, anticipate and prepare for a pathogen. McIntyre says the database, which includes millions of sequences and information on thousands of pathogen species, is also easily updated. Plus, it’s free, and anyone can use it.

Cons: EID2 relies on public information, so it’s limited to already published knowledge. If researchers discover a pathogen but its DNA isn’t sequenced, or if no one else has posted information about it in a public forum, EID2 can’t incorporate it.

Up Next: The EID2 team plans to expand the database, incorporating diseases that affect crops.

A Learning Process

In the world of epidemiology, diseases that have seen an uptick in recent years are called “emerging infectious diseases.” But are there really more cases of these diseases, or have we just become better at spotting them? According to Barbara Han, a disease ecologist at the non-profit Cary Institute of Ecosystem Studies in New York, it’s not just us getting better. “It’s actually an increasing problem of infectious diseases,” she says. And most of these diseases originate in animals.

Han decided to figure out what makes certain animals more likely to host specific diseases. “There is something inherent about a species that enables it to carry disease, compared to the vast majority that don’t,” she says. “I want to know what the data can give me, what can the data show me, about what distinguishes those two.” She turned to algorithms and machine learning.

Han starts with a list of species that researchers have already flagged as disease carriers or non-disease carriers. She then trains a computer algorithm to separate the species on the list — not labeled in any way, so the algorithm doesn’t know which is which — by dozens of traits. For example, the algorithm may start by looking at an animal’s body mass, followed by its age of sexual maturity and finally by whether it’s nocturnal or not. At the end of this sorting, the algorithm will ideally have grouped species by whether they’re disease carriers or not.

But this first sort gets a fair bit wrong. To make the algorithm more accurate, Han has the computer do another round of sorting, this time focusing on the species it miscategorized the first time. When it does this over and over again, the algorithm learns. And, importantly, it learns which factors contribute to a species carrying a transferable disease or not. “At the end of that process, you get a very powerful predictor,” Han says. When the model examines a species that’s a question mark — whether or not it carries disease isn’t known beforehand — it can use what it’s learned to study that species’s traits, compare them with traits from known carriers and predict the likelihood of that species hosting a disease.
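
The loop of sorting, checking and refocusing on mistakes is the core idea behind a family of methods called boosting. The article does not name the specific algorithm Han uses, so the sketch below swaps in AdaBoost with one-trait threshold rules, a textbook member of that family. The two traits, the eight species and all numbers are invented, and the sketch assumes the known carrier and non-carrier flags are available as training labels.

```python
import math

# Hypothetical training data: (age_at_maturity_days, litters_per_year) per species.
# Labels: +1 = flagged as a disease carrier, -1 = flagged as a non-carrier.
# All values are invented for illustration.
TRAITS = [(40, 6), (50, 5), (60, 2), (200, 6), (180, 2), (250, 1), (300, 2), (220, 3)]
LABELS = [1, 1, 1, 1, -1, -1, -1, -1]


def vote(rule, row):
    """A rule votes `sign` when the trait is at or below the threshold, else the opposite."""
    trait, threshold, sign = rule
    return sign if row[trait] <= threshold else -sign


def best_rule(rows, labels, weights):
    """Try every trait, threshold and direction; keep the lowest weighted error."""
    best_err, best = None, None
    for trait in range(len(rows[0])):
        for threshold in sorted({row[trait] for row in rows}):
            for sign in (1, -1):
                rule = (trait, threshold, sign)
                err = sum(w for w, row, y in zip(weights, rows, labels) if vote(rule, row) != y)
                if best_err is None or err < best_err:
                    best_err, best = err, rule
    return best_err, best


def fit(rows, labels, rounds=3):
    """AdaBoost: each round reweights the data toward the species missed so far."""
    weights = [1.0 / len(rows)] * len(rows)
    model = []
    for _ in range(rounds):
        err, rule = best_rule(rows, labels, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) if a rule is perfect
        alpha = 0.5 * math.log((1 - err) / err)  # more accurate rules get a bigger say
        model.append((alpha, rule))
        # Misclassified species gain weight; correctly classified ones lose it.
        weights = [w * math.exp(-alpha * y * vote(rule, row))
                   for w, row, y in zip(weights, rows, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def score(model, row):
    """Weighted sum of all rule votes; larger means more carrier-like."""
    return sum(alpha * vote(rule, row) for alpha, rule in model)


def predict(model, row):
    return 1 if score(model, row) >= 0 else -1


first_pass = fit(TRAITS, LABELS, rounds=1)
boosted = fit(TRAITS, LABELS, rounds=3)
```

On this toy data, the single rule from the first pass misclassifies one of the eight species, while three rounds of reweighting classify all eight correctly. Calling `predict(boosted, (55, 4))` then gives a prediction for a hypothetical species whose carrier status is unknown.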

The algorithm can also create a list of animals ranked by their risk of carrying disease, as well as a description of the traits that determine that risk. For example, when Han trained the algorithm with hundreds of rodent species, it determined disease-carrying risk was associated with a rapid life cycle: early sexual maturity, frequent reproduction and fast growth rates. Knowing which animals and which traits are most likely to be associated with disease allows researchers to zero in on and prepare for where the next pandemic could originate.
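
To make the ranking step concrete, here is a minimal sketch that assumes a model has already been trained and reduced to a handful of weighted rules. The rules, weights, species names and trait values are all hypothetical, not output from Han's model. Each species gets a score from the weighted votes, the species are sorted by that score, and the weights are summed per trait as a rough measure of how much each trait drives the ranking.

```python
# Hypothetical trained rules: (weight, trait, threshold, vote_if_at_or_below).
# A vote of +1 means "carrier-like"; above the threshold the vote is flipped.
RULES = [
    (1.0, "age_at_maturity_days", 60, 1),
    (1.3, "age_at_maturity_days", 200, 1),
    (1.6, "litters_per_year", 3, -1),
]

# Hypothetical species whose carrier status is unknown; values are invented.
UNSTUDIED = {
    "species_P": {"age_at_maturity_days": 45, "litters_per_year": 5},
    "species_Q": {"age_at_maturity_days": 150, "litters_per_year": 2},
    "species_R": {"age_at_maturity_days": 320, "litters_per_year": 1},
}


def risk_score(traits):
    """Sum the weighted votes of every rule for one species."""
    total = 0.0
    for weight, trait, threshold, vote in RULES:
        total += weight * (vote if traits[trait] <= threshold else -vote)
    return total


def rank_by_risk(species):
    """Return species names ordered from highest to lowest risk score."""
    return sorted(species, key=lambda name: risk_score(species[name]), reverse=True)


def trait_influence():
    """Add up rule weights per trait as a crude indicator of each trait's say."""
    totals = {}
    for weight, trait, _threshold, _vote in RULES:
        totals[trait] = totals.get(trait, 0.0) + weight
    return totals


ranking = rank_by_risk(UNSTUDIED)
influence = trait_influence()
```

In this made-up example the fast-maturing, frequently breeding species ranks first, which mirrors the rapid life cycle pattern described above. Summing rule weights is only one simple way to summarize trait influence; real tools typically report more careful measures.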

An example of how machine learning can help researchers predict where and when outbreaks might occur. (Credit: Alison Mackey/Discover; Shutterstock elements: basel101658, Potapov Alexander, Hein Nouwens, a Sk, Black creator)

Pros: This model is based on objective facts about animals, so predictions are less prone to bias. And the model’s predictions of risk are stable because they’re based on biological traits that aren’t likely to change anytime soon.

Cons: The ability to predict any species’s disease risk relies on how much we know about it. So if we don’t have enough information, the algorithm has little to work with — and that could lead to inaccurate predictions. There’s also the problem of follow-up. “It’s almost like selling an insurance policy,” Han says. Her model can produce a list of potentially risky animals, but if no one investigates them firsthand, the prediction is just a prediction. So in many cases, confirming the model’s output takes some time.

Up Next: Han is working on figuring out how to turn prediction systems like her algorithms, which can be valuable tools for researchers already focused on sniffing out emerging diseases, into something more proactive, such as an early warning system. She’s now focusing on what types of data are necessary for such an alert system and what still needs to be collected.

Location, Location, Location

EcoHealth Alliance, another New York-based non-profit focused on global health, is also interested in how and when diseases jump from animals to humans. Not only is it looking at which species put humans at risk, it also focuses on which regions and animal habitats are more susceptible to sparking pandemics.

“A few years ago, we compiled a database of every known emerging disease to find out what the reality is,” says Peter Daszak, a disease ecologist and the organization’s president. “Around two-thirds of all emerging diseases, maybe even more, are of animal origin.”

Daszak and his team created a mathematical model that uses outbreak data from the last 50 years to predict where outbreaks might occur. With that tool, he and his colleagues found that many of these hot spots of emerging diseases were in tropical areas. Those regions host incredibly dense and diverse wildlife, and since each species comes with its own set of pathogens, the more biodiversity you have, the greater the risk of emerging diseases. To confirm the model’s accuracy, EcoHealth team members then went out to these areas and tested local residents and wildlife for disease.

“We live in a globalized world where we’re changing the environment so fundamentally that pathogens are changing their behavior,” Daszak says. “They can jump from one species to another more easily because we’re butting up against different species.”

Based on data from past outbreaks, EcoHealth Alliance’s mathematical model flags areas (usually those rich in biodiversity) that are more likely to spawn an emerging disease in the future. The warmer the color, the greater the likelihood. (Credit: EcoHealth Alliance)

Pros: Using these analyses to pinpoint potential outbreak hot spots allows health care organizations and governments to direct resources to that area. Researchers and physicians then can focus on that region and directly test for the emergence of diseases from both wildlife and humans, allowing for a better chance at prevention and containment.

Cons: Relying on a mathematical model requires researchers to make assumptions. For example, the model may show that deforested areas are hot spots for new outbreaks, but it doesn’t explain the complex reasons why. So while the map is limited in what it can tell researchers, it does point them to key places to seek underlying causes.

Up Next: Emerging disease leaders from around the world, including those from EcoHealth Alliance, have come together to form the Global Virome Project. The goal is to identify all currently unknown viruses that could emerge in the future — an estimated 1.6 million. By knowing which viruses pose a threat to humans and which animals carry them, EcoHealth and similar groups will be even better prepared to predict where the next pandemic may spring up. The project is expected to take 10 years and cost up to $5 billion.

There's a Map for That

Doctors Without Borders (also known by the French name Médecins Sans Frontières, or MSF) and the British Red Cross (BRC) are collaborating to tackle the spread of disease in real time.

Their efforts began with the Missing Maps Project, a 2014 initiative carried out by MSF, BRC, the American Red Cross and the U.S.-based non-profit Humanitarian OpenStreetMap Team. The project trained citizen volunteers to digitally trace the buildings and roads that appear in satellite images, creating maps. They focused on regions that are most vulnerable to crises like disease outbreaks and natural disasters, but aren’t typically mapped in detail — which can be a problem for aid workers responding to a disaster.

MSF and BRC applied this technique in Lubumbashi, a city in the Democratic Republic of Congo. They mapped buildings and road networks, as well as details like neighborhood limits, identifying key areas where crisis victims might arrive. These maps provided a basis on which to build an outbreak tracking system: The team created software that would combine the maps with patient details collected by doctors, making it easier to check for patterns or signs of an outbreak.

Doctors and nurses enter patient information, including age, length of stay and admission date, into the software, and an animated map shows where patients are coming from and when. The tool “will show a map of the city and the administration areas, and will show colors in different intensity where the outbreak is occurring the highest,” says Simon Johnson, a BRC technical leader who helped develop the software. “The idea is you can then start preventative exercises, rather than just treatment of patients coming in.”
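
The step from patient records to a shaded map can be sketched as a simple aggregation. This is not the MSF and BRC software itself; the field names, area names and records below are assumptions invented for illustration. The sketch counts admissions per area and week, then scales each week's counts to a 0-to-1 intensity that a map could turn into color.

```python
from collections import Counter
from datetime import date

# Hypothetical patient records; field names and values are illustrative only.
ADMISSIONS = [
    {"area": "Area A", "age": 34, "stay_days": 3, "admitted": date(2018, 1, 1)},
    {"area": "Area A", "age": 7, "stay_days": 5, "admitted": date(2018, 1, 3)},
    {"area": "Area B", "age": 52, "stay_days": 2, "admitted": date(2018, 1, 4)},
    {"area": "Area A", "age": 19, "stay_days": 4, "admitted": date(2018, 1, 9)},
    {"area": "Area A", "age": 41, "stay_days": 6, "admitted": date(2018, 1, 10)},
    {"area": "Area A", "age": 28, "stay_days": 1, "admitted": date(2018, 1, 12)},
]


def weekly_counts(admissions):
    """Count admissions per (area, ISO year, ISO week)."""
    counts = Counter()
    for record in admissions:
        iso = record["admitted"].isocalendar()
        counts[(record["area"], iso[0], iso[1])] += 1
    return counts


def intensity(counts, year, week):
    """Scale one week's per-area counts to the range 0..1 for map shading."""
    in_week = {area: n for (area, y, w), n in counts.items() if (y, w) == (year, week)}
    peak = max(in_week.values(), default=0)
    if peak == 0:
        return {}
    return {area: n / peak for area, n in in_week.items()}


counts = weekly_counts(ADMISSIONS)
week_one = intensity(counts, 2018, 1)
week_two = intensity(counts, 2018, 2)
```

Stepping through the weeks in order and redrawing the shading each time would give the animated effect described above. Because every count comes straight from the entered records, a wrong area or admission date shifts the shading directly.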

The British Red Cross and Doctors Without Borders teamed up to build this digital dashboard. The tool combines local maps with patient data, so first responders can track details that could help them spot an outbreak in real time. (Credit: map, Doctors Without Borders; laptop, Peter Kotoff/Shutterstock)

Pros: The technology is open source and can be developed rapidly, allowing new groups to use and customize it.

Cons: The output is only as good as the effort users put into it. If people enter data inaccurately, the mapping will be inaccurate as well. Users must be properly trained.

Up Next: The team is working to bring this dashboard to more locations, according to former project leader Idriss Ait-Bouziad’s presentation of the work at an MSF conference last year.

The High Costs of Fighting Disease

Working to give people a heads-up when diseases break out is useless without resources to deal with the situation. Once experts predict a potential outbreak, who funds the necessary preventive and containment measures? And how much will they give?

Here’s a look at some of the major contributors and how much money they’ve committed to fighting significant disease outbreaks.

(Credit: Alison Mackey/Discover)

This article originally appeared in print as "Outsmarting Outbreaks."


Copyright © 2023 Kalmbach Media Co.