Massive Logo Massive

When scientists become 'data parasites,' everybody wins

Why some scientists are celebrating colleagues who "steal" data

If the world of research science is an ecosystem, data—the output of every experiment and study—are its most crucial resource. This can be the raw numbers measured during an experiment, survey results, or many other kinds of facts and statistics. Whatever form it takes, scientists have been accused of being drawn to data like mosquitoes, feeding upon any source they can get their hands on.

Like many natural resources, the distribution of data in the ecosystem of science is unequal. Some scientists have an excess of data; some scientists don’t have enough. This inequality has led scientists to develop their own version of a phenomenon that occurs all the time in natural ecosystems: data symbiosis.

Symbiosis is any long-term interaction where two species share resources, and scientists see it everywhere in nature. In the science ecosystem, those shared resources are data. As in nature, we'd like to believe that no available resource in science should go to waste. Unfortunately, it's still unusual for scientists to use data from experiments they didn't contribute to.

Historically, it was inconvenient and difficult to share raw data. Reams of paper would literally have to be mailed around the world. Today, the internet makes sharing data much easier, but there continues to be a stigma against using data you didn't collect. Researchers who produce the data often feel like they are being taken advantage of.

“There’s always been a stigma on people who mainly do secondary data analysis,” said Robin Dowell, Assistant Professor and self-described research parasite at the University of Colorado in Boulder. “There’s a sense that they’re stealing ideas, but they’re not really stealing ideas—they’re stealing primary data.”

But secondary analysis can be just as important as original research. Showing that a result is reproducible when the same data is handled by a different research group makes that result much more believable. Similarly, combining and analyzing data collected by multiple groups can tell us if there are universal trends that reach beyond any individual experiment.

Increasingly, labs are sharing data online, which allows other scientists to use these collectivized data sources to support their own needs. These scientists are becoming known in some circles of the scientific community as “research parasites,” a term that was coined by an editorial in the New England Journal of Medicine.

As the term may suggest, these scientists are sometimes discredited because they did not do any of the work involved in producing or paying for the original studies. To combat that stigma, there are people who are celebrating these scientists for using publicly available data in new ways. Casey Greene, a computational biologist at the University of Pennsylvania, started the Research Parasite Awards in 2016. He gives these awards to people who conduct rigorous secondary analysis on data shared by other researchers.

For example, one of the award winners in 2016 was Dr. Erick Turner, from the Oregon Health & Science University. Turner studied the published results of FDA-registered clinical trials for antidepressants, and discovered a significant publication bias. When the drugs in question appeared to work, the study was much more likely to be published. This potentially presents an unrealistic view of drug efficacy and potential for success in patients.

Although the name of the award is tongue-in-cheek, Greene thinks it’s an important move towards dispelling the stigma against secondary analysis.

“The awards are a good way of changing people’s minds,” said Greene. The phrase “research parasite” was “a really valuable thing to reclaim, and convert to a positive.”

However, Greene soon started to think about the other half of the symbiotic relationship between research parasites and “hosts”—people sharing data.

“Shortly after we started it, I started realizing that this was only attacking half of the issue,” said Greene. “Yes, we’re honoring people for reanalyzing data, but we’re not actually honoring the people who are sharing the data in the first place. This didn’t sit right with me.”

“There’s a need to recognize the people who generate the data as well as the people who use the data,” said Brian Byrd, Assistant Professor at the University of Michigan.

Greene and Byrd teamed up to create a second award: the Research Symbiont Award. This award recognizes the researchers who share data in a way that is easily accessible and understood. The name comes from the idea that researchers who share data and those who reanalyze it are in a symbiotic relationship. In fact, Greene argues that the relationship is mutualistic.

“I like that idea and terminology better than host and parasite,” said Greene. “I actually don’t think that most of this reanalysis hurts the original investigators.”

In fact, both Greene and Byrd suggest that the Research Symbiont award could replace the Research Parasite award. Although Greene is proud to have reclaimed the term “Research Parasite,” he thinks that the phrase still carries some negative connotations and may prevent researchers from wanting to share data. In contrast, the term “symbiont” is both more positive and more inclusive. In theory, the award may include categories for both hosts (data-sharers) and parasites (data-sharees).

“I really like the idea of moving towards more positive framing of the award,” said Byrd. “I think a change in name could help that.”

Whatever the name, the Research Parasite and Research Symbiont Awards signify a new and occasionally controversial approach to the way that the scientific ecosystem thinks about resource sharing.

“I think we’re moving toward a world in which the sharing of data will help many people,” said Byrd. “Including those people who share the data.”

Featured Article

  • Longo, D. L., & Drazen, J. M. (2016, January 21). Data Sharing. New England Journal of Medicine. https://doi.org/10.1056/nejme1516564