Scientists tried to replicate a provocative gene editing paper in real time, and documented it on Twitter

A study linking an edited CCR5 gene with dying young didn't pass the smell test

Alison Koontz



Last year, scientist Jiankui He shocked the world when he used the revolutionary gene editing tool CRISPR-Cas9 on two human embryos in China, which were later born as twin girls. The system made deletions in CCR5, a gene that encodes a receptor HIV uses to enter immune cells. The hope was that the edits would make the babies immune to HIV.

The move rocked the scientific community, sparking debate about the bioethical implications of human genome editing and the potential unknowns of using the technology to create immunity against diseases.

In June, an article published in Nature Medicine reported that although deletions in the CCR5 gene can confer resistance to HIV, individuals carrying two copies of the deletion die at a 21% higher rate than the rest of the population. The findings suggested that one of the twins edited by Jiankui He has a higher chance of dying young than her sibling, raising serious questions about the long-term health effects that could result from human genome editing.

Humans carry two copies of each gene in the genome, one on each chromosome of a pair: one passed down from your mother and one from your father. Your mother and father may have passed down slightly different versions of the CCR5 gene, each with small changes in its genetic information; this is referred to as heterozygosity. Or your parents might have passed down identical versions of the gene, which means you have two copies of the same genetic information, called homozygosity.

Of the two babies who were edited with the CRISPR system, one was homozygous for the deletion in the CCR5 gene, and one was heterozygous. The findings of the Nature Medicine study, led by biologist Rasmus Nielsen at the University of California, Berkeley, suggested that the baby with the homozygous deletion in CCR5 has a potentially higher risk of mortality than her sibling due to the intentional editing of her genome. This finding confirmed the worries of many in the scientific community about genome editing, and rocketed the paper to international headlines.

Yet despite the new research's widespread publicity, some scientists had doubts about the results. Within 24 hours of the publication of the paper, and the resulting hailstorm of international headlines, scientists around the world were already attempting to replicate the study's analyses to come to their own conclusions - and documenting their processes in near-real time on Twitter.

The researchers encountered a number of roadblocks that prevented them from being able to replicate the results. This lack of reproducibility led many to question the accuracy of the new findings, raising concerns that they were possibly false or over-inflated. And because the new study was making international headlines, some worried that the public was being seriously misinformed by unverifiable science.

Statistician Sean Harrison of the Institute for Epidemiology at the University of Bristol, who had access to the UK Biobank dataset used in the original paper, practiced “live science” by publicly attempting to replicate its findings on Twitter. His long thread laid out every assumption of the analysis he could glean from the original publication, along with the computer code he wrote and the data manipulations he performed (neither of which were reported for the original paper) - and it yielded markedly less significant results than those published. “Basic things we do to genetic data weren’t done,” Harrison commented. “There’s massive gaps in the methods.”

Nielsen's group calculated survival rates for individuals with each combination of the CCR5 deletion, both homozygous and heterozygous, using over 400,000 individuals whose genetic data is held in the UK Biobank. This data is not publicly available, but can be accessed with approval from the institution. That restriction was the first roadblock to replication: without access to the dataset, fellow researchers had no way to complete their own analyses.

The few researchers, including Sean Harrison, who did have access to the dataset ran into another issue: incomplete reporting of the study's statistical approach.

Multiple analyses fed into the study's major result, the 21% increase in mortality for individuals with homozygous deletions. However, the exact parameters that Nielsen and his team used to calculate this value were not clearly stated in the published manuscript or the available supplemental material, making the analyses nearly impossible to replicate.

Cecile Janssens, an epidemiologist at Emory University, also publicly attempted to replicate the paper's results, but stopped far earlier than Harrison. “The paper is too confusing,” she wrote on Twitter, “with essential data unreported.”

With so many doubts cast on this highly publicized paper, it’s no wonder that Rasmus Nielsen took to Twitter to publicly respond to his critics. He performed a follow-up analysis using Harrison’s methods for adjusting samples for genetic relatedness, along with some of Harrison’s other suggestions about the analytic setup. This analysis yielded the same results as the original paper. “If you, or anybody else, have questions about the paper or the analyses, please feel free to contact us,” Nielsen tweeted. “We will be happy to help, and we apologize if some of the Methods section is difficult to parse.”

There are still a few problems with the paper, the most obvious being the makeup of the UK Biobank sample itself. The database relies on self-reported data from volunteers, which can skew analyses through the “healthy volunteer effect”: healthy people volunteer their data more readily than unhealthy ones, producing an artificially low mortality rate in the sample.

In this case, the death rate of the UK Biobank sample was 46-58% lower than the national average, meaning the true increase in mortality for people with the deletion could be even larger than reported. Additionally, the healthy volunteer effect could mean that people for whom the CCR5 mutation would be advantageous (populations with higher rates of HIV) were not included in the Biobank sample, which would also skew the calculated death rate.

In the end, the main takeaway of Nielsen's paper for the scientific community wasn’t just the finding that deletions in CCR5 can lead to a higher mortality rate in homozygous individuals. It also underscored a growing problem: the reproducibility of science is steadily declining, which can lead to wrong information being spread not only within the scientific community, but to the public as well.

A 2016 survey reported in Nature found that more than 1,000 of roughly 1,500 scientists had tried and failed to reproduce at least one other scientist’s experiment, and the problem is not getting better. In a follow-up to his Twitter thread, Sean Harrison further commented on why he thinks his public dismantling of the CCR5 science was important.

“This paper has received an incredible amount of attention,” he writes. “If it is flawed, then poor science is being heavily promoted.”

Peer Commentary

We ask other scientists from our Consortium to respond to articles with commentary from their expert perspective.

Brooke N Dulka

Behavioral Neuroscience

University of Tennessee

Alison Koontz hits the nail on the head - reproducibility of science is steadily declining. To me, that’s what this article’s really about. Ethical issues of the CRISPR-Cas9 system aside, we are facing a bigger problem in science, and that is the replication crisis. However, the academic and publishing systems aren’t making this problem any easier. Few journals want to publish replications - whether they are positive or negative replications - and funding for replication research is lacking. This has left little incentive to double-check not only other scientists’ work, but even our own results. This is a problem we, as scientists, must rectify in order to gain public trust.

Overall, great piece. Thanks, Alison!

Gaius J Augustus responds:

I just want to second this extremely important issue in science right now.

We are very concerned about the ethics of this kind of research, but we also need to be concerned about the ethics of publishing data that is not reproducible. Our reproducibility crisis can be connected directly to several public health and public trust issues.

The problem is two-fold: (1) masking and misleading methods and (2) lack of internal and external reproducibility studies.

Within the scientific community, we are too often met with an inability to reproduce others’ work because the methods are not adequately explained. Methods should be complete and accurate, reflecting not only the steps that were taken, but also the optimizations that were necessary to reach the outcome. Studies should have adequate internal replication, and external validation of results has shown itself to be incredibly important.

The secrecy that surrounded Jiankui He’s work stalled scientific evaluation of the results and limited the follow-up studies possible to assess the effectiveness, safety, and feasibility of using gene editing as a preventative measure against disease.

Scientists must take a stand with public funding agencies to promote better open data policies and reproducibility studies, and we must hold each other accountable to provide accurate methods and robust results.

Claudia Lopez-Lloreda


University of Pennsylvania

Love that this article touches upon so many problems in science today. I agree with Brooke that journals are certainly not making the publication of replication studies easier, but I do see a change toward being much more transparent with the data and how it was acquired. Certain journals now ask for the raw data at the time of submission, along with other in-depth information about experimental design and analysis. Even then, raw data could be interpreted differently, and journals are starting to take steps to address this. For example, for some papers the British Journal of Anaesthesia asks experts not associated with the study to provide their own conclusions based solely on the methods and results, and publishes them alongside the conclusions from the original authors. In one instance, the conclusions were completely different. Since publishing systems are such an integral part of science, it is up to these institutions to continue to implement significant changes to address the reproducibility problem.

Also, love how Twitter has given scientists a platform to have these discussions. Twitter ftw [for the win]!

Great article, Alison! Thanks for highlighting such a relevant issue to our times. The piece actually brings to mind an older replication study, but in this case two labs worked together to reconcile some disparate findings. The labs, one on the US West Coast and the other on the East Coast, famously tried to figure out why they were seeing different results even though all the source material and protocols were seemingly identical. It turned out to be a simple difference in a routine process, deemed unworthy of special mention in the methods.

Hopefully such efforts can serve as examples to scientists, both in terms of willingness to cooperate and in ensuring clarity when communicating research.