A pile of early coronavirus data that was missing for a year has emerged from its hiding place.
In June, an American scientist discovered that more than 200 genetic sequences from Covid-19 patient samples isolated in China at the start of the pandemic had been puzzlingly removed from an online database. Jesse Bloom, a virologist at the Fred Hutchinson Cancer Center in Seattle, did some digital detective work and found 13 of the sequences on Google Cloud.
Sharing his experience in a report published online, Dr. Bloom wrote that “it seems likely that the sequences were deleted to obscure their existence.”
But now a strange explanation has emerged, stemming from an editorial oversight by a scientific journal. And the sequences have been uploaded to another database monitored by the Chinese government.
The story began in early 2020, when researchers from Wuhan University explored a new way to test for the deadly coronavirus that was sweeping the country. They sequenced a short section of genetic material from virus samples from 34 patients in a Wuhan hospital.
The researchers published their results online in March 2020. That month they also uploaded the sequences to an online database called the Sequence Read Archive, maintained by the National Institutes of Health, and submitted a paper describing their results to a scientific journal called Small. The paper was published in June 2020.
Dr. Bloom became aware of the Wuhan sequences this spring while researching the origin of Covid-19. While reading a May 2020 report on early genetic sequences of the coronavirus, he came across a table that noted their presence in the Sequence Read Archive.
But Dr. Bloom couldn’t find them in the database. On June 6, he emailed the Chinese scientists to ask where the data had gone, but received no response. On June 22, he published his report, which was covered by The New York Times and other media outlets.
At the time, a spokeswoman for the NIH said the study’s authors requested in June 2020 that the sequences be removed from the database. The authors informed the agency that the sequences would be updated and included in a different database. (The authors did not respond to inquiries from The Times.)
But a year later, Dr. Bloom couldn’t find the sequences in any database.
On July 5, more than a year after the researchers removed the sequences from the Sequence Read Archive and two weeks after Dr. Bloom’s report was published online, the sequences were quietly uploaded to a database of the China National Center for Bioinformation by Ben Hu, a researcher at Wuhan University and a co-author of the Small paper.
On July 21, the disappearance of the sequences was raised during a press conference in Beijing at which Chinese officials denied claims that the pandemic began as a laboratory leak.
According to a translation of the press conference by a journalist from the state-controlled Xinhua News Agency, Dr. Zeng Yixin, vice minister of China’s National Health Commission, said that the problems arose when the editors of Small deleted a paragraph in which the scientists described the sequences in the Sequence Read Archive.
“Therefore, the researchers thought that it was no longer necessary to save the data in the NCBI database,” said Dr. Zeng, referring to the Sequence Read Archive, which is operated by the NIH.
An editor at Small, which specializes in micro and nano science and is based in Germany, confirmed his account. “The data availability declaration was erroneously deleted,” wrote editor Plamena Dogandzhiyski in an email. “We will shortly issue a fix that will clear up the error and contain a link to the depot where the data is now hosted.”
The journal published a formal correction to this effect on Thursday.
It is not clear why the authors did not mention the journal’s error when they requested that the sequences be removed from the Sequence Read Archive, or why they told the NIH that the sequences would be updated. It is also not clear why they waited a year to upload them to another database. Dr. Hu did not respond to an email asking for comment.
Dr. Bloom was also unable to provide an explanation for the conflicting accounts. “I am unable to judge between them,” he said in an interview.
These sequences alone cannot resolve the open questions about how the pandemic originated, whether through contact with a wild animal, a leak from a laboratory or some other route.
In their first reports, the Wuhan researchers wrote that they extracted genetic material from “samples from outpatients suspected of having Covid-19” at the beginning of the epidemic. But the entries in the Chinese database now suggest they were taken from the Renmin Hospital at Wuhan University on Jan. 30 – almost two months after the earliest reports of Covid-19 in China.
While the disappearance of the sequences appears to be the result of an editorial error, Dr. Bloom said it was still worth looking for other coronavirus sequences that might be lurking online. “That definitely means we should keep looking,” he said.