In 2003, the Human Genome Project produced the most complete map of human genetics ever assembled - but that map still held many uncharted territories.
It was missing about 8% of the human genome, including crucial regions that left large gaps and have remained hidden from scientists.
Now, an ambitious team of researchers has gone back and filled those empty spaces, assembling the first complete, gapless sequence of a human genome.
The Telomere-to-Telomere (T2T) Consortium, a team of around 100 scientists across the United States, announced Thursday that it has made publicly available a truly comprehensive set of genetic instructions for the human body.
"Talk about perfectionists. These scientists saw this amazingly important puzzle was missing a few pieces and decided to take all the technical advancements of the last two decades - with a dash of creativity and hardcore computer science, and even a bunch of intellectual sweat - to complete the picture," Dr. Eric Green, director of the National Human Genome Research Institute (NHGRI), said in a media briefing announcing the achievement.
"This complete sequence now forms an unbroken thread that not only connects to the past work of the Human Genome Project, but also points to future possibilities," Green said.
The new reference genome - called T2T-CHM13 - is expected to serve as a Rosetta stone for human genetics, helping people better understand the ways that genetics drives health, development and evolution.
"If you imagine a world map, 8% is about the size of Africa. An entire continent, if you will, was missing," said Michael Schatz, a T2T Consortium member and professor of computer science and biology at Johns Hopkins University, in Baltimore.
These previously uncharted territories of the human genome contain messy sections in which the same DNA letters repeat over and over again. Because the regions looked like gibberish, scientists largely dismissed them as junk.
Not so, said Evan Eichler, a professor of genome sciences at the University of Washington, in Seattle, who served with both the T2T Consortium and the original Human Genome Project.
"It turns out these genes are incredibly important for adaptation," Eichler said in the Thursday briefing. "They contain immune response genes that help us to adapt and survive infections and plagues and viruses. They contain genes that are important in terms of helping us detoxify agents and are very important in terms of predicting drug response."
Most interestingly, he said, they carry genes that make us uniquely human.
"About half of the genes that are thought to make our bigger brain compared to the other apes come specifically from these regions, which were absent in the original Human Genome Project," Eichler said.
The missing genome sections provide clues for why cancers develop, because they are related to parts of the chromosome involved in cellular integrity and cell division, said Karen Miga, associate director of the UCSC Genomics Institute at the University of California, Santa Cruz.
They also will help researchers better understand disorders like Down syndrome and muscular dystrophy, and even common problems of aging like hearing loss and flagging immune systems, Schatz said.
The T2T Consortium's effort to assemble a complete human genome was made possible by advances in genetic sequencing that were not available at the time of the Human Genome Project, an effort that took 13 years and $3 billion to complete.
By comparison, the grassroots T2T effort cost a few million dollars between the various partners and took around three years to complete, said Adam Phillippy, head of the Genome Informatics Section of the National Human Genome Research Institute. NHGRI was the primary funder of this study.
"We got so many things worked out along the way that if we had to repeat this now, it would cost maybe a few tens of thousands of dollars," Phillippy said. "Hopefully, in another 10 years it will be under a thousand dollars - a big change exponentially in cost."
The combination of complete genome sequencing and low cost busts wide open the door to genetically driven medicine, which is now open just a crack, Green said.
"We believe someday physicians will use genome sequences to tailor the medical care of their patients," he said. "This achievement is a first step towards having complete blueprint views of patients, as opposed to only 92%."
Your doctor could have a full copy of your personal genome and use it to treat and prevent illnesses for which you are specifically at risk, Green said.
The T2T Consortium presented its findings March 31 in six papers published in the journal Science. The genome it produced is also accessible online.
A human genome contains more than 6 billion individual letters of DNA organized and packaged within 23 pairs of chromosomes.
"A genome is the complete book of instructions for any species," Eichler said. "Every species has their own genome. It's the complete set of genetic blueprints that basically tells cells when and how to actually create an individual species."
What researchers had in 2003 was basically a book with entire chapters still missing, he said.
"To think about how you go from an individual single cell to a complete organism, you need that complete book of instructions," Eichler said. "Over the years, we've been adding bits and pieces to that book, filling in a page here and maybe a couple of pages over there, unscrambling some text that maybe a copy editor didn't get quite right."
And now?
"This time we were [able] to continuously read the book with almost no error, so we can get from page one to the final chapter of the book, and all those important pieces that were missing are now there," Eichler concluded.
T2T-CHM13 will complement the standard human reference genome produced by the Human Genome Project. That standard reference, known as Genome Reference Consortium build 38 (GRCh38), has been continually updated since the release of its first draft in 2000.
Members of the T2T Consortium emphasized Thursday that this is just a single human genome. While it greatly expands our knowledge of genetics, it falls far short of capturing all of the diversity of humankind.
The next step involves the Human Pangenome Reference Consortium, which aims to develop complete genome sequences of 350 people from diverse backgrounds.
"The key is building out that collection to make it more inclusive, more diverse, more representative of global diversity. What we would really like to do from a technology perspective is enable all the genomes from this point forward to be done to the same level of accuracy and completion we achieved here," Phillippy said.
"The first one is always the hardest, but it really opens the door for the ones that follow," he added.
More information
The National Human Genome Research Institute has more about genomics.
SOURCES: Michael Schatz, PhD, professor, computer science and biology, Johns Hopkins University, Baltimore; Evan Eichler, PhD, professor, genome sciences, University of Washington, Seattle; Karen Miga, PhD, associate director, UCSC Genomics Institute, University of California, Santa Cruz; Adam Phillippy, PhD, head, Genome Informatics Section, National Human Genome Research Institute, Bethesda, Md.; Eric Green, MD, PhD, director, National Human Genome Research Institute, Bethesda, Md.; Science, March 31, 2022