A new study has generated the most comprehensive analysis of worldwide human genetic diversity to date, through the sequencing of 929 human genomes from diverse populations. The study, by scientists at the Wellcome Sanger Institute and the University of Cambridge, working with international collaborators, has highlighted large amounts of previously undescribed genetic variation, providing new insights into our evolutionary past. The results also underscore the complexity of the process through which our ancestors diversified, migrated, and mixed throughout the world.
The resulting resource is being made freely available to all researchers to study human genetic diversity, including studies of genetic susceptibility to disease in different parts of the world. “Though this resource is just the beginning of many avenues of research, already we can glimpse several tantalizing insights into human history,” commented Chris Tyler-Smith, PhD, who has recently retired from the Wellcome Sanger Institute, and is co-author of the team’s published paper in Science. “It will be particularly important for better understanding human evolution in Africa, as well as facilitating medical research for the full diversity of human ancestries.” The scientists report on their findings in a paper titled, “Insights into human genetic variation and population history from 929 diverse genomes.”
Genome sequences from diverse population groups can help to reveal the structure of human genetic variation and also uncover some of the history of, and relationships between, different populations, the authors noted. “They also provide a framework for the design and interpretation of medical genetics studies.” The current consensus view of human history is that the ancestors of present-day humans diverged from the ancestors of extinct Neanderthal and Denisovan groups around 500,000–700,000 years ago, before the emergence of “modern” humans in Africa during the last few hundred thousand years. Then, around 50,000–70,000 years ago, some humans expanded out of Africa and subsequently admixed with archaic Eurasian groups. After that, populations grew rapidly, with extensive migration and mixing, and over the last 10,000 years many groups transitioned from hunter-gatherers to food producers.
However, much remains to be learned about the extent to which population histories differed between continents and regions, and how this has shaped the modern distribution and structure of human genetic variation. As the researchers pointed out, “Large-scale genome-sequencing efforts to date have been restricted to large, metropolitan populations and used low-coverage sequencing, whereas those sampling human groups more widely have mostly been limited to one to three genomes per population.”
The Human Genome Diversity Project (HGDP)–Centre d’Etude du Polymorphisme Humain (CEPH) panel offers a resource to which several iterations of genetic assays have been applied, the researchers continued. For their reported studies, they used Illumina sequencing technology to generate 929 high-coverage genome sequences from 54 geographically, linguistically, and culturally diverse populations. Of these, 142 had previously been sequenced.
Their analyses found millions of previously unknown DNA variations that were exclusive to one continental or major geographical region. Although most of these were rare, they did include common variations in certain African and Oceanian populations that had not been identified by previous studies.
“We identified 67.3 million single nucleotide polymorphisms, 8.8 million small insertions or deletions (indels), and 40,736 copy number variants,” the authors noted in the print version of the report in Science. “This includes hundreds of thousands of variants that had not been discovered by previous sequencing efforts, but which are common in one or more population.” Interestingly, the researchers noted, the SNP count is nearly as high as the 84.7 million SNPs that were discovered in 2,504 individuals by the 1000 Genomes Project, which, they suggested, reflects the increased sensitivity due to the high-coverage sequencing, and the greater diversity of human ancestries covered by the HGDP-CEPH panel. Moreover, the authors stated, “While the vast majority of the variants discovered by one of the studies but not the other are very low in frequency, the HGDP dataset contains substantial numbers of variants that were not identified by the 1000 Genomes Project but are common or even high frequency in some populations: ~1 million variants at ≥20%, ~100,000 variants at ≥50%, and even ~1000 variants fixed at 100% frequency in at least one population sample.”
The findings could help to inform medical research, as variations uncovered may be found to influence the susceptibility of different populations to disease. Medical genetics studies to date have predominantly been conducted in populations of European ancestry, so any medical implications that some of these new variants might have are not yet known. Identifying these novel variants represents a first step towards fully expanding the study of genomics to underrepresented populations.
Interestingly, no single DNA variation was found to be present in 100% of genomes from any major geographical region while being absent from all other regions. This finding underscores that the majority of common genetic variation is found across the globe. The study also provides evidence that the Neanderthal ancestry of modern humans can be explained by just one major “mixing event,” most likely involving several Neanderthal individuals coming into contact with modern humans shortly after the latter had expanded out of Africa. In contrast, several different sets of DNA segments inherited from Denisovans were identified in people from Oceania and East Asia, suggesting at least two distinct mixing events. The discovery of small amounts of Neanderthal DNA in west African people, most likely reflecting later genetic backflow into Africa from Eurasia, further highlights how human genetic history is characterized by multiple layers of complexity. Until recently, it was thought that only people outside sub-Saharan Africa had Neanderthal DNA.
Commenting on the findings, co-author Anders Bergström, PhD, a postdoctoral fellow at the Francis Crick Institute, said, “The detail provided by this study allows us to look deeper into human history, particularly inside Africa where less is currently known about the timescale of human evolution. We find that the ancestors of present-day populations diversified through a gradual and complex process mostly during the last 250,000 years, with large amounts of gene flow between these early lineages. But we also see evidence that small parts of human ancestries trace back to groups that diversified much earlier than this.”
The authors noted that while the HGDP genome dataset substantially expands the genomic record of human diversity, it still has gaps in geographical, linguistic, and cultural coverage. They acknowledged the need for continued sequencing of diverse human genomes. “Given the scale of ongoing medical and national genome projects, producing high coverage genome sequences for at least 10 individuals from each of the ~7,000 human linguistic groups would now arguably not be an overly ambitious goal for the human genomics community.” Such an achievement, they stated, would represent a “scientifically and culturally important step toward diversity and inclusion in human genomics research.”
Hélène Blanché, head of the Biological Resource Centre at the CEPH in Paris, France, concluded, “The Human Genome Diversity Project resource has facilitated many new discoveries about human history in the past two decades. It is exciting to see that with the latest genomic sequencing technology, these genomes will continue to help us understand our species and how we have evolved.”