Africa has a high burden of chronic kidney disease, with prevalence rates much higher than in developed countries. Many patients don’t even realize they are living with the disease. But when Segun Fatumo, PhD, head of non-communicable diseases genomics at MRC Uganda, and his team wanted to look into the genetic drivers behind this high prevalence, they realized that there was insufficient genetic or kidney function data across the entire continent.
When they managed to conduct a genome-wide association study (GWAS) identifying Africa-specific genetic variants associated with kidney function,1 it was statistically underpowered with only 3,000 Ugandan genomes. By comparison, a previous GWAS of kidney function, comprising mostly people of European ancestry, sampled the genomes of a million individuals. Fatumo, who is also chair of genomic diversity at Queen Mary University of London, said that this illustrated the genome data disparity to him. That disparity has only worsened.
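To see why 3,000 genomes leave a study underpowered, a rough power calculation helps. The sketch below is not the method used in the study: it assumes a standardized trait, a single additive variant, a normal approximation to the test statistic, and the conventional genome-wide significance threshold of 5e-8. The allele frequency and effect size are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist


def gwas_power(n, maf, beta, alpha=5e-8):
    """Approximate power to detect one additive variant on a standardized trait.

    n: sample size; maf: minor allele frequency; beta: effect per allele in
    standard-deviation units; alpha: two-sided significance threshold.
    """
    std_normal = NormalDist()
    # Expected z-score: sqrt(n * trait variance explained by the variant).
    expected_z = (n * 2 * maf * (1 - maf) * beta ** 2) ** 0.5
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability that the observed z-score lands beyond either critical value.
    return std_normal.cdf(expected_z - z_crit) + std_normal.cdf(-expected_z - z_crit)


# Hypothetical variant: 20% allele frequency, 0.05 SD effect per allele.
for n in (3_000, 1_000_000):
    print(n, gwas_power(n, maf=0.2, beta=0.05))
```

Under these assumed values, power is close to zero at 3,000 samples and close to one at a million, which is the gap the two studies illustrate.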
“In 2016, the proportion of African genomes in genomics studies was 3%. I saw that decline to 1.1% by 2022,” Fatumo told GEN Biotechnology. Of these, most are based on studies done on African Americans, with the 1.5 billion people living in continental Africa severely underrepresented. Likewise, Asian, Hispanic, and Indigenous communities are also severely underrepresented in genomic studies, both within the United States and globally. This lack of diversity in genome datasets produces an incomplete picture of links between genes and diseases and prevents equitable delivery of genomic medicine.
Clinical relevance of genome diversity
The original reference human genome sequence, published in draft form in February 2001, was a mosaic assembled from 20 anonymous volunteers who answered an ad placed in a Buffalo, NY, newspaper in 1997. However, almost 75% of the reference genome was sourced from one male volunteer, known as RP11, who is most likely African American. Yet since the genome era began, genomes of European ancestry have dominated genomic databases.
This overrepresentation is not just inequitable; it also leads to missed opportunities. Complex traits and many heritable conditions alike are shaped by the interaction between multiple genes. When genome datasets aren’t diverse, researchers miss out on clinically relevant genetic variants.
When a drug is developed to target a gene, it might be effective against only one genetic variant that might not be prevalent in other ethnicities. Conversely, studying other genomes could provide insights into the disease and what’s driving it. Instead of targeting gene variants identified by analyses of Eurocentric datasets, “we have an opportunity to find new pathways and new targets that we can develop drugs for,” said Cheng Zhang, CEO of Character Biosciences, a precision medicine company in San Francisco.
Character Biosciences performs genomic studies on diverse patient cohorts to develop therapeutics for conditions like age-related macular degeneration and glaucoma. It uses polygenic risk scores,2 which quantify the risk of developing a particular disease based on many genetic variants. Because these scores are calculated from datasets skewed toward European ancestry, they are often less useful to people of other ethnicities.
“Polygenic risk scores for age-related macular degeneration can predict disease risk and disease progression fairly well in patients of European descent,” said Zhang. “But if you take the same risk score and try to predict disease in patients of African ancestry, it may be significantly less effective.” This also means the same condition could have different underlying genetic drivers in different populations.
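In its simplest form, a polygenic risk score is a weighted sum of the risk alleles a person carries. The sketch below assumes that form; the variant names and weights are hypothetical, and real scores involve many more variants plus steps such as ancestry adjustment that are omitted here.

```python
def polygenic_risk_score(dosages, weights):
    """Sum each variant's weight times the number of risk alleles carried (0, 1, or 2).

    Variants missing from `dosages` are counted as 0, a simplification.
    """
    return sum(weight * dosages.get(variant, 0) for variant, weight in weights.items())


# Hypothetical per-allele weights (log odds ratios) estimated in a GWAS cohort.
weights = {"variant_A": 0.30, "variant_B": 0.12, "variant_C": -0.08}

# Hypothetical genotype: risk-allele counts for one person.
person = {"variant_A": 2, "variant_B": 1, "variant_C": 0}

score = polygenic_risk_score(person, weights)
print(round(score, 2))  # 0.72
```

Because the weights are estimated in the cohort where the GWAS was run, a score built on mostly European genomes carries that cohort's variants and effect sizes into every population it is applied to.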
Claudia Gonzaga-Jauregui, PhD, a genomicist at the International Laboratory for Human Genome Research at the National Autonomous University of Mexico, said that studying genetic variants across populations tells us which variants could be affecting the function of genes or the proteins they encode. This is relevant to understanding how potential therapeutics could interact with variants of the same genes in different populations. If a disease has a low prevalence in a community, it could be because there are genetic modifiers that have a protective effect.
Lack of genome diversity means that even patients with access to genome sequencing, such as Hispanic individuals in the United States, might not always benefit from it. “Clinical laboratories have a hard time giving them a definite diagnosis because there are many variants that they will identify and label as variants of unknown significance,” said Gonzaga-Jauregui. This is because public genome datasets lack data to determine whether a genetic variant is a polymorphism prevalent in the Hispanic population or a pathogenic variant.
Whole genome sequencing has accelerated rare disease research. By sequencing patient genomes, scientists have identified many previously undiagnosed rare diseases, many of which are more prevalent in particular ancestral groups. Sequencing diverse genomes, therefore, is critical to identifying hidden rare diseases and determining their true prevalence.
For example, Steel syndrome, an ultra-rare orthopedic condition, affects Puerto Rican children. Two decades after it was first described, Gonzaga-Jauregui and colleagues identified the gene associated with the disease3 by sequencing a family with two affected children. The study also found that the gene variant is a founder mutation in the Puerto Rican population.
In recent years, cell and gene therapies have been used to treat a range of conditions, including lymphomas and sickle cell disease. However, the risk of off-target effects is a concern with therapeutic gene editing, and these effects may be specific to particular gene variants. A 2022 study published in Nature Genetics, led by Daniel Bauer at Boston Children’s Hospital, showed that a single nucleotide polymorphism more prevalent among the African-American community contributed to a putative off-target site for Casgevy, a cell therapy for sickle cell anemia.4 Diverse genomes could be key to improving the success of gene therapies.
The benefits of diversifying genome datasets extend beyond underrepresented groups. African and South Asian populations have high genomic diversity and a high frequency of rare variants. In a 2017 paper, also published in Nature Genetics, researchers found strong founder effects in South Asian populations5 which, exacerbated by caste endogamy, cause high rates of recessive diseases. Sampling this diversity could increase the predictive power of genome-wide association studies as well as help researchers identify causal gene variants and their mechanisms.
Bridging the genome diversity gap
Recognizing the need for diverse genome datasets, scientists are pushing to diversify the cohorts of genome sequencing efforts in developed countries. For example, nearly half of the roughly 250,000 genomes released as part of NIH’s All of Us initiative (Figure 1)6 come from individuals in underrepresented racial and ethnic minorities. Researchers working on the project have discovered some 275 million new gene variants.
Developing countries are also undertaking large-scale genome sequencing initiatives, such as the Nigerian 100K Genome Project.7 The project will catalog genetic variation in 100,000 Nigerian adults and assess the burden and etiology of non-communicable diseases like cancers and cardiometabolic disorders.
Biotech companies focused on underrepresented groups are also diversifying genomics research. These include genetic testing companies like Bermuda-based CariGenetics and U.S.-based IndyGeneUS AI.
CariGenetics recently conducted a whole genome study that was the first to be done entirely in the Caribbean. The study found that 70% of the patients were diagnosed with breast cancer under the age of 50. While this suggests a genetic link, “we found only three patients had a [marker] gene that we knew,” said CariGenetics CEO Carika Weldon, PhD. This means Caribbean women could have the disease without expressing the standard markers for breast cancer screening. “We’re finding there are some potential markers linked to breast cancer in our population,” Weldon added.
IndyGeneUS AI, on the other hand, is sequencing multiple underrepresented groups in the United States. CEO Yusuf Henriques stated that “our goal is to sequence enough human genomes of African, Hispanic, and Asian descent to have a better reference library that includes some of the gene variants that are missing. Those could be included in genetic testing in the future.”
Indigenous communities are also missing from most genomic studies. “Aboriginal people will tell you that we feel like we’re the most researched people in the world,” said Azure Hermes, the deputy director at the National Centre for Indigenous Genomics (NCIG) of the Australian National University. “But when it comes to genomics, we are the most underrepresented,” added Hermes, an Indigenous woman of the Gimuy Walubara Yidinji tribe.
NCIG has access to nearly 7,000 historical specimens, collected between the 1960s and 1990s from Indigenous communities in Australia. In a study sequencing genomes of 159 individuals from four communities, researchers found significant variation8 never seen before in reference genomes or clinical datasets anywhere in the world.
While these data enrich knowledge for everyone, they must also translate into clinical benefits for the people the data come from. “We have high rates of end-stage kidney disease where 90% of people end up on renal dialysis,” added Hermes. NCIG researchers are studying how gene variation contributes to kidney diseases in Indigenous communities.
Sequencing genomes of rare disease patients yields many new variants and provides more patients with a diagnosis for their conditions. “When we see some rare diseases are more prevalent in certain regions or certain Indigenous populations, we can start screening some of the children and families there to identify those at risk,” said Gonzaga-Jauregui.
Scientists interviewed for this story, including Fatumo, Gonzaga-Jauregui, and Hermes, all stressed that African, Hispanic, or Indigenous people, respectively, are not monolithic identities. There are hundreds of different ancestral groups within each ethnicity. Therefore, projects to increase diversity must go beyond meeting a quota of non-European genomes and have representation from as many communities as feasible within each group.
Researchers benchmark sequenced genomes against the human reference genome. However, a single genome doesn’t represent the diversity of human populations. In 2023, the Human Pangenome Reference Consortium released the first draft human pangenome reference.9 The pangenome reference puts together genomes of 47 individuals with diverse ancestries, slated to increase to 350 this year, in a graph that captures the variation between them (Figure 2).
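The idea of a graph that captures variation can be shown with a toy example. The sketch below is not the consortium's data format or tooling: it assumes a minimal model in which nodes hold short sequence segments and each genome is a path through those nodes. All sequences and names are made up.

```python
# Toy pangenome graph: nodes hold sequence segments, genomes are paths through them.
nodes = {1: "ACGT", 2: "A", 3: "G", 4: "TTCA"}

# Two hypothetical haplotypes that share nodes 1 and 4 but differ in the middle.
paths = {
    "haplotype_1": [1, 2, 4],
    "haplotype_2": [1, 3, 4],
}


def spell(path):
    """Reconstruct a haplotype's sequence by concatenating the nodes on its path."""
    return "".join(nodes[node_id] for node_id in path)


def variable_nodes(all_paths):
    """Return nodes that are not on every path, i.e. where the genomes differ."""
    node_sets = [set(path) for path in all_paths.values()]
    return set.union(*node_sets) - set.intersection(*node_sets)


for name, path in paths.items():
    print(name, spell(path))
print(variable_nodes(paths))
```

A single linear reference would keep only one of the two middle nodes. The graph keeps both, so a newly sequenced genome carrying either version can be matched to it.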
Toward equitable and inclusive genomic medicine
With the discovery of hundreds of millions of new genetic variants and their implications for drug development and precision medicine, it’s clear that there is immense commercial value in diverse genome data. Consequently, the pharma industry is looking to advance and benefit from genome diversity with initiatives like the Alliance for Genomic Discovery, which includes companies like AstraZeneca and Merck.
However, diversity should be complemented by equity and inclusion. “Researchers should return that information to patients and populations of interest, otherwise it is just extractive helicopter research that just takes samples out and no benefits come back to those countries and those populations,” said Gonzaga-Jauregui.
Owing to historical harm in genomics, as well as medicine broadly, many underrepresented communities are wary of participating in genomics research. “We don’t have a lot of Indigenous people that are willing to provide blood samples or samples or anything along those lines or wanting to be a part of big biobanks,” said Hermes. Even when there are Indigenous samples in genome datasets, Hermes added, it’s highly likely that Indigenous people don’t know about them or have not provided informed consent. This is likely true of historical samples in NCIG’s collection. Hermes located the families of the individuals and sought their consent retrospectively.
Informed consent is necessary not just when samples are collected but for every subsequent use of the data generated. In a 2021 commentary in the American Journal of Bioethics, researchers warned10 that “without explicit ownership of their data, Indigenous communities not only lose out on the potential to participate in this economic activity, but risk having their identities misrecognized, commodified, and sold as ancestry tests.”
Hence the need for new models of benefit sharing that ensure that, if the genome data is monetized, individuals earn too. “The patients are owners and not donors of the data,” said Henriques. IndyGeneUS is building a blockchain platform where patients can upload their genome sequencing data, including from third-party genetic testing companies. The patients will know where their data has been shared and be able to revoke the permission in the future if they decide. CariGenetics has a similar solution, storing genome data as a non-fungible token. “We have a policy that data autonomy and ownership has to be not just within the region but also with the individual,” said Weldon.
Building trust in genomic medicine
As many cases of vaccine hesitancy around the world show, the best medicine isn’t worth much if people cannot or are not willing to engage with it. “Indigenous communities understand that we need to be a part of research to have a better understanding of diseases and medication. But it has to be done on our terms,” said Hermes. She added that long-term investment, relationship building, and trust are essential.
Additionally, researchers must include local communities not just as study subjects but also as participants in the research itself. As Jennifer Adair, PhD, a gene therapy researcher at the University of Washington, and her colleagues wrote earlier this year in Science Translational Medicine, “If a system does not develop where meaningful input from community members occurs at every step of research and clinical studies we will be left repeating the same mistakes of the previous five decades.”11
If more people trusted and engaged with genomic studies, researchers would have access to a far greater diversity of genomes. That would improve diagnosis, treatment, and care for populations globally and, consequently, enhance trust. Getting there requires honest communication about the potential health benefits of genomics studies and their time frames.
Sachin Rawat is a freelance writer based in India. This article was originally published in GEN Biotechnology (the sister peer-review journal to GEN magazine) in the August 2024 issue.
References
1. Fatumo S, Chikowore T, Kalyesubula R, et al. Discovery and Fine-Mapping of Kidney Function Loci in First Genome-Wide Association Study in Africans. Hum Mol Genet 2021;30(16):1559–1568; doi: 10.1093/hmg/ddab088.
2. Torkamani A, Wineinger NE, Topol EJ. The Personal and Clinical Utility of Polygenic Risk Scores. Nat Rev Genet 2018;19(9):581–590; doi: 10.1038/s41576-018-0018-x.
3. Gonzaga-Jauregui C, Gamble CN, Yuan B, et al. Mutations in COL27A1 Cause Steel Syndrome and Suggest a Founder Mutation Effect in the Puerto Rican Population. Eur J Hum Genet 2014;23(3):342–346; doi: 10.1038/ejhg.2014.107.
4. Cancellieri S, Zeng J, Lin LY, et al. Human Genetic Diversity Alters Off-Target Outcomes of Therapeutic Gene Editing. Nat Genet 2022;55(1):34–43; doi: 10.1038/s41588-022-01257-y.
5. Nakatsuka N, Moorjani P, Rai N, et al. The Promise of Discovering Population-Specific Disease-Associated Genes in South Asia. Nat Genet 2017;49(9):1403–1407; doi: 10.1038/ng.3917.
6. The All of Us Research Program Genomics Investigators. Genomic Data in the All of Us Research Program. Nature 2024;627(8003):340–346; doi: 10.1038/s41586-023-06957-x.
7. Fatumo S, Yakubu A, Oyedele O, et al. Promoting the Genomic Revolution in Africa through the Nigerian 100K Genome Project. Nat Genet 2022;54(5):531–536; doi: 10.1038/s41588-022-01071-6.
8. Silcocks M, Farlow A, Hermes A, et al. Indigenous Australian Genomes Show Deep Structure and Rich Novel Variation. Nature 2023; doi: 10.1038/s41586-023-06831-w.
9. Liao W-W, Asri M, Ebler J, et al. A Draft Human Pangenome Reference. Nature 2023;617(7960):312–324; doi: 10.1038/s41586-023-05896-x.
10. Tsosie KS, Yracheta JM, Kolopenuk JA, et al. We Have “Gifted” Enough: Indigenous Genomic Data Sovereignty in Precision Medicine. Am J Bioeth 2021;21(4):72–75; doi: 10.1080/15265161.2021.1891347.
11. Olayiwola O, Castillejo A, Louella M, et al. Nothing about Us without Us: Advocacy and Engagement in Genetic Medicine. Sci Transl Med 2024;16(746); doi: 10.1126/scitranslmed.adn2401.