A worldwide consortium of scientists, led by the Earlham Institute and the University of Liverpool, has developed an efficient, inexpensive approach to large-scale bacterial genome sequencing that could equip researchers in low- and middle-income countries (LMICs) with cheap and accessible methods for sequencing large collections of bacterial pathogens—at a cost of less than $10 per genome.
The researchers suggest that at a time when global genomic surveillance of coronavirus has been in the spotlight, the ability of countries to contribute through low-cost and rapid whole genome sequencing (WGS) has become increasingly important. The newly reported methods could be applied to large collections of bacterial pathogens and help to strengthen global research collaborations to tackle future pandemics.
“It has been 26 years since the first bacterial genome was sequenced, and it is now possible to sequence bacterial isolates at scale,” said Neil Hall, PhD, Earlham Institute director. “However, access to this game-changing technology for scientists in low- and middle-income countries has remained restricted. The need to ‘democratize’ the field of pathogen genomic analysis prompted us to develop a new strategy to sequence thousands of bacterial isolates with collaborators based in many economically challenged countries.”
The team reported on its approach in Genome Biology, in a paper titled, “An accessible, efficient and global approach for the large-scale sequencing of bacterial genomes.”
Over the past decade, WGS has revolutionized our understanding of microbial diseases, the authors stated. WGS data can be used for surveillance, functional genomics, and the exploration of pathogen evolution, prompting both public health and research scientists to adopt genome-based approaches. “Recognizing the immense advantages that WGS data provides for surveillance, functional genomics, and population dynamics, both public health and research communities have adopted genome-based approaches,” the team stated.
The demand for sequencing human genomes has driven the cost of sequencing reagents down to under $1,000 per sample, the authors continued. However, while the need to sequence the genomes of collections of key pathogens has grown substantially in recent years, sequencing thousands of microorganisms has remained expensive, “… largely due to costs associated with sample transportation and library construction,” the scientists commented. Until recently, large-scale bacterial genome projects could only be performed in a handful of sequencing centers around the world. The researchers’ aim was to make such technology accessible to laboratories worldwide.
The large-scale genomic sequencing initiative was led by the worldwide 10,000 Salmonella genomes research consortium (10KSG) and focused on Salmonella enterica, a pathogen of global significance that causes infection and deadly disease. The 10KSG consortium involved collaborators from 25 institutions and research and reference laboratories across 16 countries. “Limited funding resources led us to design a genomic approach that ensured accurate sample tracking and captured comprehensive metadata for individual bacterial isolates while keeping costs to a minimum for the Consortium,” said Hall. “The pipeline streamlined the large-scale collection and sequencing of samples from LMICs.” As the authors continued, “A key driver was to assemble a set of genomic data that would be as informative and robust as possible.”
Non-typhoidal Salmonella (NTS) have been widely associated with enterocolitis in humans, a zoonotic disease that is linked to the industrialization of food production, the authors noted. Due to the scale of human cases of enterocolitis and concerns related to food safety, more genome sequences have been generated for Salmonella than any other genus.
“In recent years, new lineages of NTS serovars Typhimurium and Enteritidis have been recognized as common causes of invasive bloodstream infections (iNTS disease), responsible for about 77,000 deaths per year worldwide,” the authors continued in their report. “Approximately 80% of deaths due to iNTS disease occurs in sub-Saharan Africa, where iNTS disease has become endemic.” The new Salmonella lineages responsible for bloodstream infections can be identified by genomics, due to gene degradation, altered prophage repertoires, and novel multidrug-resistant plasmids. As the team reported, “We saw a need to simplify and expand genome-based surveillance of salmonellae from Africa and other parts of the world, involving isolates associated with invasive disease and gastroenteritis in humans, and extended to bacteria derived from animals and the environment.”
The objective of 10KSG is to make genomic data more accessible to LMICs. This is particularly relevant because mortality rates due to Salmonella in sub-Saharan Africa are exceptionally high, and scientists need to understand the genetic makeup of large collections of such bacterial strains. As the researchers pointed out, “One of the most significant challenges facing scientific researchers in low- and middle-income countries is the streamlining of surveillance with scientific collaborations. For a combination of reasons, the regions associated with the greatest burden of severe bacterial disease have inadequate access to WGS technology and have had to rely on expensive and bureaucratic processes for sample transport and sequencing.” This has prevented the adoption of large-scale genome sequencing and analysis of bacterial pathogens for public health and surveillance in LMICs.
Hall commented, “The number of publicly available sequenced Salmonella genomes reached 350,000 in 2021 and are available from several online repositories. However, limited genome-based surveillance of Salmonella infections has been done in LMICs, and the existing dataset did not accurately represent the Salmonella pathogens that are currently causing disease across the world.” Study co-author Jay Hinton, PhD, University of Liverpool professor of microbial pathogenesis, further noted, “One of the most significant challenges facing public health researchers in LMICs is access to state-of-the-art technology. For a combination of logistical and economic reasons, the regions associated with the greatest burden of severe bacterial disease have not benefited from widespread availability of WGS. The 10,000 Salmonella genomes project was designed to begin to address this inequality.”
The newly reported approach aimed to streamline the large-scale acquisition and genome sequencing of bacteria, and the researchers amassed the genetic material of more than 10,400 clinical and environmental bacterial isolates from LMICs in under a year. Specifically, members of the 10KSG provided access to 10,419 bacterial isolates sourced from 51 LMICs and regions, covering seven bacterial genera (Acinetobacter, Enterobacter, Klebsiella, Pseudomonas, Salmonella, Shigella, and Staphylococcus), and coordinated the sample collection and transport of materials to be sequenced in the U.K.
The sample logistics pipeline, developed by a team at the University of Liverpool, was optimized by shipping the heat-inactivated bacterial isolates as “thermolysates” at ambient temperature from across the world to the U.K. The isolates were then sequenced at the Earlham Institute using the LITE protocol, a low-cost, low-input automated method for rapid genome sequencing. “A key aspect of our methodology was the involvement of researchers fluent in multiple languages in corresponding with collaborators, to maximize clear and continuous communication by email,” the researchers said.
In total, library construction, DNA sequencing, and bioinformatic analysis were completed at a total reagent cost of less than $10 (around £7.50) per genome. Co-author Blanca Perez Sepulveda, PhD, postdoctoral research associate at the University of Liverpool, who led the global sample collection, optimization, and analysis, added, “The adoption of large-scale genome sequencing and analysis of bacterial pathogens will be an enormous asset to public health and surveillance in LMICs. Here, we have established an efficient and relatively inexpensive pipeline for the worldwide collection and sequencing of bacterial genomes.”
“We have established an efficient and relatively inexpensive pipeline for the worldwide collection and sequencing of bacterial genomes,” the authors wrote. “Our novel approach allows the transport and whole-genome sequencing of large collections of bacterial pathogens, by coupling the use of thermolysates with DNA extraction and sequencing using the innovative LITE pipeline for library construction. We evaluated this method with the model organism Salmonella enterica through worldwide research collaboration, generating 6,117 high-quality Salmonella genomes, which have already been used for a number of published studies.”
Darren Heavens, PhD, postdoctoral scientist at the Earlham Institute, who developed the whole-genome sequencing pipeline, further noted, “We saw the need to simplify and expand genome-based surveillance of salmonellae from Africa and other parts of the world, involving isolates associated with invasive disease and gastroenteritis in humans, and extending to bacteria derived from animals and the environment. Our pipeline represents a cost-effective and robust tool for generating bacterial genomic data from LMI countries, to allow investigation of the epidemiology, drug resistance, and virulence factors of isolates.”
As the authors explained, “Our pipeline represents a relatively inexpensive and robust tool for the generation of bacterial genomic data from LMI countries, allowing investigation of the epidemiology, drug resistance, and virulence factors of isolates … In future, the method will facilitate rapid, low-cost, and collaborative genome sequencing of bacterial pathogens. Our concerted approach demonstrates the value of true global collaboration, and could contribute to the future investigation of international epidemics or pandemics.”
The analytical bioinformatic pipeline is publicly available at https://github.com/apredeus/10k_genomes, and the resulting genomic data are deposited in the EMBL European Nucleotide Archive (ENA) repository under the project accession numbers PRJEB35182 and PRJEB47910, where they are presented as a data resource for the scientific community.
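For readers who want to explore the deposited data, the following is a minimal Python sketch of how the sequencing runs under one of these project accessions might be listed. It is not part of the consortium's pipeline. It assumes the ENA Portal API "filereport" endpoint and the field names shown (run_accession, sample_accession, fastq_ftp); these should be checked against the current ENA documentation before use. The function names are hypothetical and chosen only for illustration.

```python
import csv
import io
import urllib.parse
import urllib.request

# Assumed endpoint of the ENA Portal API; verify against the ENA documentation.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"

# Project accessions quoted in the article.
PROJECTS = ("PRJEB35182", "PRJEB47910")


def build_report_url(accession, fields=("run_accession", "sample_accession", "fastq_ftp")):
    """Build a file-report URL for one project accession (field names are assumptions)."""
    query = urllib.parse.urlencode({
        "accession": accession,
        "result": "read_run",
        "fields": ",".join(fields),
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


def parse_report(tsv_text):
    """Turn a tab-separated report into a list of dicts, one per sequencing run."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def fetch_runs(accession):
    """Download and parse the run listing for a project (requires network access)."""
    with urllib.request.urlopen(build_report_url(accession)) as response:
        return parse_report(response.read().decode("utf-8"))
```

Calling fetch_runs("PRJEB35182") would, under these assumptions, return one record per sequencing run, each carrying the locations of its read files. Keeping URL construction and parsing separate from the download step means the logic can be checked without a network connection.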