The human body is host to trillions of bacteria, fungi, viruses, and other microorganisms that make up the human microbiome. Researchers at Harvard Medical School (HMS) and Joslin Diabetes Center have now analyzed the genetic makeup of bacteria in the human gut, and linked groups of bacterial genes—genetic signatures—to disorders including atherosclerotic cardiovascular disease (ACVD), cirrhosis of the liver (CIRR), inflammatory bowel disease (IBD), colorectal cancer (CRC), and type 2 diabetes (T2D).

Data from the gene-level microbiome-disease association study indicated that coronary artery disease, inflammatory bowel disease, and liver cirrhosis share many of the same bacterial genes. In other words, people whose gut microbiota contain the same collections of bacterial genes appear more likely to have one or more of these three conditions.

The team says the work adds new understanding to what is already known about the relationship between the gut microbiome and specific diseases. If confirmed through further research, the results could inform the design of tools for gauging a person’s risk for a range of conditions, based on analysis of a single fecal sample.

The researchers do caution that their study was not designed to determine how and why these microbial genes may be linked to different diseases. Thus far, they said, it remains unclear whether these bacteria are involved in disease development or are mere bystanders in this process. Nevertheless, as Braden Tierney, a graduate student in the Biological and Biomedical Sciences program at HMS, stated, “This opens a window for the development of tests using cross-disease, gene-based indicators of patient health … We’ve identified genetic markers that we think could eventually lead to tests, or just one test, to identify associations with a number of medical conditions.”

Tierney is first author of the team’s published paper in Nature Communications, which is titled “Gene-level metagenomic architectures across diseases yield high-resolution microbiome diagnostic indicators.”

The ecology of the human microbiome is known to be associated with both phenotype and environment, the authors wrote. Previous studies have linked the mix of resident bacteria, and the presence of specific bacterial species, with conditions ranging from obesity to multiple sclerosis. The goal of the newly reported study was to determine whether groups of bacterial genes, rather than the species themselves, could reliably indicate the presence of different diseases.

The researchers started out by collecting microbiome data from 13 groups of patients totaling more than 2,500 samples. Next, they analyzed the data to pinpoint linkages between seven diseases and millions of microbial species, microbial metabolic pathways, and microbial genes. By trying out a variety of modeling approaches—computing a total of 67 million different statistical models—they were able to observe those microbiome features that consistently emerged as the strongest disease-associated candidates.
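
The idea of fitting many models and keeping only the associations that hold up across them can be illustrated with a small sketch. The code below is not the authors' pipeline: the data are synthetic, the outcome is a continuous "disease score" rather than a diagnosis, and the names `gene_a`, `gene_b`, `age`, `bmi`, `specification_sweep`, and `is_robust` are hypothetical. It fits an ordinary least-squares model for every subset of adjustment covariates and calls a feature robust only if the sign of its coefficient never changes.

```python
import itertools
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_coefficient(feature, outcome, covariates):
    """OLS coefficient on `feature`, with an intercept and the given covariates."""
    rows = [[1.0, feature[i]] + [c[i] for c in covariates] for i in range(len(feature))]
    p = len(rows[0])
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * y for r, y in zip(rows, outcome)) for j in range(p)]
    return solve(xtx, xty)[1]  # index 0 is the intercept, index 1 is the feature


def specification_sweep(feature, outcome, covariates):
    """Fit one model per subset of adjustment covariates; return all estimates."""
    names = sorted(covariates)
    estimates = {}
    for k in range(len(names) + 1):
        for subset in itertools.combinations(names, k):
            adjust = [covariates[name] for name in subset]
            estimates[subset] = ols_coefficient(feature, outcome, adjust)
    return estimates


def is_robust(estimates):
    """A feature is 'robust' here if its coefficient keeps one sign in every model."""
    return len({value > 0 for value in estimates.values()}) == 1


# Synthetic cohort (assumption: all values are simulated, not study data).
rng = random.Random(0)
n = 2000
age = [rng.gauss(0, 1) for _ in range(n)]
bmi = [rng.gauss(0, 1) for _ in range(n)]
gene_a = [rng.gauss(0, 1) for _ in range(n)]            # independent of age
gene_b = [a + 0.5 * rng.gauss(0, 1) for a in age]       # strongly tied to age
score = [0.8 * ga + 1.0 * a - 0.5 * gb + 0.5 * rng.gauss(0, 1)
         for ga, a, gb in zip(gene_a, age, gene_b)]

covariates = {"age": age, "bmi": bmi}
sweep_a = specification_sweep(gene_a, score, covariates)
sweep_b = specification_sweep(gene_b, score, covariates)

for name, sweep in (("gene_a", sweep_a), ("gene_b", sweep_b)):
    print(name, "robust" if is_robust(sweep) else "not robust")
    for subset, estimate in sweep.items():
        print("  adjusted for", subset or "(nothing)", "->", round(estimate, 3))
```

In this toy setup, `gene_b` looks positively associated with the score until the model adjusts for age, at which point the sign reverses, so it fails the robustness check. A real analysis would use far more specifications, appropriate models for binary outcomes, and replication across cohorts.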

The team found that of all the microbial characteristics examined (species, pathways, and genes), microbial genes had the greatest predictive power. In other words, the researchers said, groups of bacterial genes, or genetic signatures, were linked most closely to the presence of a given condition, more so than the mere presence of certain bacterial families or of individual bacterial genes.

The results indicated that while coronary artery disease, inflammatory bowel disease, and liver cirrhosis had similar gut microbiome genetic signatures, type 2 diabetes was associated with a microbiome signature unlike that of any other phenotype tested. “Overall, we found striking and previously unrecognized high resolution genetic and taxonomic signatures associated with ACVD, IBD, CRC, and cirrhosis …” the authors noted.

Interestingly, the analysis did not find a consistent link between the presence of the bacterial species Solobacterium moorei and colon cancer, an association that has previously been reported in numerous studies. However, the researchers did identify particular genes from an S. moorei subspecies that were associated with colorectal cancer. This finding indicates that gene-level analysis can yield biomarkers of disease with greater precision and specificity than current approaches.

Co-senior study author Chirag Patel, PhD, associate professor of biomedical informatics in the Blavatnik Institute at HMS, suggested that this result underscores the notion that it is not merely the presence of a given bacterial family that may indicate risk, but rather it is the strains and gene signatures of the microbes that matter.

The ability to identify interconnections with such precision will be critical for designing tests that can measure risk reliably, he added. So, for example, a test intended to measure colon-cancer risk by merely detecting the presence of S. moorei in the gut may not be as reliable as a more refined test that measures bacterial genes to detect the presence of specific strains of S. moorei that are associated with colon cancer.

“Our gene-level architecture analysis captured a previously undocumented strain-level exploration of pan-disease-associated microbes,” Tierney and colleagues reported. “We additionally find that at the species-level, the prior-reported connection between Solobacterium moorei and colorectal cancer is not consistently identified across models—however, our gene-level analysis unveils a group of robust, strain-specific gene associations.” Patel added, “Our study underscores the value of data science to tease out complex interplay between microbes and humans.”

Alongside the positive associations between gut microbiome gene signatures and some diseases, the study data also indicated that two conditions, ear inflammation (otitis) and adenomas (benign soft-tissue tumors), showed only weak associations with the gut microbiome. This suggests that microorganisms residing in the human gut are unlikely to play a role in the development of these conditions, and are unlikely to be reliable indicators that these conditions are present. “We specifically chose to examine otitis as a form of negative biological control, as, to our knowledge, it has limited reported association with the gut microbiome, and we expected it to have a negligible metagenomic architecture,” the investigators noted.

In a previous study, the HMS team used massive amounts of publicly available DNA-sequencing data from human oral and gut microbiomes to estimate the size of the universe of microbial genes in the human body. The analysis revealed that there may be more genes in the collective human microbiome than stars in the observable universe. Given the sheer number of microbial genes that reside within the human body, the new findings represent a major step forward in understanding the complexity of the interplay between human diseases and the human microbiome, the researchers said.

The newly identified microbial genetic signatures could be studied further to determine what role, if any, the organisms play in disease development. “Overall, our work is not only a step towards gene-based, cross-disease microbiome diagnostic indicators, but it also illuminates the nuances of the genetic architecture of the human microbiome, including tension between gene- and species-level associations,” the team wrote in their paper. “Focusing on the gene level may have an additional practical advantage over analysis at the species- or pathway level in the clinic: it allows for high-throughput, multiplexed, PCR-based, and specific diagnostics.”

“The ultimate goal of computational science is to generate hypotheses from a huge swath of data,” said Tierney. “Our work shows that this can be done and opens up so many new avenues for research and inquiry that we are only limited by the time, people, and resources needed to run those tests.”

The authors concluded, “Overall, this work depicts a path for researchers for moving microbiome associations from the abstract to the robust. In short, fitting and reporting a single model is simply not sufficient. However, if we are able to identify robust-to-specification associations that reproduce across cohorts, we will increase the efficiency of biomedical experiments.”