Scientists at the University of Illinois at Urbana-Champaign and Northwestern University say they have demonstrated the value of an algorithm that analyzes microbial genomic data to speed the discovery of new therapeutic drugs.

A large proportion of the medications used today were discovered by screening bacteria and other organisms for their ability to produce natural products, that is, biologically useful compounds. In recent years, pharmaceutical companies have largely abandoned this strategy in favor of screening synthetic chemicals for useful properties, an approach that has yielded very few new antibiotics.

Bill Metcalf, Ph.D., a leading investigator on the new study ("A roadmap for natural product discovery based on large-scale genomics and metabolomics"), published in Nature Chemical Biology, explained why pharmaceutical research moved away from natural products. "There was a reason why they gave up.…They kept discovering the same things over and over and over again," he said. "They were getting very diminishing returns."

Genome sequence information, now available for an ever-increasing number of bacterial species, could help antibiotic hunters pinpoint promising natural products. Part of the vision of the Institute for Genomic Biology's Mining Microbial Genomes research group, led by Dr. Metcalf, is to use bacterial genome sequences as an index of the products each bacterium can make.

If researchers could infer what type of product a bacterium makes by looking at its DNA, they would not need a lengthy screening process; they could simply scan genomes for promising gene clusters. Unfortunately, this task is much harder than it sounds. Many clusters share sequences or even whole genes, which makes them indistinguishable by traditional comparative methods even though they direct the production of different compounds.

Dr. Metcalf, co-lead author James Doroghazi, Ph.D., an Institute for Genomic Biology Fellow, and their colleagues cleared this hurdle with a clever computational solution: they combined multiple comparative metrics, each with a carefully calibrated weight, into an algorithm that sorted 11,422 gene clusters from 830 bacterial genomes into families, producing an orderly, searchable reference.
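To make the idea of a weighted, multi-metric comparison more concrete, here is a minimal Python sketch. It is not the study's algorithm: the article does not name the metrics or weights the team used, so the two metrics below (the Jaccard index and the overlap coefficient on sets of protein-domain labels), the weights, the threshold, the single-linkage grouping, and the toy domain labels are all hypothetical stand-ins chosen for illustration. The toy data also show the problem described above: two clusters share a domain (`TE`) yet still fall into different families because the combined score stays low.

```python
from itertools import combinations

# Hypothetical weights and threshold for illustration only; not the study's values.
WEIGHTS = {"jaccard": 0.6, "overlap": 0.4}
THRESHOLD = 0.5


def jaccard(a: set, b: set) -> float:
    """Shared domains divided by all distinct domains in either cluster."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap(a: set, b: set) -> float:
    """Shared domains divided by the size of the smaller cluster."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def combined_similarity(a: set, b: set) -> float:
    """Weighted sum of the individual metrics (hypothetical weighting)."""
    return WEIGHTS["jaccard"] * jaccard(a, b) + WEIGHTS["overlap"] * overlap(a, b)


def group_into_families(clusters: dict[str, set]) -> list[set[str]]:
    """Single-linkage grouping with union-find: any pair whose combined
    score meets THRESHOLD is placed in the same family."""
    parent = {name: name for name in clusters}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for x, y in combinations(clusters, 2):
        if combined_similarity(clusters[x], clusters[y]) >= THRESHOLD:
            parent[find(x)] = find(y)

    families: dict[str, set[str]] = {}
    for name in clusters:
        families.setdefault(find(name), set()).add(name)
    return list(families.values())


if __name__ == "__main__":
    # Toy gene clusters described by hypothetical domain labels.
    toy = {
        "c1": {"PKS_KS", "PKS_AT", "ACP", "TE"},
        "c2": {"PKS_KS", "PKS_AT", "ACP", "KR"},
        "c3": {"NRPS_A", "NRPS_C", "PCP", "TE"},
        "c4": {"NRPS_A", "NRPS_C", "PCP"},
    }
    for family in group_into_families(toy):
        print(sorted(family))
```

Run on the toy data, the sketch places `c1` with `c2` and `c3` with `c4`. Single linkage is used here only because it is short to write; a real pipeline would need weights and a threshold calibrated against clusters whose products are already known, which is the kind of tuning the article alludes to.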

“The method also linked previously unassigned GCFs [gene cluster families] to known natural products, an approach that will enable de novo, bioassay-free discovery of new natural products using large datasets,” wrote the investigators.