Nov 15, 2010
(Vol. 30, No. 20)
Solving the Next-Gen Sequencing Data Crunch
Eureka Genomics' Bioinformatics Platform Takes Aim at Computational Bottlenecks
The foundation for Eureka Genomics was laid several years ago during a meeting sponsored by the U.S. Department of Homeland Security. Researchers were demonstrating their inventions when one presenter, Yuriy Fofanov, Ph.D., director of bioinformatics at the University of Houston, TX, demonstrated a bioinformatics platform. The technology immediately impressed Didier Perez, COO and CFO at Eureka Genomics. “We had never seen any bioinformatics system that could provide the answers we were looking for.”
Perez secured worldwide exclusive rights to the technology from the University of Houston and launched Eureka Genomics in 2007, along with Dr. Fofanov and Heather Koshinsky, Ph.D., CSO. The company, located in Hercules, CA, and Houston, TX, first offered bioinformatics services based on designing ultraspecific RNA and DNA signatures for companies developing diagnostics. The platform subsequently expanded into its Next Generation Bioinformatics Service.
The storage, manipulation, and analysis of massive quantities of sequence data cause computational bottlenecks and delay scientific progress. Eureka Genomics’ approach uses novel sets of data structures and algorithms to quickly process data in a nonheuristic manner, according to Perez.
Most other approaches, including search tools such as BLAST, rely on heuristics to find approximate answers. Heuristic approaches are limited, however, because they identify only simple alignments or the lack of alignments in data and do not guarantee that all possible matches are detected. “In order to detect insertions, deletions, and substitutions at any position in the length of a sequence or portion of a genome, you need a non-heuristic approach.”
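To make the distinction concrete, here is a minimal sketch of an exhaustive (non-heuristic) search in Python. It is not Eureka Genomics' method, which the company has not described in detail. It uses the textbook dynamic-programming approach to approximate matching, and the function name `approximate_matches` and its parameters are hypothetical. Because every position in the text is scored, no match within the edit limit can be missed, at the cost of time proportional to pattern length times text length.

```python
def approximate_matches(pattern, text, max_edits):
    """Return (end_position, edits) for every place in `text` where `pattern`
    occurs with at most `max_edits` substitutions, insertions, or deletions.

    End positions are 1-based. Every column of the dynamic-programming
    table is computed, so the search is exhaustive rather than heuristic.
    """
    m = len(pattern)
    # prev[i]: fewest edits aligning pattern[:i] to some stretch of text
    # ending at the previous text position (initially, the empty text).
    prev = list(range(m + 1))
    hits = []
    for end, base in enumerate(text, start=1):
        # A match may begin anywhere in the text, so the empty prefix costs 0.
        curr = [0]
        for i in range(1, m + 1):
            mismatch = 0 if pattern[i - 1] == base else 1
            curr.append(min(
                prev[i - 1] + mismatch,  # match or substitution
                prev[i] + 1,             # extra base in the text
                curr[i - 1] + 1,         # base missing from the text
            ))
        if curr[m] <= max_edits:
            hits.append((end, curr[m]))
        prev = curr
    return hits


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    print(approximate_matches("ACGT", "TTACGTTT", 0))
```

A heuristic tool would typically look first for short exact "seed" matches and extend only from those, which is faster but can overlook matches whose differences fall in the wrong places.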
Fifteen years ago, “biologists thought that longer sequence reads were better,” explains Dr. Koshinsky. But Dr. Fofanov, a mathematician, took an early view of sequencing data that was the opposite of the biological one. He wondered why biologists were so obsessed with long reads. It didn’t make sense to the mathematician, who thought shorter reads would be more informative if analyzed correctly.
The bioinformatics technology proved computationally intensive, but it provides a thorough understanding of sequence data, including mapping, assembly, and novel sequence discovery. The technology, adopted by Eureka Genomics, allows scientists to ask what genes are more or less expressed, discover and sequence new species, or identify foreign DNA in complex samples. “Those types of experiments are more addressable with our robust analysis of sequence data,” Dr. Koshinsky says.
Eureka Genomics recently demonstrated that its technology is capable of generating data from nanogram quantities of genomic DNA from plant, animal, and microbial cells. Most protocols for today’s sequencing instruments call for microgram amounts of a sample. However, it may be difficult to extract this much genomic material from environmental, clinical, and forensic samples. “This is a breakthrough because it represents a 1,000-fold reduction in the amount of sample needed to generate sequence data for analysis,” explains Dr. Koshinsky.
Among other projects, the company worked with scientists at the University of California, Davis to characterize a previously undefined viral disease that threatens California vineyards. Known as Syrah Decline, the disease has baffled researchers for 20 years. Eureka Genomics isolated RNA from both infected and healthy grapevines, then compared about five million short reads from infected plants to two million short reads from uninfected plants. Sequences unique to the diseased plants were assembled and analyzed with BLAST searches and the GenBank database.
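The comparison step can be pictured with a short Python sketch. It is an illustration under stated assumptions, not the pipeline the company used: it treats a read as "unique to the diseased plants" if it shares no k-letter substring (k-mer) with any healthy read, it ignores reverse complements and sequencing errors, and the function names and the default `k=8` are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reads_unique_to_infected(infected_reads, healthy_reads, k=8):
    """Keep infected-sample reads that share no k-mer with any healthy read.

    Assumption for illustration: sharing even one k-mer with the healthy
    sample marks a read as host-derived. Reads shorter than k are dropped
    because they cannot be compared.
    """
    healthy_index = set()
    for read in healthy_reads:
        healthy_index |= kmers(read, k)
    return [
        read for read in infected_reads
        if len(read) >= k and kmers(read, k).isdisjoint(healthy_index)
    ]


if __name__ == "__main__":
    # Toy reads, invented for illustration only.
    healthy = ["ACGTACGTAC", "TTGACCATGA"]
    infected = ["ACGTACGTAC", "GGGCCCAAATTT"]
    print(reads_unique_to_infected(infected, healthy))
```

In the study described above, the reads that survived this kind of subtraction were then assembled and searched against GenBank with BLAST.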
They discovered a mixed infection of Grapevine rupestris stem pitting-associated foveavirus strains that contains two novel members. The information could help to contain or eliminate Syrah Decline. Grapevines imported into California are routinely quarantined and tested for diseases, and the nucleic acid signatures of the newly identified disease organisms may be added to the protocol. The technology readily extends to other areas of agriculture, cleantech, and human disease.
In another collaboration, Eureka Genomics teamed up with Glycos Biotechnologies to sequence the genome of a bacterium used in the biorefinery industry. The new bacterial strain plays a key role in efforts at Glycos Biotechnologies to commercialize microbial strains to produce high-margin biochemicals from a variety of feedstocks and byproduct streams considered as low value or waste.
The bacterial strains will make ethanol produced from corn and other biomass more economically sustainable by providing benefits to the biorefinery industry, according to the firms. Glycos Biotechnologies reported that working with Eureka Genomics saved it two years of research time and money.
These same tools help researchers at Eureka Genomics to detect unknown pathogens that cause human illnesses. “We’re working on finding a virus suspected of causing colorectal cancer and cardiovascular disease,” says Perez. When disease-associated pathogens are detected, Eureka Genomics will create intellectual property related to screening and diagnostic tools, therapeutics, and vaccines in collaboration with appropriate companies.