January 15, 2009 (Vol. 29, No. 2)
Firm Believes High-Throughput Module Serves as Alternative to Building Bioinformatics Infrastructure
High-throughput, next-generation sequencing instruments have dramatically accelerated the sequencing of genomes. They generate huge volumes of data, however, that challenge the limits of current bioinformatic tools, and GenomeQuest couldn’t be happier about this predicament. The company has developed a high-powered bioinformatic platform that eliminates the IT burden of bioinformatics, thereby reducing the time it takes to analyze sequencing data from months to hours.
“Customers are overwhelmed by the amount of data they get from next-generation instruments,” says Michael McManus, Ph.D., vp and GM. “Our bioinformatic platform immediately unlocks the sequencing information and helps life scientists to understand the biology locked inside.”
The original product, called Biofacet, was a high-powered bioinformatic engine aimed at expert users. Jean Jacques Codani, Ph.D., CSO, created Biofacet while at Gene-IT in France in the late 1990s. In 2005, the company was renamed GenomeQuest.
Scientists at GenomeQuest improved the original search engine by developing a web-based software application for mining, managing, and sharing sequence data. The company has more than 150 customers, including patent lawyers and researchers at pharmaceutical firms, biotechnology companies, and academic laboratories.
GenomeQuest’s first product for the life sciences market was a complete do-it-yourself service for searching patented sequence information. The service includes the GQ High Throughput Module, which lets users search and analyze thousands of sequences at once for patent portfolio analysis and business intelligence. The main customers of the module are legal professionals who work in the life sciences.
GenomeQuest provides a variety of search algorithms for protein and nucleic acid sequence homologies and similarities, including GenePAST. First developed in collaboration with Pfizer, this algorithm identifies subject sequences that fall within a given percent identity of a query sequence. The alignment found by GenePAST is guaranteed to be the longest one between two sequences that meets or exceeds a certain percent identity over its length, according to the company. Also included is GQ-PAT, a comprehensive archive of patented sequences that provides vital information such as sequence identification number, patent family, legal status, bibliographic information, and annotations.
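The "longest alignment at or above a percent identity" criterion can be illustrated with a small sketch. This is not GenePAST, whose internals the company has not described here. It is a simplified, brute-force version that considers only ungapped alignments, and the function name and parameters are hypothetical.

```python
def longest_segment_at_identity(query, subject, min_percent_identity):
    """Return (length, query_start, subject_start) for the longest ungapped
    aligned segment whose percent identity is >= min_percent_identity.

    Illustrative brute force only; returns (0, 0, 0) if nothing qualifies.
    """
    best = (0, 0, 0)
    # Each offset fixes one diagonal: subject index = query index + offset.
    for offset in range(-(len(query) - 1), len(subject)):
        q0 = max(0, -offset)
        s0 = max(0, offset)
        span = min(len(query) - q0, len(subject) - s0)
        # prefix[i] = number of matching positions among the first i
        # positions of this diagonal.
        prefix = [0]
        for i in range(span):
            prefix.append(prefix[-1] + (query[q0 + i] == subject[s0 + i]))
        for start in range(span):
            # Try the longest candidates first, and only lengths that
            # would beat the best segment found so far.
            for end in range(span, start + best[0], -1):
                matches = prefix[end] - prefix[start]
                if matches * 100 >= min_percent_identity * (end - start):
                    best = (end - start, q0 + start, s0 + start)
                    break
    return best


print(longest_segment_at_identity("ACGTTGCA", "GGACGTAGCATT", 85))
```

Integer arithmetic is used for the identity test to avoid floating-point rounding at the threshold. A production tool would also need to handle gaps and would not scan every diagonal exhaustively.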
Speed of Delivery
Once this core intellectual property market was established, GenomeQuest began designing tools to evaluate next-generation sequencing data, a task that previously took months to years to perform. Now with GenomeQuest’s web-based platform, researchers can perform identity and similarity comparisons between next-generation sequence data and reference databases.
GenomeQuest’s databases include GenBank, RefSeq, GQ-PAT, DrugBank Pro (containing drug targets, commercial drugs, and investigational new drugs), and GQ Gene (a high-resolution gene/transcript database). Sequence annotations can be searched by text. “We organize databases for our customers,” Dr. McManus points out, “so that they can store, manage, and mine the world’s entire sequence data.”
GenomeQuest’s high-speed sequence search suite, called HS3, allows end users to focus on biological questions, rather than worrying about bootstrapping their own bioinformatics infrastructure. The workhorse behind HS3 is a high-speed, word-based algorithm that quickly identifies highly similar sequences. GenomeQuest recently completed a project involving 13 trillion pair-wise comparisons in less than 10 hours.
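For readers unfamiliar with word-based searching, the general idea can be sketched as follows. This is not the HS3 algorithm, which is proprietary. It is a minimal illustration of the common technique of indexing fixed-length words (k-mers) from reference sequences and ranking references by how many words they share with a read. All function names, parameters, and sequences are hypothetical.

```python
from collections import defaultdict


def build_word_index(references, k):
    """Map every length-k word to the set of reference names containing it."""
    index = defaultdict(set)
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def best_hit(read, index, k, min_shared=1):
    """Return (reference_name, shared_word_count) for the reference that
    shares the most distinct words with the read, or None if no reference
    reaches min_shared. Ties are broken alphabetically by name."""
    words = {read[i:i + k] for i in range(len(read) - k + 1)}
    counts = defaultdict(int)
    for word in words:
        for name in index.get(word, ()):
            counts[name] += 1
    if not counts:
        return None
    name = max(sorted(counts), key=counts.get)
    return (name, counts[name]) if counts[name] >= min_shared else None


# Toy reference set, invented for illustration.
references = {"refA": "ACGTACGTGG", "refB": "TTTTCCCCAA"}
index = build_word_index(references, k=4)
print(best_hit("GTACGT", index, k=4))
```

Because word lookups are hash-table operations, this style of search avoids aligning every read against every reference in full. For scale, 13 trillion comparisons in under 10 hours works out to at least roughly 360 million comparisons per second.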
“We can do trillions of comparisons, then throw out what does not make sense,” says Dr. McManus. By intelligently keeping the “best hits,” GenomeQuest provides a useful service for genomics.
The algorithm has no read-length limitation, and it readily deals with gaps and with sequence reads from any sequencing instrument. The scalability of HS3 makes it well suited to all-against-all sequence comparisons. These include all reads against all reference genomes for metagenomic studies, all reads against a single reference genome for resequencing studies, all reads against a transcriptome to retrieve disease annotations, or all reads against a genome to retrieve gene annotations.
GenomeQuest also helps next-generation sequencing customers to perform in silico QA/QC checks for sample contamination, annotate large data sets, and discover structural variations and SNPs.
Customers can run sequences from their desktop computers through a secure Internet subscription service called GenomeQuest Live. Using the Internet service, a 5-gigabyte FASTA file (the standard data file from next-generation sequencing machines) takes about 30 minutes to three hours to transfer, Dr. McManus says. For customers who do not want to send data over the Internet, GenomeQuest will install, in their facility, a preconfigured, preloaded Enterprise version of the bioinformatic platform that includes all the computing, networking, and storage resources needed.
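The quoted transfer times imply a range of sustained connection speeds, which a short calculation makes concrete. The sketch below assumes decimal gigabytes (1 GB = 10^9 bytes) and ignores protocol overhead and compression. The function name is hypothetical.

```python
def implied_throughput_mbps(size_gigabytes, minutes):
    """Sustained throughput in megabits per second needed to move
    size_gigabytes (decimal GB, an assumption) in the given minutes."""
    bits = size_gigabytes * 1e9 * 8
    seconds = minutes * 60
    return bits / seconds / 1e6


fast = implied_throughput_mbps(5, 30)    # 5 GB in 30 minutes
slow = implied_throughput_mbps(5, 180)   # 5 GB in three hours
print(f"{slow:.1f} to {fast:.1f} Mbit/s")
```

Under these assumptions, the 30-minute figure corresponds to roughly 22 Mbit/s of sustained upload and the three-hour figure to roughly 3.7 Mbit/s.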
GenomeQuest also analyzes data for customers who are referred to it by the manufacturers of next-generation sequencers. “We take reads from different manufacturers’ instruments, do the analysis, and send them an annotated, personal database,” Dr. McManus explains. One customer took 10 months to sort through data from a next-generation sequencing run using its own bioinformatic tools. When the same data set was sent to GenomeQuest, the company performed the analysis in eight hours.
GenomeQuest has teamed up with Life Technologies and Illumina to develop bioinformatic tools for customers who use next-generation sequencers. It provides researchers performing next-generation sequencing operations with an alternative to building their own bioinformatics infrastructure, an expensive and time-consuming endeavor.