Speed of Delivery
Once this core intellectual property market was established, GenomeQuest began designing tools to evaluate next-generation sequencing data, a task that previously took months to years to perform. Now, with GenomeQuest’s web-based platform, researchers can run identity and similarity comparisons between next-generation sequence data and reference databases.
GenomeQuest’s databases include GenBank, RefSeq, GQ-PAT, DrugBank Pro (containing drug targets, commercial drugs, and investigational new drugs), and GQ Gene (a high-resolution gene/transcript database). Sequence annotations can be searched by text. “We organize databases for our customers,” Dr. McManus points out, “so that they can store, manage, and mine the world’s entire sequence data.”
GenomeQuest’s high-speed sequence search suite, called HS3, allows end users to focus on biological questions, rather than worrying about bootstrapping their own bioinformatics infrastructure. The workhorse behind HS3 is a high-speed, word-based algorithm that quickly identifies highly similar sequences. GenomeQuest recently completed a project involving 13 trillion pair-wise comparisons in less than 10 hours.
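The general idea of a word-based search can be illustrated with a small Python sketch. This is not the HS3 algorithm, whose details are not described here. It is a generic word (k-mer) index under stated assumptions: the function names, the word length of 4, and the toy sequences are all hypothetical and chosen only for illustration. Each reference is split into short overlapping words, and a read is scored by how many of its words also occur in each reference.

```python
from collections import defaultdict


def build_word_index(references, k=4):
    """Map each k-letter word to the set of reference names containing it."""
    index = defaultdict(set)
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def count_shared_words(read, index, k=4):
    """Count, per reference, how many distinct words of the read occur in it."""
    counts = defaultdict(int)
    words = {read[i:i + k] for i in range(len(read) - k + 1)}
    for word in words:
        for name in index.get(word, ()):
            counts[name] += 1
    return dict(counts)


# Toy data (hypothetical): two short references and one read.
references = {"refA": "ACGTACGTGA", "refB": "TTTTGGGGCC"}
index = build_word_index(references)
hits = count_shared_words("CGTACGTG", index)
print(hits)
```

References that share many words with a read are candidates for a closer comparison, while references that share none can be skipped without aligning them at all. For scale, the 13 trillion comparisons in under 10 hours reported above work out to more than about 360 million comparisons per second.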
“We can do trillions of comparisons, then throw out what does not make sense,” says Dr. McManus. By intelligently keeping only the “best hits,” GenomeQuest provides a useful service for genomics.
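A minimal sketch of what keeping the “best hits” could look like, assuming each comparison yields a read identifier, a reference identifier, and a similarity score. The tuple layout, the function name, and the scores are hypothetical. The article does not say how GenomeQuest ranks or filters its results.

```python
def keep_best_hits(hits):
    """Return the highest-scoring (reference, score) pair for each read.

    hits: iterable of (read_id, reference_id, score) tuples.
    """
    best = {}
    for read_id, ref_id, score in hits:
        if read_id not in best or score > best[read_id][1]:
            best[read_id] = (ref_id, score)
    return best


# Toy comparison results (hypothetical scores).
hits = [
    ("read1", "refA", 0.98),
    ("read1", "refB", 0.71),
    ("read2", "refB", 0.88),
]
best = keep_best_hits(hits)
print(best)
```

Because only one entry per read is kept in memory, a filter of this kind can run over a stream of comparison results without storing all of them.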
The algorithm has no read-length limitation, and it readily handles gaps and sequence reads from any sequencing instrument. The scalability of HS3 makes it well suited to all-against-all sequence comparisons. These include all reads against all reference genomes for metagenomic studies, all reads against a single reference genome for resequencing studies, all reads against a transcriptome to retrieve disease annotations, and all reads against a genome to retrieve gene annotations.
GenomeQuest also helps next-generation sequencing customers perform in silico QA/QC checks for sample contamination, annotate large data sets, and discover structural variations and SNPs.
Customers can run sequences from their desktop computers through a secure Internet subscription service called GenomeQuest Live. Using the Internet service, a 5-gigabyte FASTA file (the standard data file from next-generation sequencing machines) takes about 30 minutes to three hours to transfer, Dr. McManus says. For customers who do not want to send data over the Internet, GenomeQuest will install, in their facility, a preconfigured, preloaded Enterprise version of the bioinformatic platform that includes all the computing, networking, and storage resources needed.
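The quoted transfer times imply a range of sustained upload speeds, which can be worked out with a short calculation. The sketch below assumes decimal gigabytes (1 GB = 10^9 bytes) and ignores protocol overhead and compression. The function name is hypothetical.

```python
def implied_mbit_per_s(size_gb, minutes):
    """Sustained rate in megabits per second needed to move size_gb in the given time."""
    bits = size_gb * 1e9 * 8          # decimal gigabytes to bits
    return bits / (minutes * 60) / 1e6


fast = implied_mbit_per_s(5, 30)      # 5 GB in 30 minutes
slow = implied_mbit_per_s(5, 180)     # 5 GB in three hours
print(round(fast, 1), round(slow, 1))
```

Under these assumptions, the 30-minute figure corresponds to roughly 22 Mbit/s and the three-hour figure to roughly 3.7 Mbit/s.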
GenomeQuest also analyzes data for customers who are referred to it by the manufacturers of next-generation sequencers. “We take reads from different manufacturers’ instruments, do the analysis, and send them an annotated, personal database,” Dr. McManus explains. One customer took 10 months to sort through data from a next-generation sequencing run using its own bioinformatic tools. When the same data set was sent to GenomeQuest, the company completed the analysis in eight hours.
GenomeQuest has teamed up with Life Technologies and Illumina to develop bioinformatic tools for customers who use next-generation sequencers. It provides researchers performing next-generation sequencing operations with an alternative to building their own bioinformatics infrastructure, an expensive and time-consuming endeavor.