November 1, 2012 (Vol. 32, No. 19)
Jennifer Hogan, Ph.D.
Going Beyond One-Size-Fits-All Filtering Strategies
Next-generation sequencing technologies have catapulted the field of genomics back into the spotlight thanks to fast, accurate, and historically low-cost sequencing methods. As of today, the cost of sequencing an entire human exome is pushing below the $1,000 mark—putting the promise and benefits of personalized medicine within the reach of many individuals. But how do we deliver on the promise of being able to make healthcare decisions tailored to an individual’s genetic makeup when a typical exome analysis identifies upwards of 10,000 sequence variants, and a typical whole-genome analysis identifies upwards of 3 million variants?
Common variant analysis strategies rely on a reduction-of-numbers approach. Thousands of variants, let alone millions, are far more than a person can evaluate by hand, so the instinct is to shrink the set selected for detailed study as much as possible.
To reduce the number of variants that require manual follow-up, many published studies filter out any variant obtained by next-generation sequencing that overlaps with a SNP identified by the 1000 Genomes Project or listed in NCBI’s dbSNP database. This strategy is used in both rare inherited disease studies and cancer studies (Figure 1).
Such a strategy undoubtedly makes sense when there are limited mechanisms available with which to identify those variants that are likely to cause a biologically relevant effect. But when such a mechanism is available, following a purely reductionist approach introduces the risk of unintentionally filtering out relevant, pathogenic variations.
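The risk described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Variant` record, the function names, and the three toy variants are hypothetical placeholders, not real dbSNP, 1000 Genomes, or HGMD entries. It shows only the general mechanism by which a catalog-overlap filter can discard a variant that a separate annotation source flags as pathogenic.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not a real file format)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def reductionist_filter(variants, known_snps):
    """Drop every variant that also appears in a catalog of known SNPs."""
    return [v for v in variants if v not in known_snps]


def lost_pathogenic(variants, known_snps, pathogenic):
    """Return variants the filter discards although they are annotated as pathogenic."""
    kept = set(reductionist_filter(variants, known_snps))
    return [v for v in variants if v in pathogenic and v not in kept]


# Toy data: invented coordinates, for illustration only.
sample = [
    Variant("chr1", 100, "A", "G"),
    Variant("chr2", 200, "C", "T"),
    Variant("chr3", 300, "G", "A"),
]
known_snps = {sample[0], sample[1]}   # assumed overlap with a SNP catalog
pathogenic = {sample[1]}              # assumed literature-backed annotation

kept = reductionist_filter(sample, known_snps)
lost = lost_pathogenic(sample, known_snps, pathogenic)
print("kept after filtering:", kept)
print("pathogenic but filtered out:", lost)
```

In this toy case the filter leaves a single variant for follow-up, while the one variant carrying a pathogenic annotation never reaches the reviewer.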
BIOBASE’s Genome Trax™ offering has been specifically developed to map individual variations in DNA sequence to functionally characterized nucleotide signatures described within the scientific literature. In addition to integrating public domain data sources such as the Catalogue of Somatic Mutations in Cancer (COSMIC) and significant findings from NHGRI’s catalog of genome-wide association studies (GWAS Catalog), Genome Trax integrates content from the Human Gene Mutation Database (HGMD®), a resource available for inherited disease-associated mutations.
Using HGMD, Genome Trax users can map their full set of identified variants to those specific sequence changes that have been experimentally demonstrated to result in a phenotypic effect without applying any filtering steps (Figure 2).
For users interested in identifying novel sequence changes that are likely to result in a phenotypic effect of interest, Genome Trax provides an alternate workflow. Here users instead filter out the HGMD-characterized disease-causing mutations to isolate novel frameshift and nonsynonymous mutations within genes associated with a disease of interest, as determined by HGMD and PROTEOME™ disease gene assignments (Figure 2).
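The logic of the two workflows can be sketched in a few lines. This is not the Genome Trax interface or its data format; the `Call` record, the function names, the variant keys, and the gene names are all hypothetical, and the lookup tables stand in for whatever characterized-mutation and disease-gene sources a pipeline actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """Hypothetical annotated variant call."""
    key: str     # invented identifier, e.g. "chr2:200:C>T"
    gene: str    # placeholder gene name
    effect: str  # e.g. "frameshift", "nonsynonymous", "synonymous"


def characterized(calls, known_mutations):
    """Workflow 1: look up every call, with no pre-filtering, and
    return those that match a characterized mutation."""
    return {c.key: known_mutations[c.key] for c in calls if c.key in known_mutations}


def novel_candidates(calls, known_mutations, disease_genes):
    """Workflow 2: remove characterized mutations, then keep frameshift and
    nonsynonymous calls in genes associated with the disease of interest."""
    damaging = {"frameshift", "nonsynonymous"}
    return [
        c for c in calls
        if c.key not in known_mutations
        and c.effect in damaging
        and c.gene in disease_genes
    ]


# Toy inputs: invented for illustration only.
calls = [
    Call("chr1:100:A>G", "GENE_A", "nonsynonymous"),
    Call("chr2:200:C>T", "GENE_B", "frameshift"),
    Call("chr3:300:G>A", "GENE_C", "frameshift"),
    Call("chr4:400:T>C", "GENE_B", "synonymous"),
]
known_mutations = {"chr1:100:A>G": "placeholder phenotype"}
disease_genes = {"GENE_A", "GENE_B"}

hits = characterized(calls, known_mutations)
candidates = novel_candidates(calls, known_mutations, disease_genes)
print("characterized:", hits)
print("novel candidates:", candidates)
```

The two functions are complementary: the first never discards input before the lookup, while the second deliberately excludes what the first would report so that only uncharacterized, potentially damaging changes in relevant genes remain.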
Additional data tracks provide further context such as pathway membership, drug target interaction, and sites of gene regulation.
The online interface makes it easy for bench scientists and genetic counselors to identify, within a matter of minutes, those subsets of characterized pathogenic mutations and candidate novel mutations represented in a dataset of sequenced variants. But for computational biologists developing their own custom variant analysis pipelines, a purely online solution can be too restrictive. In order to facilitate integration with other data sources and customized tools, Genome Trax data is made optionally available for download—providing ultimate freedom from one-size-fits-all solutions.