Next-generation sequencing technologies have catapulted the field of genomics back into the spotlight thanks to fast, accurate, and historically low-cost sequencing methods. Today, the cost of sequencing an entire human exome is pushing below the $1,000 mark, putting the promise and benefits of personalized medicine within reach of many individuals. But how do we deliver on that promise of healthcare decisions tailored to an individual's genetic makeup when a typical exome analysis identifies upwards of 10,000 sequence variants, and a typical whole-genome analysis identifies upwards of 3 million?
Common variant analysis strategies rely on a reduction-of-numbers approach. Even thousands of variants, let alone millions, are far more than the human brain can make sense of, so we instinctively seek to whittle the set of variants selected for detailed study down to as few as possible.
In an effort to reduce the number of variants that require manual human follow-up, the scientific literature abounds with examples in which variants obtained by next-generation sequencing that overlap with SNPs catalogued by the 1000 Genomes Project or NCBI's dbSNP database are filtered out to shrink the overall pool. This strategy appears in both rare inherited disease studies and cancer studies (Figure 1).
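To make that filtering step concrete, the sketch below shows one common way it is implemented in practice: dropping every variant whose VCF ID column already carries a dbSNP rsID and keeping only "novel" calls. This is a minimal illustration using only the Python standard library; the file names and the assumption that the VCF has already been annotated with rsIDs are hypothetical, not drawn from any particular study.

```python
import gzip

def filter_known_variants(vcf_path, out_path):
    """Keep only variants whose ID field is '.' (no dbSNP rsID),
    discarding anything already catalogued as a known SNP."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as vcf, open(out_path, "w") as out:
        for line in vcf:
            if line.startswith("#"):           # pass header lines through unchanged
                out.write(line)
                continue
            fields = line.rstrip("\n").split("\t")
            variant_id = fields[2]             # third VCF column is the ID field
            if variant_id == ".":              # no rsID assigned -> treat as novel and keep
                out.write(line)

# Hypothetical usage:
# filter_known_variants("sample_exome.vcf.gz", "novel_variants_only.vcf")
```

The point of the example is simply that the filter is binary: a variant's presence in a public catalogue is the sole criterion for removal, regardless of any evidence about its functional impact.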
Such a strategy undoubtedly makes sense when few mechanisms are available for identifying the variants likely to cause a biologically relevant effect. But when such a mechanism is available, a purely reductionist approach risks unintentionally filtering out relevant, pathogenic variants.