Thousands of genome-wide association studies (GWAS) for hundreds of diseases have identified candidate regions of interest, but have yet to pinpoint the underlying causative genetic variants. These projects have been hindered by the lack of a rapid, affordable method for resequencing 100 kilobases to 30 megabases of the human genome across a sample population large enough to identify all genetic variants that may contribute to a specific disease.
Although whole human genome sequencing has been demonstrated with next-generation sequencing technologies, it remains impractical for laboratories outside of major genome centers. Likewise, long-range PCR is impractical for targeting hundreds of kilobases, let alone megabase-scale regions of the genome.
NimbleGen Sequence Capture arrays (Roche Applied Science), combined with 454 Life Sciences next-generation sequencing, simplify the resequencing workflow (Figure 1) while reducing overall project costs. With this combination of technologies, researchers can completely resequence candidate genes and regions previously identified through GWAS and detect all genetic variants that may be present. These variants include single nucleotide polymorphisms (SNPs) as well as insertions and deletions of nearly any size.
When using this sequence capture method to reduce the complexity of the genome, note that optimal performance depends on both technologies working together: the sequence-capture array enriches the sample for the target regions, and the sequencing platform serves as the detection device. Several criteria must therefore be evaluated:
- Amount of genomic material needed per enrichment: Some methods require 20 to 30 micrograms or more, whereas the optimized protocol using NimbleGen Sequence Capture arrays and the 454 Sequencing System requires only 5 micrograms of genomic DNA.
- Percentage of the targeted region detected: The fraction of the target that is covered by sequencing reads.
- Uniformity of sequencing coverage across the targeted region: The goal is an even level of enrichment across the region, minimizing over-represented regions and leaving as few gaps as possible.
- Sequencing coverage required to detect genetic variants: The number of sequencing reads that must cover a given position to determine the variation present there.
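The coverage-related criteria above can be made concrete with a small calculation. The following is a minimal sketch, not part of the NimbleGen or 454 software: it assumes reads have already been aligned and are given as half-open (start, end) intervals in the same 0-based coordinates as a single target region. The helper names (`per_base_depth`, `capture_metrics`), the minimum-depth threshold, and the uniformity definition (fraction of bases at or above 0.2 times the mean depth, one commonly used measure) are all illustrative assumptions.

```python
def per_base_depth(target_start, target_end, reads):
    """Hypothetical helper: read depth at every base of [target_start, target_end)."""
    length = target_end - target_start
    diff = [0] * (length + 1)  # difference array: +1 where a read starts, -1 where it ends
    for start, end in reads:
        # Clip each read to the target region; skip reads that fall outside it.
        s = max(start, target_start) - target_start
        e = min(end, target_end) - target_start
        if s < e:
            diff[s] += 1
            diff[e] -= 1
    depth, running = [], 0
    for i in range(length):
        running += diff[i]
        depth.append(running)
    return depth


def capture_metrics(depth, min_depth=10, fold=0.2):
    """Hypothetical helper: summarize percent covered, uniformity, and depth criteria."""
    n = len(depth)
    avg = sum(depth) / n
    covered = sum(1 for d in depth if d > 0)
    # Assumed uniformity measure: bases with depth >= fold * mean depth.
    uniform = sum(1 for d in depth if d >= fold * avg)
    # Bases with enough reads to assess variation (threshold is an assumption).
    deep_enough = sum(1 for d in depth if d >= min_depth)
    # Count gaps as maximal runs of zero-depth bases.
    gaps, prev_zero = 0, False
    for d in depth:
        if d == 0 and not prev_zero:
            gaps += 1
        prev_zero = d == 0
    return {
        "mean_depth": avg,
        "pct_covered": 100 * covered / n,
        "pct_within_fold_of_mean": 100 * uniform / n,
        "pct_at_min_depth": 100 * deep_enough / n,
        "gaps": gaps,
    }


if __name__ == "__main__":
    depth = per_base_depth(100, 120, [(95, 105), (100, 110), (102, 112), (115, 125)])
    print(capture_metrics(depth, min_depth=2))
```

For the toy 20-base target in the usage example, three bases receive no reads, so 85% of the target is covered and there is one gap, while only half of the bases reach the illustrative minimum depth of 2. Real projects would choose the depth threshold according to the sequencing platform's error profile.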
Two NimbleGen Sequence Capture Custom Delivery arrays are available: a 385K-probe array that targets 100 kilobases to 5 megabases of genomic sequence, and a larger 2.1M-probe array that targets 5 to 30 megabases.
The NimbleGen Sequence Capture 2.1M Human Exome array targets more than 180,000 exons, as defined by the Consensus Coding Sequence (CCDS) project (build 36.2), as well as 551 microRNA genes. The exome array has been used in population studies, cancer disease models, and Mendelian disease studies, and it allows researchers to rapidly identify most genetic variants within the coding portion of the genome.
The NimbleGen Custom Delivery and 2.1M Human Exome arrays are designed with algorithms that select probes with uniform binding properties and high specificity for the target regions, and that place an optimal number of probes in each region to achieve homogeneous enrichment.
Analysis software is a key component of the optimized NimbleGen and 454 Life Sciences sequence-capture method. The GS Reference Mapper application automatically aligns the sequencing reads to a reference sequence and produces a tabulated summary of the genetic variants identified; the analysis takes only a few hours.
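To illustrate what a tabulated variant summary relative to a reference involves, here is a deliberately simplified sketch. It is not the GS Reference Mapper algorithm and does not use its interface: it assumes reads are already placed on the reference as ungapped (offset, sequence) pairs, ignores insertions, deletions, and base qualities, and reports single-base differences that meet an illustrative minimum depth and allele-frequency threshold. The function name `tabulate_snvs` and both thresholds are hypothetical.

```python
from collections import Counter


def tabulate_snvs(reference, reads, min_depth=3, min_freq=0.2):
    """Hypothetical toy variant table from pre-aligned, ungapped reads.

    reads: list of (offset, sequence) pairs with 0-based offsets into reference.
    Returns rows of (position, ref_base, alt_base, depth, alt_count, alt_freq).
    """
    # Build a pileup: base counts observed at each reference position.
    pileup = [Counter() for _ in reference]
    for offset, seq in reads:
        for i, base in enumerate(seq):
            pos = offset + i
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    rows = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to assess variation at this position
        ref_base = reference[pos]
        for alt, n in sorted(counts.items()):
            if alt != ref_base and n / depth >= min_freq:
                rows.append((pos, ref_base, alt, depth, n, round(n / depth, 2)))
    return rows


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    reads = [(0, "ACGTA"), (0, "ACTTA"), (2, "GTACG"), (2, "TTACG"), (1, "CTTAC")]
    for row in tabulate_snvs(ref, reads):
        print(row)
```

In the usage example, position 2 (reference G) is covered by five reads, three of which carry a T, so it is reported with an allele frequency of 0.6. Positions covered by fewer than three reads are skipped, which mirrors the point above that a minimum sequencing depth is needed before variation can be assessed.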