“We started off at a time when in-solution target enrichment was not yet available, and relied, initially, on array-based enrichment,” says Edwin Cuppen, Ph.D., professor of genome biology and human genetics at the Hubrecht Institute, KNAW.
Array-based capture is technically laborious, which is one reason the field has increasingly moved toward solution-based enrichment. “The weakest point from a diagnostic perspective is when certain bases are not covered well, and for this reason, we found microarrays to be much better for enrichment, because they provide higher and more even coverage,” explains Dr. Cuppen.
In an attempt to scale microarray-based enrichment, Dr. Cuppen and collaborators barcoded (indexed) the samples prior to enrichment and sequencing. “Instead of several enrichments in parallel, we wanted to see if we can mix the samples together and perform one enrichment,” explains Dr. Cuppen, who with his group demonstrated the feasibility and the strength of this approach.
Individually barcoded samples were pooled and enriched in a single assay, then sequenced with Life Technologies’ Applied Biosystems SOLiD next-generation sequencing technology. This approach enabled coverage of the complete coding sequence of 770 genes from a 1.4 Mb genomic region, and identified new variants with over 96% sensitivity, while the false positive rate remained lower than one per 8 Mb.
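The pooling strategy depends on being able to assign each read back to its sample of origin after sequencing. The following is a minimal Python sketch of that demultiplexing step, not the group's actual pipeline. It assumes, for illustration only, that each read begins with its barcode in base space; the barcode sequences, barcode length, sample names, and reads are all hypothetical, and real platforms may read barcodes separately or in a different encoding.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample map (illustrative values only).
BARCODES = {"ACGT": "sample_1", "TGCA": "sample_2", "GATC": "sample_3"}
BARCODE_LEN = 4  # assumed fixed barcode length


def demultiplex(reads):
    """Group reads by their leading barcode and strip the barcode off."""
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:BARCODE_LEN], read[BARCODE_LEN:]
        # Reads whose barcode is not recognized are set aside, not guessed.
        sample = BARCODES.get(tag, "unassigned")
        by_sample[sample].append(insert)
    return dict(by_sample)


# Hypothetical pooled reads from one enrichment of mixed samples.
pooled = ["ACGTTTGACC", "TGCAGGATTA", "ACGTCCATGA", "NNNNACGTAC"]
assignments = demultiplex(pooled)
```

Because the barcode travels with each fragment through the enrichment, a single capture reaction can serve many samples, and the per-sample data are recovered computationally afterward.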
More recently, Dr. Cuppen and colleagues also illustrated the feasibility of this strategy for solution-based target enrichment, and successfully used this highly flexible and scalable setup for a wide range of multiplexing applications.
In 2007, Richard A. Gibbs, Ph.D., professor and director of the human genome sequencing center at Baylor College of Medicine, together with colleagues from Roche NimbleGen, published one of the first reports on solid-phase, hybridization-based enrichment of human genomic regions using programmable custom high-density oligonucleotide microarrays. “We have incrementally improved our reagents since then,” says Dr. Gibbs.
Most recently, Dr. Gibbs and colleagues described a liquid-phase hybridization platform that uses biotinylated oligonucleotide probes, and introduced additional design changes to include more genes than the narrow consensus coding DNA sequence (CCDS) set. The CCDS set frequently guides the design of custom probes, but it excludes many computationally predicted or actual coding exons present in other databases.
To expand the regions examined during target enrichment, the investigators included two new reagents. The first one, VCR-set, includes microRNAs, Vega (the Vertebrate Genome Annotation Database), CCDS, and the RefSeq databases. The second capture design reagent, REC-set, additionally includes regulomes, exons, and conserved elements. By using these reagents, Dr. Gibbs and his team conducted the first genome-wide targeted capture analysis of a diverse set of biologically relevant genomic elements, and revealed decreased capture of variants located outside the CCDS regions as compared to the CCDS exome.
The results also showed that conserved untranslated regions, which have a GC content of approximately 30%, and regulatory regions, which have a GC content of approximately 70%, had approximately half the sequencing depth of coverage of the CCDS regions following the capture procedure, demonstrating the need to increase coverage in genomic regions that differ from CCDS.
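To make the two quantities in this comparison concrete, here is a small Python sketch of how GC content and relative depth of coverage can be computed. It is an illustration under stated assumptions rather than the investigators' analysis: the function names, example sequences, and depth values are hypothetical and chosen only to mirror the approximate figures quoted above.

```python
def gc_fraction(seq):
    """Return the fraction of called bases (A, C, G, T) that are G or C."""
    called = [base for base in seq.upper() if base in "ACGT"]
    if not called:
        return 0.0
    return sum(base in "GC" for base in called) / len(called)


def relative_depth(region_depth, reference_depth):
    """Return a region's mean depth as a fraction of a reference depth."""
    return region_depth / reference_depth


# Hypothetical sequences with low and high GC content.
low_gc = gc_fraction("ATTATAGCAC")   # 3 of 10 bases are G or C
high_gc = gc_fraction("GCGCGCGATA")  # 7 of 10 bases are G or C

# Hypothetical mean depths: a non-CCDS region versus CCDS regions.
ratio = relative_depth(20.0, 40.0)   # region covered at half the CCDS depth
```

A ratio well below 1 for a class of regions indicates that more sequencing, or a rebalanced probe design, would be needed to bring those regions up to the coverage achieved for CCDS targets.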
At the biological end, Dr. Gibbs and colleagues are applying these advances toward the discovery of disease alleles and the study of rare genetic variants. “Having illustrated the robustness of this approach in the research arena, we are on the verge of developing this into a diagnostic test in the clinical arena.”
Solution-based hybridization approaches are often more convenient than solid-phase arrays, and offer the additional advantage of being easier to multiplex. “An important improvement from our point of view, particularly as we have been using the Illumina technology for next-generation sequencing, is that we are using TruSeq, the new library preparation system that Illumina has developed,” says Ann-Christine Syvänen, Ph.D., professor of molecular medicine at Uppsala University.
While many sequencing efforts focus on capturing exomes, a significant amount of genetic variation occurs outside protein-coding regions. “It would be important to also analyze other genomic regions, and the new enrichment probes from Illumina contain some extra sequences in regulatory regions very close to genes, which add information content in addition to multiplexing.”
These short regions that flank the genes allow the detection of regulatory variants located in their vicinity. Genomic variation may also arise in gene regulatory elements located farther from open reading frames. This source of variation is also functional, but it will be missed by standard exome arrays.
Target enrichment and whole-genome sequencing are emerging as two equally important strategies, each best suited to address specific biological questions. As these approaches are incrementally improved, optimized, and validated in research settings, they promise to materialize into exciting diagnostic and therapeutic applications and to provide powerful tools to interrogate other biological questions.