Pick and Choose
Sometimes sequencing the whole genome just gives more information than is really needed. If you want to focus on a particular set of genes or features in the genome, for example, it may be preferable to enrich for these ahead of time, saving the time, effort, and cost of a whole-genome analysis.
MYcroarray offers custom bait libraries designed to selectively capture DNA prior to sequencing. MYbaits target-enrichment kits can contain from hundreds to millions of biotinylated RNA bait sequences, each 80 to 120 nucleotides long. The baits are incubated in liquid phase with the genomic sample for 36 hours, after which the bound DNA is pulled out of solution on streptavidin-coated magnetic beads. The baits are then degraded, leaving behind the positively selected DNA, which can be sequenced directly or amplified first.
Bait sequences can be designed from genomic libraries or transcriptomes of the same or related species. “Say you’re working on mammals. You can eventually design your baits from the human genome, and then go to fish sequence from more distant genomes—it usually works well down to 90% sequence similarity or so,” said Jean-Marie Rouillard, Ph.D., CSO and co-founder of MYcroarray.
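To make the idea of a bait library concrete, here is a minimal Python sketch that tiles fixed-length candidate baits across a target sequence. It is an illustration only, not MYcroarray's design procedure: the function name `tile_baits`, the 100-nucleotide default length (chosen to sit inside the 80 to 120 range quoted above), and the 50-nucleotide step are all assumptions.

```python
def tile_baits(target: str, bait_length: int = 100, step: int = 50) -> list[str]:
    """Return overlapping candidate baits that tile a target sequence.

    Hypothetical helper: bait_length and step are illustrative defaults,
    not vendor parameters.
    """
    if bait_length <= 0 or step <= 0:
        raise ValueError("bait_length and step must be positive")
    if len(target) < bait_length:
        return []  # target too short to hold a single full-length bait

    # Window start positions at a fixed step along the target.
    starts = list(range(0, len(target) - bait_length + 1, step))

    # Add a final window flush with the end so the last bases are covered.
    last_start = len(target) - bait_length
    if starts[-1] != last_start:
        starts.append(last_start)

    return [target[s:s + bait_length] for s in starts]


# Toy 230-base target (made-up sequence, for illustration only).
toy_target = "ACGT" * 57 + "AC"
baits = tile_baits(toy_target)
print(len(baits), "baits of length", len(baits[0]))
```

A real design would also have to deal with repeats, GC content, and divergence between the reference and the sample, none of which this sketch attempts.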
MYbaits is useful for gaining insights into gene structure, even when only transcriptome data is available. Baits designed from transcripts also pull down the introns flanking the expressed sequences. This makes it possible to map exon-intron junctions and to identify noncoding transcriptional and regulatory elements, which in turn will allow many fundamental phylogenetic and population questions to be addressed, Dr. Rouillard notes.
MYcroarray presented data in which a set of 14,468 bait sequences designed from western terrestrial garter snake transcriptome sequences was used to enrich genomic DNA prior to PCR amplification and 454 sequencing. A total of 2,556 reference contigs were obtained. In these, 615,338 bases mapped to reference transcripts while 2,161,616 bases mapped to adjacent sequences, meaning that for each base of transcriptome sequence recovered, an average of about 3.5 bases of new flanking sequence were discovered.
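The ratio follows directly from the two base counts reported above. A short Python check, in which the variable and function names are illustrative rather than taken from the presentation:

```python
# Base counts reported for the garter snake enrichment experiment.
transcript_bases = 615_338    # bases mapping to reference transcripts
flanking_bases = 2_161_616    # bases mapping to adjacent (flanking) sequence


def new_bases_per_transcript_base(flanking: int, on_transcript: int) -> float:
    """Return flanking bases recovered per base mapping to a transcript."""
    return flanking / on_transcript


ratio = new_bases_per_transcript_base(flanking_bases, transcript_bases)
print(f"{ratio:.2f} new bases per transcript base")  # prints 3.51
```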
Another group presenting at PAG, led by Aaron Liston, Ph.D., professor at Oregon State University, used MYbaits to enrich for more than 6,000 previously identified polymorphic sites in strawberry, prior to genotyping of 48 F1 offspring. The resulting high-density linkage map allowed them to gain insights into sexual transitions of these and related diploid plant species.
Let’s Get Physical
Rather than looking at only selected portions of the genome to get a handle on assembling sequences, other approaches add complementary information to help piece together the data acquired from massively parallel next-gen sequencing reads of short DNA fragments.
The optical mapping of chromosomes has been around since at least 1993, when David Schwartz, Ph.D., and his colleagues described imaging of fluorescently labeled DNA molecules that had been digested with restriction enzymes after being fixed in agarose. The relative fluorescence intensity gave a measure of the length of each restriction fragment, and this information was used to construct ordered physical maps of the chromosomes from which they came.
OpGen took Dr. Schwartz's technique into the realm of microfluidics, allowing whole-genome mapping (WGM) to be performed on its automated Argus platform. The Argus generates long single-molecule maps from 250 kilobases up to 2.5 megabases, giving “a more global picture,” and hence a more accurate analysis of structural variation such as indels, inversions, and translocations, said field application scientist Erin Newburn, Ph.D.
Smaller genomes can be mapped completely de novo by assembling the high-density restriction pattern repeats. But for larger genomes, such as those found in plants and animals, the company has introduced Genome-Builder™, a bioinformatics module “basically utilizing our very long restriction mapping leads of the WGM system and combining it with the sequencing scaffolds that our customers have assembled through their next-gen sequencing platforms,” she explained.
The genome sequence is converted to an in silico restriction map while at the same time long DNA molecules are isolated and cut by the Argus system. “Then we target the ends of these in silico maps and align up our single molecule restriction maps,” Dr. Newburn explained.
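Converting a sequence to an in silico restriction map amounts to finding every recognition site and recording the fragment lengths between cuts. The Python sketch below shows the idea under stated assumptions: the function name is hypothetical, the sequence is treated as linear, and EcoRI (recognition site GAATTC, cutting after the first G) is used only as a familiar example, since the article does not say which enzyme the Argus workflow uses.

```python
import re


def in_silico_restriction_map(sequence: str, site: str, cut_offset: int = 0) -> list[int]:
    """Return ordered fragment lengths from digesting a linear sequence.

    site is the recognition sequence; cut_offset is how many bases into
    the site the enzyme cuts. Both are supplied by the caller.
    """
    sequence = sequence.upper()
    site = site.upper()

    # A lookahead pattern finds overlapping occurrences of the site too.
    pattern = f"(?={re.escape(site)})"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, sequence)]

    # Fragment lengths are the gaps between consecutive boundaries.
    boundaries = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(boundaries, boundaries[1:]) if b > a]


# Toy 19-base sequence with two EcoRI sites (made up for illustration).
toy_scaffold = "AAGAATTCCCCGAATTCTT"
fragments = in_silico_restriction_map(toy_scaffold, "GAATTC", cut_offset=1)
print(fragments)  # [3, 9, 7]
```

The resulting ordered list of fragment lengths is the kind of pattern that can be compared against the fragment sizes measured on single molecules.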
A subset of these will start to extend off the in silico map. After several iterations of alignments, extensions, and gap-bridging, “we can take these extended scaffolds (hybrid molecules, if you will) and then see if we can use traditional genetic information to connect the scaffolds and build what we’re calling a superscaffold.”
Dr. Newburn mentioned that a December article in Nature Biotechnology described the sequencing and mapping of the domestic goat genome by a Chinese group. By combining NGS with WGM, the group was able to significantly reduce the number of scaffolds, and the N50 (the scaffold length at which scaffolds of that length or longer account for at least half of the assembled bases) went from 3 megabases to 16.3 megabases, an improvement of more than fivefold, she related. “Many of the superscaffolds were actually on the order of the length of a full chromosome.”
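For readers unfamiliar with N50, it can be computed by sorting scaffold lengths from longest to shortest and walking down the list until the running total reaches half of the assembly. A minimal Python sketch follows; the scaffold lengths are made-up toy values, not the goat assembly data.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a set of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")

    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy assemblies: the same 32 bases before and after joining scaffolds.
before = [2, 2, 2, 3, 3, 4, 8, 8]
after = [5, 7, 20]
print(n50(before), n50(after))  # prints 8 20
```

Joining scaffolds leaves the total unchanged but concentrates it in fewer, longer pieces, which is why N50 rises. For the figures quoted above, 16.3 / 3 is roughly 5.4, consistent with "more than fivefold."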