Scientists at the University of California, San Francisco (UCSF) and BioNano Genomics have demonstrated use of the latter’s nanochannel array platform to carry out structural assembly and haplotype-resolved variation analysis of the human major histocompatibility complex (MHC), without the need for DNA fragmentation or amplification. The work, detailed in Nature Biotechnology, utilized BioNano Genomics’ scalable chip-based Irys™ platform.
BioNano Genomics’ technology essentially prompts long molecules of intact, labeled DNA to spontaneously uncoil and stretch out in thousands of linear nanochannels. These stretches of DNA, one per channel, are then automatically imaged. Analysis of the data generates sequence motif maps that can be used to reconstruct the target piece of genome, which in the reported case was the 4.7 Mb human MHC region.
The UCSF team led by Pui-Yan Kwok, Ph.D., and Ernest T. Lam, Ph.D., and colleagues at BioNano Genomics describe the work in a paper titled “Genome mapping on nanochannel arrays for structural variation analysis and sequence assembly.”
The mapping approach comprises four steps: sequence-specific labeling of the DNA, linearization of the labeled long DNA molecules, imaging, and map construction. The labeling technique uses a nicking endonuclease to introduce single-strand nicks in the double-stranded DNA at specific sequence motifs. Fluorescently labeled nucleotides are then incorporated at the nicked sites, and the labeled DNA molecules are stained to visualize the whole molecule and measure its size.
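The labeling step can be mimicked in silico: for a known sequence, the expected label positions are simply the occurrences of the recognition motif on either strand. The short Python sketch below illustrates the idea. It assumes the motif GCTCTTC (the recognition site of the nicking enzyme Nt.BspQI; the article does not name the enzyme used), and both the helper name `label_positions` and the toy sequence are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def label_positions(seq, motif="GCTCTTC"):
    """Return sorted 0-based start positions of the motif on either strand.

    The default motif is an assumption made for illustration only.
    """
    seq = seq.upper()
    hits = set()
    # A site on the reverse strand appears as the motif's reverse
    # complement when reading the forward strand.
    for pattern in {motif.upper(), reverse_complement(motif)}:
        start = seq.find(pattern)
        while start != -1:
            hits.add(start)
            start = seq.find(pattern, start + 1)
    return sorted(hits)


# Toy sequence with one forward-strand and one reverse-strand site.
toy = "AAGCTCTTCAAAAGAAGAGCTT"
print(label_positions(toy))  # → [2, 13]
```

The spacing between successive positions in such a list is what a motif map records; the sequence between the labels is not read.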
The nanofluidic chip in which the DNA is prompted to stretch out for imaging contains three sets of nanochannels, each comprising 4,000 channels that are 0.4 mm long and 45 nm in diameter. At this diameter the long DNA molecule introduced into each channel is forced to remain in an elongated, linear state, as there is no room for it to fold back on itself. Getting the DNA to uncoil in the first place is achieved by the physical interactions of the molecule with a region of pillars and wider channels that front the nanochannels. This region of confinement effectively forces the DNA to interact with the pillars and uncoil as the molecule flows into the nanochannel. The stretched-out, labeled DNA molecules are then imaged, and consensus sequence motif maps are constructed by comparing and clustering DNA molecules with the same sequence motif patterns.
The team applied the technology to generate motif maps of the MHC region for two haploid clone libraries from the MHC Haplotype Consortium collection, using 49 and 46 BAC clones from the PGF and COX libraries, respectively. The procedure involved mixing samples of all the clones for each library, extracting DNA, nick-labeling each mixture and dividing it into two aliquots. One aliquot from each mixture was linearized with one restriction enzyme, and the other with a different restriction enzyme, to generate four nick-labeled, linearized mixtures.
Each mixture was then loaded into the nanochannel array separately and imaged. The contiguous images were subsequently stitched together to produce a longer field of view. “In total, we collected images of 23,000 molecules corresponding to 3 Gb of DNA sequence,” the researchers write. “The size of each molecule ranged from 20–220 kb, with a large fraction of the molecules >100 kb.”
To simulate a dataset obtained from a diploid DNA sample, the researchers combined image data from all four mixtures before analysis. Distances between adjacent labels were calculated for all the molecules, and unsupervised clustering analysis was carried out. These clusters were then used to generate consensus sequence motif maps for the individual BAC clones, and maps of overlapping BACs were joined to produce contig maps. This generated three contigs across the 4.7 Mb MHC region, and allowed the team to highlight regions harboring differences between the two haploid genomes. The results were validated by comparing the data with analyses of the haploid PGF and COX datasets.
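The idea of clustering molecules by their label spacing and averaging each cluster into a consensus map can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not the authors' algorithm: it assumes every molecule in a cluster carries the same number of labels, uses a greedy grouping with a hypothetical 10% tolerance, and ignores missing labels, partial overlaps, and molecule orientation. All names and numbers are illustrative.

```python
def intervals(labels):
    """Distances (in kb) between consecutive labels on one molecule."""
    return [b - a for a, b in zip(labels, labels[1:])]


def similar(p, q, tol=0.1):
    """True if two interval patterns agree within a relative tolerance."""
    if len(p) != len(q):
        return False
    return all(abs(a - b) <= tol * max(a, b) for a, b in zip(p, q))


def cluster(molecules, tol=0.1):
    """Greedily group molecules whose interval patterns match."""
    clusters = []
    for labels in molecules:
        pattern = intervals(labels)
        for group in clusters:
            # Compare against the first member of each existing cluster.
            if similar(pattern, group[0], tol):
                group.append(pattern)
                break
        else:
            clusters.append([pattern])
    return clusters


def consensus(group):
    """Average each interval across the molecules in one cluster."""
    return [sum(column) / len(group) for column in zip(*group)]


# Label positions (kb) along four made-up molecules.
molecules = [
    [0, 10, 25, 60],
    [5, 15.5, 30, 66],
    [0, 8, 40],
    [2, 10.2, 41],
]
groups = cluster(molecules)
print(len(groups))           # → 2
print(consensus(groups[0]))  # → [10.25, 14.75, 35.5]
```

Working with intervals rather than absolute positions means two molecules that start at different points on the same clone still produce the same pattern.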
Interestingly, while the maps produced using the BioNano Genomics technology matched perfectly with the PGF reference map, there were discrepancies relative to the COX reference map, and subsequent sequencing of the clones determined that there was, in fact, a 4 kb error in the COX reference. Overall, the accuracy of nanochannel genome mapping technology allowed the detection of single nucleotide variants, duplications, and relatively small insertions and deletions, including a 5 kb insertion and a 30 kb tandem duplication. Notably, the MHC haplotype maps accurately differentiated the two HLA-DRB1 variants (DRB1*150101 and DRB1*030101) within the coding region, something that next-generation sequencing struggles to achieve because the gene is relatively long and harbors large introns in a highly repetitive region, the team notes.
The researchers also demonstrated how genome maps generated using the Irys platform can be used to aid de novo sequence assembly following next-generation sequencing of libraries prepared from the PGF and COX clones. Contigs assembled from the sequencing reads were aligned to the map generated using the genome mapping technology, to provide detailed information on the relationship and orientation of contigs, together with the location and size of each gap between them. Importantly, the authors state, using the genome maps as assembly scaffolds improved assembly contiguity, while retaining haplotype and structural information.
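Once each sequence contig has been placed on the genome map, ordering the contigs and sizing the gaps between them is simple bookkeeping. The Python sketch below shows that step under stated assumptions: the `Placement` record, the contig names, and the coordinates are all hypothetical, and the alignment of contigs to the map (the hard part) is taken as already done.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """Where one sequence contig aligns on the genome map (hypothetical)."""
    name: str
    map_start: int  # kb
    map_end: int    # kb
    strand: str     # "+" or "-" relative to the map


def scaffold(placements):
    """Order contigs along the map and size the gap between neighbors.

    A negative gap means the two neighboring contigs overlap on the map.
    """
    ordered = sorted(placements, key=lambda p: p.map_start)
    gaps = [
        (left.name, right.name, right.map_start - left.map_end)
        for left, right in zip(ordered, ordered[1:])
    ]
    return ordered, gaps


placements = [
    Placement("ctgB", 400, 650, "-"),
    Placement("ctgA", 0, 350, "+"),
    Placement("ctgC", 640, 900, "+"),
]
ordered, gaps = scaffold(placements)
print([(p.name, p.strand) for p in ordered])
# → [('ctgA', '+'), ('ctgB', '-'), ('ctgC', '+')]
print(gaps)
# → [('ctgA', 'ctgB', 50), ('ctgB', 'ctgC', -10)]
```

Here the map supplies what short reads alone cannot: the order and orientation of the contigs, and an estimate of how much sequence is missing between them.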
“The BioNano approach to genome mapping achieves uniform DNA stretching in a high-throughput format, allowing researchers to directly view genome variation in the full biological context,” states Erik Holmlin, Ph.D., the firm’s CEO. “This paper demonstrates how the Irys system provides a completely new data type that opens the door to more accurate and comprehensive structural variation discovery studies and improves our ability to achieve high-quality sequence assemblies.”