The discovery that 20,000–24,000 protein-encoding genes exist in the haploid human genome (a key finding of the Human Genome Project) fueled initiatives to map gene interactions and characterize the genetic circuitry in cells.
Radiation hybrid mapping, a strategy in which high-dose X-rays randomly introduce chromosomal breaks that shatter the DNA into tiny fragments, was initially used to generate high-resolution physical maps of the human genome.
The strength of this strategy is its ability to identify genetic markers that are close to each other. The closer two genetic markers are to each other on the chromosome, the more likely it is that they will be located on the same chromosomal fragment. Moreover, the frequency of breakage between markers can be used to reveal their order on the chromosome.
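The intuition can be illustrated with a minimal sketch, assuming each marker is scored as retained (1) or absent (0) across a panel of hybrid clones. The marker names and retention patterns below are invented for illustration, and the discordance fraction is only a crude proxy for breakage frequency, not the statistical estimators that dedicated mapping software applies.

```python
def discordance(a, b):
    """Fraction of clones in which exactly one of the two markers is retained."""
    if len(a) != len(b):
        raise ValueError("retention vectors must cover the same clones")
    return sum(x != y for x, y in zip(a, b)) / len(a)

# Hypothetical retention patterns across eight hybrid clones (1 = retained).
panel = {
    "M1": [1, 1, 0, 0, 1, 0, 1, 0],
    "M2": [1, 1, 0, 0, 1, 0, 0, 0],  # differs from M1 in one clone
    "M3": [0, 1, 0, 1, 1, 0, 0, 1],  # differs from M1 in four clones
}

d12 = discordance(panel["M1"], panel["M2"])
d23 = discordance(panel["M2"], panel["M3"])
d13 = discordance(panel["M1"], panel["M3"])
```

In this toy panel M1 and M2 are separated least often and M1 and M3 most often, which is consistent with the order M1, M2, M3 along the chromosome.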
“We realized that we could ask a different question from the one that radiation panels had previously explored,” says Desmond J. Smith, M.D., Ph.D., professor of molecular and medical pharmacology at the University of California, Los Angeles. Dr. Smith and colleagues proposed that radiation hybrid mapping data from mammalian cells could be used to delineate genetic survival networks for proliferation.
Central to this endeavor was the concept that if an extra copy of a gene leads to cell death, this toxic effect could be blocked by an additional copy of another, distant gene. “And if that was true, we could expect to see the two genes co-inherited more often than expected by chance in the panel of radiation hybrid cells,” says Dr. Smith.
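One way to make "more often than expected by chance" concrete is a one-sided hypergeometric test on how many clones retain both genes. The sketch below is an assumption about how such a comparison could be set up, not the statistic used in Dr. Smith's lab; the function name and the 12-clone retention vectors are hypothetical.

```python
from math import comb

def coretention_pvalue(gene_a, gene_b):
    """Return (clones retaining both genes, P(at least that many) under independence)."""
    n = len(gene_a)
    k_a = sum(gene_a)  # clones carrying an extra copy of gene A
    k_b = sum(gene_b)  # clones carrying an extra copy of gene B
    both = sum(x and y for x, y in zip(gene_a, gene_b))
    # Upper tail of the hypergeometric distribution (one-sided Fisher exact test).
    tail = sum(
        comb(k_a, i) * comb(n - k_a, k_b - i)
        for i in range(both, min(k_a, k_b) + 1)
    )
    return both, tail / comb(n, k_b)

# Hypothetical retention of two distant genes across 12 hybrid clones.
gene_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
gene_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
both, p_value = coretention_pvalue(gene_a, gene_b)
```

Here all four clones that carry gene A also carry gene B, which would happen by chance in only about 1% of panels with these retention counts. A genome-wide scan of all gene pairs would additionally need a correction for multiple testing.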
By looking at all potential pair-wise interactions between all the genes from the genome, investigators in Dr. Smith’s lab delineated an unbiased network of interactions involved in cell proliferation and survival, and subsequently applied this knowledge to address a question relevant to genetic circuits that underlie malignancies. Some cancers have a survival advantage as a result of copy number changes, such as the amplification of genes that increase cell proliferation and the deletion of genes that block proliferation.
“Most investigators examined these copy number variations in isolation,” says Dr. Smith. The question that investigators in Dr. Smith’s lab addressed extended beyond the simple characterization of isolated copy number variations (CNVs) in cancer. “We wanted to know whether the amplification of a gene affected in cancer is accompanied by the amplification of a distant gene somewhere else in the genome,” says Dr. Smith.
This strategy helped characterize the survival network of cancer cells, which is a subset of the survival network that is unveiled by the radiation hybrid data. In addition, this strategy bypassed one of the most significant challenges in characterizing copy number changes, the frequent involvement of multiple genes, which historically made it difficult to point toward the specific genes contributing to the resulting phenotypes.
“This is not the case for radiation mapping hybrid panels, where X-rays fragment the DNA and the resolution is very high,” says Dr. Smith. Overlapping the cancer interaction network with the radiation hybrid network provides opportunities to better understand, at single-gene resolution, the involvement of specific genes in disease. “We hope that these networks can ultimately be exploited for cancer treatment,” says Dr. Smith.
Detecting CNVs via NGS
In recent years, array comparative genomic hybridization (CGH) was recommended by several professional societies as a first-line test for the prenatal detection of CNVs. “But this approach suffers from low resolution, and … it misses balanced chromosomal structural variants, such as translocations and inversions,” says Yu-Ping Wang, Ph.D., associate professor of biomedical engineering at Tulane University.
With the emergence of next-generation sequencing, investigators in Dr. Wang’s group turned their attention toward this approach as a way to improve the accuracy of CNV detection. “Next-generation platforms can detect CNVs with a higher resolution, which is unattainable with other approaches, such as array CGH,” says Dr. Wang. By applying a total-variation-penalized least-squares model to next-generation sequencing data, the first time this statistical approach was used to analyze CNVs, Dr. Wang and colleagues developed CNV-TV (total variation). “We found this tool to provide higher accuracy and robustness for CNV detection,” says Dr. Wang.
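The general idea behind a total-variation-penalized least-squares fit can be sketched as an objective that rewards closeness to the observed read depth while charging for every jump between neighboring positions. The code below evaluates that generic objective for two candidate fits; the function name, the depth values, and the penalty weight `lam` are illustrative assumptions, and the exact formulation and parameter selection in CNV-TV are not described here.

```python
def tv_objective(y, x, lam):
    """0.5 * squared error between y and x, plus lam * sum of absolute jumps in x."""
    fit = 0.5 * sum((yi - xi) ** 2 for yi, xi in zip(y, x))
    jumps = sum(abs(b - a) for a, b in zip(x, x[1:]))
    return fit + lam * jumps

# Hypothetical normalized read depth along six adjacent windows.
depth = [1.0, 1.1, 0.9, 2.0, 2.1, 1.9]

raw_fit = depth[:]                          # follows every wiggle in the data
step_fit = [1.0, 1.0, 1.0, 2.0, 2.0, 2.0]   # one jump, as a single CNV edge

raw_cost = tv_objective(depth, raw_fit, lam=1.0)
step_cost = tv_objective(depth, step_fit, lam=1.0)
```

With this penalty weight the piecewise-constant fit scores lower than the raw signal, which is why minimizing such an objective tends to produce flat segments separated by a small number of breakpoints.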
Several depth-of-coverage, next-generation sequencing strategies are currently available to detect CNVs. In a comprehensive survey comparing six publicly available platforms, Dr. Wang and colleagues revealed that each of them presents specific strengths and weaknesses. In addition, some are superior to others for specific applications, underscoring the need to integrate multiple approaches to more robustly capture genetic variation.
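As a rough illustration of the depth-of-coverage principle, the sketch below compares read counts per genomic window in a sample against a control and labels windows by their log2 ratio. The read counts, the gain and loss thresholds, and the pseudocount are all hypothetical choices made for this example; published tools differ in how they normalize, segment, and call.

```python
from math import log2

def call_windows(sample, control, gain=0.4, loss=-0.6, pseudo=0.5):
    """Label each window from the log2 ratio of depth-normalized read counts."""
    s_total, c_total = sum(sample), sum(control)
    calls = []
    for s, c in zip(sample, control):
        # The pseudocount keeps empty windows from causing log2(0) or division by zero.
        ratio = ((s + pseudo) / s_total) / ((c + pseudo) / c_total)
        lr = log2(ratio)
        calls.append("gain" if lr >= gain else "loss" if lr <= loss else "neutral")
    return calls

# Hypothetical read counts in six equal-sized windows.
control_counts = [100, 100, 100, 100, 100, 100]
sample_counts = [100, 100, 150, 150, 50, 100]
calls = call_windows(sample_counts, control_counts)
```

In this toy case the third and fourth windows are called as a gain and the fifth as a loss. Normalizing by total read count is itself a simplification, since large CNVs shift the total.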
While most approaches focus on detecting CNVs from individual samples, or by comparing CNVs from patients with disease with those from controls, Dr. Wang and colleagues applied non-negative matrix factorization to detect recurrent CNVs within a population, and showed that two ethnic groups can be distinguished based on differences in their CNV pattern clustering. “Technically there is still room to further improve our ability to detect CNVs, and we are currently developing a strategy to detect CNVs from multiple samples,” says Dr. Wang.
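Non-negative matrix factorization approximates a samples-by-regions matrix as the product of two smaller non-negative matrices, so that each sample is described by its loadings on a few recurrent patterns. The sketch below uses the standard multiplicative update rules on an invented 4 x 4 matrix; it is a toy illustration of the technique, not the pipeline used by Dr. Wang's group, and every value and name in it is hypothetical.

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, rank, iters=500, seed=0, eps=1e-9):
    """Approximate non-negative V as W times H using multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return W, H

# Hypothetical CNV signal: rows are samples, columns are genomic regions.
V = [
    [2, 2, 0, 0],
    [4, 4, 0, 0],
    [0, 0, 3, 3],
    [0, 0, 1, 1],
]
W, H = nmf(V, rank=2)

# Assign each sample to the pattern that contributes most to its reconstruction.
labels = [max(range(2), key=lambda k: W[i][k] * sum(H[k])) for i in range(len(V))]
```

The first two samples share one recurrent pattern and the last two share the other, so their dominant components split them into two groups, loosely analogous to separating populations by CNV pattern.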
CNVs as Evolutionary Clues
“Structural variants in the human genome have mostly been studied for their clinical implications, but not a lot has been done to understand the evolutionary implications of these genomic segments,” says Charles Lee, Ph.D., scientific director of the newly created Jackson Laboratory for Genomic Medicine.
Work in Dr. Lee’s lab showed that the copy number of the amylase-encoding gene, AMY1, is under positive evolutionary selection. Fewer copies were found in populations consuming little starch, but the copy number increased to as many as 15 per cell in individuals from populations having a high intake of this carbohydrate. In contrast to humans, only two copies of the AMY1 gene were found in cells from the chimpanzee, a species that consumes very little starch.
More recent work in Dr. Lee’s lab, at the interface between structural chromosomal variants and evolution, identified an approximately 36 kb noncoding locus in the human genome containing transcribed and putatively regulatory sequences. The locus exhibits a copy number variant that is thought to predate the divergence, over 500,000 years ago, of the modern human and Neanderthal lineages.
While most CNVs, like single-nucleotide polymorphisms (SNPs), appear to be bi-allelic in a given population, CNVs have the potential of “mutating” faster, making their analysis more challenging. “They are also often embedded in complicated regions of the human genome, enriched for repetitive DNA and other genomic rearrangements, making them difficult to accurately genotype and thereby hindering their incorporation in most genetic studies,” says Dr. Lee.
While array CGH and next-generation sequencing continue to help unveil and characterize structural variants, insights into their evolution and mutation rates are accompanied by technical hurdles. “To understand how they arose, we need to accurately characterize the boundaries and content of each of these structural genomic variants at the nucleotide sequence level. We’ve gotten better at this over the years, especially for deletions, but we still have a ways to go for other structural variant types,” says Dr. Lee.