October 1, 2015 (Vol. 35, No. 17)
Richard A. A. Stein M.D., Ph.D.
Technological Advances are Increasingly Unveiling their Relevance for Human Biology
Collectively, copy number variants span more genomic material than all single nucleotide polymorphisms, and technological advances are increasingly unveiling their relevance for human biology, including disease predisposition and therapeutic and prognostic decision-making.
Copy number variants are not only suggesting new directions for our clinical future, they are also filling in details of our evolutionary past. Such variants are providing points of reference in the temporal maps being drawn in comparative genomics projects. For example, the structural variants known as deletions are helping scientists compare the human genome with the genomes of our evolutionary ancestors.
“We wanted to go back more than a million years, and look at human gene deletion variations in Neanderthal ancestors to understand whether these events have been under some sort of selection during evolution,” says Omer Gokcumen, Ph.D., assistant professor of biology at State University of New York at Buffalo. Advances in sequencing technologies and the availability of fossil records provided opportunities to examine the genetic makeup of extinct species. With higher quality ancient genomes, it is now possible to study different types of variations, including structural variations.
Using human genome data from the 1000 Genomes Project and high-depth coverage sequences of the Neanderthal and Denisovan genomes, Dr. Gokcumen and colleagues recently identified 427 genomic deletions that are shared between humans and the two archaic hominin genomes.
Neanderthals, which are thought to have come into contact with modern humans approximately 80,000 years ago, appear to have survived until about 35,000 years ago in some regions of Europe. Denisovans, whose remains were first discovered in the Altai Mountains of Russia, share a common ancestor with Neanderthals, from which the two diverged about 500,000 years ago. Genetically, Neanderthals and Denisovans are more closely related to each other than either is related to modern humans.
Of the variable human deletions that Dr. Gokcumen and colleagues found to be shared with these ancient genomes, about 87% predated the divergence of humans and Neanderthals from chimpanzees, and about 9% of them were transmitted directly from Neanderthals. Only 17 of the chromosomal deletions were found to overlap with coding parts of the genome, pointing toward an excess of deletions that occurred in the intergenic regions.
“A small number of these genic deletions are related to drug metabolism and especially to immune function,” notes Dr. Gokcumen. “We think that there is a balance between autoimmunity and the reaction to ecological factors, such as UV light and defense against pathogens.”
Genes affected by functional deletions include SPATA45, which encodes a protein involved in spermatogenesis; the UGT2B genes, which control the metabolism of certain hormones and steroids; LCE3C, which controls susceptibility to psoriasis; and DMBT1, which has been linked to susceptibility to Crohn’s disease.
Comparative genomics also revealed that a large number of segmental duplications emerged in the great apes, and many of them appear to be related to brain evolution in humans. This increase in the emergence of segmental duplications occurred, evolutionarily, during a time when single base mutations were becoming less numerous.
“It appears that this birth of segmental duplications also created some sort of plasticity in the great ape genomes, which can lead to the emergence of large copy number variants,” observes Dr. Gokcumen. Duplicated sequences represent a source of genomic instability, due to the possibility of crossover events during mitosis and meiosis.
This creation of additional copy number variants was hypothesized to lead to de novo mutations, duplications, and deletions, and to predispose to conditions that include autism spectrum disorders, schizophrenia, and several developmental disorders. In fact, over 30 regions that involve a large stretch of up to 10 Mb flanked by highly homologous segmental duplications were described in the human genome. These regions appear to be linked to various complex diseases.
One of the current difficulties is the identification and the accurate characterization of copy number variants in the genome. “The biggest technical challenge is that most of the time, deletions and duplications are larger than the sequence reads that we can obtain,” informs Dr. Gokcumen. As a result, genomic variation landscapes have to be mapped by relying on indirect predictions, an approach that is error-prone and may lead to false calls. “A large part of our research,” Dr. Gokcumen explains, “is directed to making sure that we are detecting real events.”
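One common form of indirect prediction is read-depth analysis: if a region is deleted or duplicated, fewer or more reads should map to it than to the same region in a reference sample. The following is a minimal sketch of that idea, not the method used by Dr. Gokcumen's group. The function names, the per-window read counts, and the log2-ratio thresholds are all illustrative assumptions, and the depths are assumed to be already normalized for total sequencing output.

```python
import math


def log2_ratio(sample, reference):
    """Return log2(sample / reference) for one genomic window.

    Returns None when the reference has no reads (ratio undefined) and
    negative infinity when the sample has no reads at all.
    """
    if reference <= 0:
        return None
    if sample <= 0:
        return float("-inf")
    return math.log2(sample / reference)


def call_windows(sample_depth, reference_depth,
                 loss_threshold=-0.4, gain_threshold=0.3):
    """Label each window as 'loss', 'gain', 'neutral', or 'no_data'.

    The thresholds are hypothetical values chosen for illustration;
    real pipelines calibrate them against noise in the data.
    """
    calls = []
    for sample, reference in zip(sample_depth, reference_depth):
        ratio = log2_ratio(sample, reference)
        if ratio is None:
            label = "no_data"
        elif ratio < loss_threshold:
            label = "loss"
        elif ratio > gain_threshold:
            label = "gain"
        else:
            label = "neutral"
        calls.append((ratio, label))
    return calls


if __name__ == "__main__":
    # Hypothetical read counts for six consecutive windows.
    sample = [100, 98, 52, 49, 101, 150]
    reference = [100, 100, 100, 100, 100, 100]
    for index, (ratio, label) in enumerate(call_windows(sample, reference)):
        print(index, ratio, label)
```

Because the call rests on a noisy depth signal rather than on a read that spans the event, windows near the thresholds can easily be mislabeled, which is one way the false calls mentioned above arise.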
“For the longest time, we have naïvely believed that the phenotypes observed in disorders resulting from genomic copy number mutations, such as a deletion or duplication, can be explained by that single mutation, and that everybody who has that mutation develops a specific phenotype,” says Tamim H. Shaikh, Ph.D., associate professor of pediatrics at the University of Colorado School of Medicine. “However, the problem that we have always faced is that there is a large degree of phenotypic variability among people with the same deletion or duplication.”
For many years, much of Dr. Shaikh’s research has focused on the 22q11 microdeletion syndrome, also known as DiGeorge syndrome or the velocardiofacial syndrome. There is a great deal of phenotypic variability among patients with the 22q11 microdeletion. One such variable phenotype is characterized by congenital heart defects, which occur in about 65% of the affected individuals, but the reason for this phenotypic variability has remained unclear for a very long time.
“We hypothesized that copy number variants outside of the 22q11 deletion could shape phenotypic variability,” relates Dr. Shaikh. To gain better insight into the variability in the expression of congenital heart disease in patients with this condition, Dr. Shaikh and his colleagues, as part of a large international consortium, examined 22q11 deletion patients, with and without congenital heart defects.
This effort led to the identification of a common copy number variant that was significantly associated with the presence of congenital heart defects in the affected patients. The chromosomal change involved a duplication of SLC2A3, previously known as GLUT3, a gene that encodes a facilitated glucose transporter. The protein encoded by this gene is the main transporter responsible for transplacental glucose transport and is involved in human cardiac development.
“The amplification of some oncogenes has been used for clinical diagnosis,” says Cristina Montagna, Ph.D., associate professor of genetics at Albert Einstein College of Medicine. “Several other candidates are being investigated as potential biomarkers.” Several years ago, Dr. Montagna and colleagues identified the link between the SEPT9 oncogene and breast cancer tumorigenesis, and showed that this gene is amplified in the form of double-minute chromosomes, a type of structural alteration leading to copy number gains.
Just like typical chromosomes, double-minute chromosomes replicate in the nucleus. Double-minute chromosomes, however, have distinctive features. For example, they are circular, they lack centromeres and telomeres, and they are only a few million base pairs in size.
“Double-minute chromosomes circularize themselves and may become amplified to hundreds of copies,” explains Dr. Montagna. “This phenomenon is generally linked to amplification of strong oncogenes and therapeutic response.” A large body of literature supports the idea that in several types of malignancies, double-minute chromosomes are linked to resistance to chemotherapy.
Dr. Montagna and colleagues recently reported that high-grade breast carcinoma has a significant increase in the SEPT9 copy number as compared to the lower grade cancers. To gain additional insight into the involvement of Septin 9 in malignancy development, researchers from Dr. Montagna’s lab quantitatively analyzed the mRNAs of the seven Septin 9 isoforms and provided, for the first time, a comprehensive characterization of their differential expression in tumor tissue as compared to the peritumoral breast tissue.
A major challenge when detecting genomic amplification is selecting the appropriate molecular approaches, a decision that depends on multiple factors, including the size of the genomic rearrangement. “FISH is probably still the best tool to detect genomic amplifications, because it allows the analysis of hundreds of single cells from blood or tissue sections,” advises Dr. Montagna.
Two limitations of FISH are the need to know a priori which genomic regions will be targeted, and its inability to consistently detect regions smaller than 50 kb. “The decrease in sequencing costs to allow single-cell sequencing on many cells, which is still too expensive for general screening, and the availability of reliable bioinformatics tools to analyze next-generation data, are at this time the two biggest needs in the field,” adds Dr. Montagna.
“Many of the complex structural variants are cryptic,” says Ryan E. Mills, Ph.D., assistant professor of human genetics at the University of Michigan Medical School. “They are invisible on chromosomal microarrays and karyotyping.” To help clarify matters, Dr. Mills’ group is focusing on characterizing complex chromosomal structural variants.
The accuracy of detecting chromosomal rearrangements is shaped, to a large extent, by the type of rearrangement and its complexity. Chromosomal deletions generally create two breakpoints and lead to the formation of a new genomic junction. However, some of the more complex chromosomal rearrangements are the result of three or more breakpoints and involve multistep structural rearrangements.
“Characterizing these combinations of structural variants is challenging because they can generate multiple signals that are hard to resolve,” notes Dr. Mills. Moreover, complex rearrangements may also lead to false calls.
For example, a genomic sequence that is duplicated and inserted at a nearby location could be mistakenly interpreted as a deletion by some algorithms. “We have known for a number of years that different algorithms and sequencing platforms call different structural variants,” states Dr. Mills. “Some of these differences are explained by these complex events.”
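A toy model makes this kind of misclassification concrete. The sketch below is purely illustrative and does not represent any of the algorithms Dr. Mills refers to: the genome is reduced to an ordered list of hypothetical segment labels, and the deliberately naive caller treats any new junction that skips over reference segments as a deletion.

```python
from collections import Counter


def junctions(order):
    """Return the set of adjacent segment pairs in a linear arrangement."""
    return set(zip(order, order[1:]))


def novel_junctions(reference, sample):
    """Junctions present in the sample but absent from the reference."""
    return junctions(sample) - junctions(reference)


def naive_deletion_calls(reference, sample):
    """Call a deletion for every novel junction that skips forward.

    Assumes each segment label appears exactly once in the reference.
    A junction (left, right) that jumps over reference segments is
    interpreted as a deletion of the skipped segments, with no check
    on whether those segments are actually missing from the sample.
    """
    position = {segment: index for index, segment in enumerate(reference)}
    calls = []
    for left, right in sorted(novel_junctions(reference, sample)):
        if position[right] > position[left] + 1:
            calls.append(reference[position[left] + 1:position[right]])
    return calls


def missing_segments(reference, sample):
    """Segments of the reference that truly have zero copies in the sample."""
    counts = Counter(sample)
    return [segment for segment in reference if counts[segment] == 0]


if __name__ == "__main__":
    reference = ["A", "B", "C", "D"]
    simple_deletion = ["A", "B", "D"]          # C is really gone
    nearby_duplication = ["A", "B", "C", "B", "D"]  # B copied after C

    for name, sample in [("deletion", simple_deletion),
                         ("duplication", nearby_duplication)]:
        print(name,
              "naive calls:", naive_deletion_calls(reference, sample),
              "truly missing:", missing_segments(reference, sample))
```

In the duplication case, the extra copy of B sits directly next to D, so the B-to-D junction looks exactly like the junction a deletion of C would create, even though C is still present. Only by considering all junctions and copy counts together does the event resolve correctly.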
A new algorithm that Dr. Mills and colleagues developed promises to provide a more accurate picture of complex genomic rearrangements. “Our algorithm is a top-down approach that attempts to rearrange the genome in different ways and then visualize all kinds of different structures,” explains Dr. Mills. “By using this randomized strategy, we are not locked in the expectation of a particular pattern.”
In parallel with the 1000 Genomes Project, Dr. Mills and colleagues have used this algorithm to examine some of these complex genomic rearrangements. Based on initial estimates, Dr. Mills’ team was able to reclassify as complex in nature almost 10% of a particular subtype of deletions that had previously been identified.
“This is more of a problem than we originally thought it would be,” admits Dr. Mills. “There is a large initiative that is currently trying to characterize these types of events.” These efforts are guided by the next-generation sequencing technologies, and generating long reads is essential to help cover several breakpoints and resolve multiple junctions.
“We are focusing on clinical decision support systems for molecular oncology,” says Clifford Baron, Ph.D., vp and COO at CollabRx. “One of our products provides an analytical platform to determine the impact of copy number variants on therapeutic decisions in cancer.”
With advances in sequencing, more and more links have been established between copy number variants and phenotypes, including predisposition to disease, therapeutic options, response to medication, and prognostic outcomes. A major effort at CollabRx involves collecting information on preclinical and clinical literature and treatment guidelines.
“We then summarize the information and treatment strategies that can be inferred from the published data,” apprises Dr. Baron. While current efforts primarily focus on providing guidance during therapeutic decisions, CollabRx also offers some coverage of the diagnostic and prognostic information and anticipates moving more decisively in this direction.
To help obtain clinically actionable information for chromosomal mutational variants, such as mutations and polymorphisms, the American College of Medical Genetics and Genomics (ACMG) has developed, over the past 12 to 18 months, a set of standards and guidelines to help categorize mutations and characterize them as being actionable or of uncertain significance. “For quantitative and continuously variable genetic traits, we need similar standards to the ones that have been worked out for discrete genetic traits such as mutations,” insists Smruti Vidwans, Ph.D., CSO at CollabRx.