The Human Genome Project, completed 50 years after Watson and Crick published their groundbreaking paper on the double helix, revealed single nucleotide polymorphisms (SNPs) to be the main source of inter-individual genetic variability. Thought to occur on average once every 300–1,000 base pairs, SNPs have been linked to disease susceptibility, response to therapeutic agents, and adverse drug reactions.
SNPs, however, are not the only major source of genetic variability; the importance of copy number variations (CNVs) started to emerge around 2004. CNVs can be classified by several criteria, such as size, genomic localization, type, and molecular mechanism. The unveiling of this new class of genomic variation made it clear that human beings can no longer be considered 99.9% genetically identical.
“CNVs have now been shown collectively to include more nucleotides than the total of single-nucleotide polymorphisms, so they need to be considered in any genetic design study,” says Stephen W. Scherer, Ph.D., director of the center for applied genomics at The Hospital for Sick Children in Toronto.
In a study, codirected with Matthew Hurles, Ph.D., from The Wellcome Trust Sanger Institute and other international collaborators, Dr. Scherer’s team generated a comprehensive copy number variation map that unveiled 1,447 copy number variable regions covering approximately 12% of the human genome.
Importantly, the same study, which enrolled 270 individuals from four populations, revealed that approximately 14.5% of the genes deposited into the OMIM Morbid Map, the comprehensive database linking human genes with phenotypes, overlap with copy number variations—predicting the fundamental role that CNVs will play in understanding human disease.
At CHI’s “Comprehending Copy Number Variation” meeting to be held later this month in San Diego, Dr. Scherer will present research that he and his collaborators have conducted on understanding how several forms of copy number and structural variation contribute to neuropsychiatric conditions. A major research effort in the Scherer group focuses on autism spectrum disorders, historically classified as idiopathic (~90%), when an underlying cause is not apparent, or secondary (5–10%), when single gene disorders, chromosomal abnormalities, or environmental factors emerge as the underlying cause.
High-resolution microarray analysis and karyotyping conducted in the Scherer lab revealed de novo copy number variations in approximately 7% of patients in whom the condition had previously been classified as idiopathic, with two or more of these rearrangements identified in 11% of them.
This approach also helped unveil genomic changes previously not known to be associated with autism spectrum disorders, such as a copy number variation at 16p11.2 identified in approximately 1% of the cohort with autism but not among their controls.
CNVs are becoming increasingly valuable in the clinical setting. Shelly R. Gunn, M.D., Ph.D., medical director at Combimatrix Molecular Diagnostics, examines somatic and germline copy number changes by using array-based comparative genomic hybridization (CGH).
Deciding whether a particular CNV causes a specific condition is, however, more challenging than it appears. “CNVs are sprinkled throughout everybody’s genome,” says Dr. Gunn. “Every time I see a genome, I have to decide, am I looking at a CNV that is pathogenic, one that is causing the phenotype, or am I looking at a CNV that was inherited and is benign?”
Many changes are not associated with disease, and databases where investigators deposit genomic variations that they suspect do not contribute to disease represent a valuable resource in this respect. “Most times, if there is a CNV less than 1 megabase, then we find it in the parents, we add it to the databases, and this will become one additional change that is not associated with disease and will be present for future reference,” explains Dr. Gunn.
On the other hand, CNVs that are three megabases or larger rarely represent inherited benign changes, and most of them are associated with disease. Very importantly, however, as Dr. Gunn points out, “it is difficult to predict, based on size, whether a duplication or deletion is benign or not.” A 500 kb microdeletion on chromosome 16p11.2, which is associated with autism despite its relatively small size, provides perhaps one of the most relevant examples of this point.
“Array-based CGH is an excellent test for breast cancer,” explains Dr. Gunn. Ever since the human genome was sequenced, the importance of examining copy number changes inside a tumor has been known, but this was not possible with traditional cytogenetic approaches. With CGH assays, exploring tumor genomes has now become reality and, as Dr. Gunn reveals, “this is probably the most exciting thing about copy number changes—they tell you so much about the tumor genome and about how the tumor is going to behave—and now we can look at it.”
Dr. Gunn’s presentation at “Comprehending Copy Number Variation” will focus on HER2 copy number changes in breast cancer and illustrate the advantages that array-based CGH offers over other techniques such as immunohistochemistry and fluorescence in situ hybridization.
Paul Dear, Ph.D., leader of the single molecule genomics group at the MRC Laboratory of Molecular Biology, will talk about a PCR-based genome analysis methodology that his group developed—molecular copy-number counting (MCC)—and its most recent modification—microdissection molecular copy-number counting (µMCC).
MCC relies on directly counting target DNA sequences in a series of genomic samples at limiting dilution. In addition to not involving hybridization steps, its effectively unlimited resolution and the requirement for small amounts of genomic DNA are remarkable, Dr. Dear reports. µMCC offers yet additional advantages such as applicability to formaldehyde-fixed and embedded biopsy samples and a wide dynamic range.
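The counting principle behind limiting dilution can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Dr. Dear's protocol: the aliquot count, the molecule numbers, and the function name `count_positive_aliquots` are hypothetical. It scatters target molecules at random across aliquots and counts how many aliquots would score positive for a single-copy and a duplicated sequence.

```python
import random


def count_positive_aliquots(n_molecules, n_aliquots, rng):
    """Drop each molecule into a random aliquot and count the aliquots
    that end up holding at least one molecule (a 'positive' PCR)."""
    occupied = {rng.randrange(n_aliquots) for _ in range(n_molecules)}
    return len(occupied)


rng = random.Random(0)   # fixed seed so the sketch is repeatable
N_ALIQUOTS = 960         # hypothetical: ten 96-well plates

# Hypothetical input: about 0.5 target molecules per aliquot for a
# single-copy sequence, and twice as many for a duplicated sequence.
single_copy = count_positive_aliquots(480, N_ALIQUOTS, rng)
duplicated = count_positive_aliquots(960, N_ALIQUOTS, rng)

print("positive aliquots, single copy:", single_copy)
print("positive aliquots, duplicated: ", duplicated)
```

The duplicated sequence gives more positive aliquots, but fewer than twice as many, because some aliquots receive more than one molecule. Raw counts therefore understate the true ratio unless a statistical correction is applied.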
“We are very interested in CNVs that arise during the early development of cancer, or even before a lesion becomes cancerous,” says Dr. Dear. For a long time, a several megabase-long duplication on the long arm of chromosome 3 has been known to appear early during lung cancer development and is thought to harbor genes driving it.
While many approaches pose challenges when examining CNVs from biopsies, particularly from small or damaged samples, MCC and µMCC can provide actual copy number measurements for as few as 60 cells, or for samples originating from 10- to 20-year-old biopsies, he adds.
“We can look at a small number of cells, even in old and degraded samples such as those in clinical archives,” explains Dr. Dear. The benefits extend not only to early cancers that harbor small numbers of transformed cells, but also to well-developed tumors that are genomically heterogeneous, where small samples from various places within the tumor must be examined to fully understand cancer genomics.
Single-molecule genomics approaches can help indicate whether precancerous lesions will progress to cancer, and they profoundly impact cancer diagnosis. “Our observation is that we can use the analysis of copy number variation to characterize lesions more precisely than by histology, and also predict which lesions will progress,” notes Dr. Dear.
Exploring Copy Number Variations
Several approaches are currently available to explore copy number variations, and understanding their limitations is essential. Real-time PCR, for example, is user-friendly, has a wide dynamic range, requires small sample sizes, and provides extremely good resolution when discriminating 50% or 25% differences in copy numbers. Beyond that point, however, technical considerations and biological errors often affect resolution.
Fluidigm’s integrated fluidic circuit system is a platform to examine gene- and sequence-specific copy number variations. “With our digital array platform, because we are looking at end-point images, there is literally no limit to the difference in copy numbers that we can detect,” reveals Ramesh Ramakrishnan, Ph.D., director of molecular biology.
This nanofluidic platform enables the isolation and amplification of single DNA molecules, a technique known as digital PCR, and accurate DNA quantitation is based on the random partitioning of DNA molecules into an array that can have more than 9,000 chambers or wells.
Individual samples are subsequently PCR amplified, and the initial concentration of a specific sequence is calculated based on the number of positive chambers that contain at least one copy of the desired specific sequence, using an algorithm relying on probability theory and statistics.
The output, in the form of “relative copy number,” represents the ratio between the copy number of the gene of interest and that of a reference gene, whose copy number is taken as 1. The relative copy number is therefore 1 for single-copy genes, higher than 1 for duplications, and lower than 1 for deletions.
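The calculation described above can be sketched in a few lines. The article says only that the algorithm relies on probability theory and statistics; the Poisson correction used here is the standard assumption for digital PCR and is not necessarily Fluidigm's exact method. The function names, the chamber count, and the example numbers are hypothetical.

```python
import math


def copies_from_positive_chambers(positive, total):
    """Estimate how many target molecules were loaded, given the number
    of chambers that scored positive after end-point PCR.

    Assumes molecules are partitioned at random, so the number per
    chamber follows a Poisson distribution and the fraction of empty
    chambers equals exp(-mean copies per chamber).
    """
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total; "
                         "a fully positive panel is saturated")
    mean_per_chamber = -math.log(1 - positive / total)
    return mean_per_chamber * total


def relative_copy_number(target_positive, reference_positive, total):
    """Ratio of estimated target copies to estimated reference copies."""
    reference = copies_from_positive_chambers(reference_positive, total)
    if reference == 0:
        raise ValueError("reference gene was not detected")
    return copies_from_positive_chambers(target_positive, total) / reference


TOTAL_CHAMBERS = 765  # hypothetical panel size, for illustration only

# Hypothetical counts: the target gene lights up more chambers than the
# reference gene, so the ratio comes out above 1 (a gain in copy number).
ratio = relative_copy_number(400, 240, TOTAL_CHAMBERS)
print(f"relative copy number: {ratio:.2f}")
```

The correction matters because a positive chamber may hold more than one molecule, so simply dividing one count of positive chambers by the other would compress the ratio toward 1.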
The nanofluidic platform provides stronger discrimination power than quantitative PCR, explains Dr. Ramakrishnan, as exemplified by its ability to differentiate between six and seven copies of a target gene, which corresponds to a difference in gene copy number of as little as about 15%. Importantly, the assay can be tailored to any gene or sequence.
An increasing number of conditions affecting various organs and systems are impacted by copy number variations. Chack-Yung Yu, D.Phil., professor of pediatrics, molecular virology, immunology and medical genetics at the Ohio State University, is using Southern blot analyses and TaqMan-based real-time PCR assays to examine copy number variations for complement C4, a key immune effector protein of the classical complement activation pathway.
“Microarrays tell us that there is a copy number variation,” says Dr. Yu, “but they don’t give us the actual or detailed description, especially when it is continuous variation. We know that there are many common CNVs, and microarrays give us an idea of where they are, but a lot of work needs to be done to figure out what genes are involved, how they change, and which genes are functional and which are non-functional.”
Different human populations harbor different numbers of C4 gene copies on chromosome 6. A quantitative correlation exists between the gene copy number and plasma C4 protein expression levels, with lower protein levels in individuals harboring fewer gene copies and more abundant protein levels in those with more copies.
Qualitative diversity provides an additional level of complexity: when multiple copies of C4 genes are present, there is a chance that they will acquire mutations or polymorphisms. Despite a 99.9% identity, subtle variations caused by sometimes as few as two to three amino acid changes in various places can provide new functions for the protein.
“The gene products also tend to vary subtly; they will be picked up by the same polyclonal antibodies, but when you look at the sequences of the genes, you will see differences,” says Dr. Yu.
For the C4 protein, two classes have been described, the acidic C4A and the basic C4B. Only four differences at the amino acid level define each class, yet the two proteins differ functionally. The Yu group recently demonstrated that C4A gene copy number variations are associated with susceptibility to systemic lupus erythematosus among European Americans.
The situation gets even more complex. In a phenomenon known as segmental variation, often genes do not vary alone in their copy numbers, but neighboring genes are affected as well. Duplications of the C4 genes are always associated with duplications of the RP (or STK19) gene at the 5´ end, encoding a serine/threonine nuclear protein kinase, and of the CYP21 and TNX genes, at the 3´ end, encoding cytochrome P450 21-hydroxylase and the extracellular matrix protein tenascin, respectively.
Whenever such segmental duplications occur, some of the duplicated genes are always functional, but other genes acquire mutations, turn into pseudogenes, and may be recombined into the progeny during the meiotic crossover.
“It is one way to generate diversity,” explains Dr. Yu, “but it is also a way to get into trouble, because functional genes acquire mutations.” One of the genes in the immediate vicinity of C4, CYP21, encodes an essential enzyme involved in the biosynthesis of cortisol and mineralocorticoids, and its deficiency results in excess male sex hormones and congenital adrenal hyperplasia.
Mapping
Jan Korbel, Ph.D., group leader at EMBL Heidelberg, will talk about recent progress his group has made in using high-resolution, massive paired-end mapping, in which the paired ends of 3 kb fragments are prepared and sequenced with next-generation approaches, to generate high-resolution copy number variation maps.
These maps are highly informative of the potential functional impact of genomic variations, and they can provide insights into the recent history of how copy number variations evolved in various genomes.
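The logic of paired-end mapping can be sketched briefly. The two ends of each roughly 3 kb fragment are placed on a reference genome, and pairs that land much farther apart, much closer together, or in an unexpected orientation point to a structural difference in the sample. The function name, the tolerance, and the labels below are hypothetical, and real analyses rely on clusters of supporting pairs rather than on single pairs.

```python
def classify_pair(mapped_span_bp, expected_span_bp=3000,
                  tolerance_bp=500, orientation_ok=True):
    """Label one paired-end read by how its ends map to the reference.

    expected_span_bp reflects the ~3 kb fragment size mentioned in the
    article; tolerance_bp is an arbitrary illustrative cutoff.
    """
    if not orientation_ok:
        # Ends point in an unexpected direction relative to each other.
        return "possible inversion"
    if mapped_span_bp > expected_span_bp + tolerance_bp:
        # Ends are farther apart on the reference than in the sample,
        # so the sample appears to lack the sequence between them.
        return "possible deletion"
    if mapped_span_bp < expected_span_bp - tolerance_bp:
        # Ends are closer on the reference than in the sample,
        # so the sample appears to carry extra sequence.
        return "possible insertion"
    return "concordant"


for span in (3100, 9000, 1200):
    print(span, classify_pair(span))
```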
Research into genetic variation with effects on olfactory physiology is just one of the research interests in the Korbel lab, partly as a collaboration with Doron Lancet, Ph.D., from the Weizmann Institute of Science. “The numbers and types of olfactory receptor genes seem to be much more different between individuals than people previously thought,” reveals Dr. Korbel.
His work identified a large number of olfactory receptor gene deletions in some individuals, with some parts consistently missing. Another finding, that olfactory receptor genes sometimes fuse to generate a new gene, is particularly interesting from an evolutionary perspective. “It is becoming obvious now that copy number variation is just a normal process that leads to variation between individuals, but on a larger evolutionary timescale, it can lead to novel gene functions,” says Dr. Korbel.
The science of human copy number variations, while still in its infancy, has already demonstrated its profound impact on our ability to understand physiological processes, disease pathogenesis, and evolutionary concepts. When thinking about copy number variations, it is perhaps most relevant to remember microbial organisms, which, through their relative simplicity and amenability to investigation, provided so many of the key scientific concepts.
For many microbial species, variations in tandem repeat copy numbers represent an important aspect of their existence, as they shape virulence and contribute to antigenic variation and to the ability to escape immune surveillance.
In all likelihood, copy number variations hold the answers to some of the most fascinating questions in life sciences, and their importance across species is reminiscent of Aristotle’s words, dating back more than two millennia yet so very relevant today: “In all things of nature there is something of the marvelous.”