The Human Genome Project, completed 50 years after Watson and Crick published their groundbreaking paper on the double helix, revealed single nucleotide polymorphisms (SNPs) to be the main source of inter-individual genetic variability. Thought to occur on average once every 300–1,000 base pairs, SNPs have been linked to disease susceptibility, response to therapeutic agents, and adverse drug reactions.
SNPs, however, are not the only major source of genetic variability: around 2004, the importance of copy number variations (CNVs) began to emerge. CNVs can be classified by several criteria, including size, genomic location, type, and the molecular mechanism that generates them. The discovery of this class of genomic variation made it clear that human beings can no longer be considered 99.9% genetically identical.
“CNVs have now been shown collectively to include more nucleotides than the total of single-nucleotide polymorphisms, so they need to be considered in any genetic design study,” says Stephen W. Scherer, Ph.D., director of The Centre for Applied Genomics at The Hospital for Sick Children in Toronto.
In a study codirected with Matthew Hurles, Ph.D., of The Wellcome Trust Sanger Institute and with other international collaborators, Dr. Scherer’s team generated a comprehensive copy number variation map that revealed 1,447 copy number variable regions covering approximately 12% of the human genome.
Importantly, the same study, which enrolled 270 individuals from four populations, found that approximately 14.5% of the genes in the OMIM Morbid Map, the comprehensive database linking human genes with disease phenotypes, overlap with copy number variations, foreshadowing the fundamental role that CNVs will play in understanding human disease.
At CHI’s “Comprehending Copy Number Variation” meeting, to be held later this month in San Diego, Dr. Scherer will present research that he and his collaborators have conducted on how several forms of copy number and structural variation contribute to neuropsychiatric conditions. A major research effort in the Scherer group focuses on autism spectrum disorders, which have historically been classified as either idiopathic (~90% of cases), when no underlying cause is apparent, or secondary (5–10%), when a single-gene disorder, a chromosomal abnormality, or an environmental factor can be identified as the cause.
High-resolution microarray analysis and karyotyping in the Scherer lab revealed de novo copy number variations in approximately 7% of patients whose condition had previously been classified as idiopathic, with two or more such rearrangements identified in 11% of them.
This approach also uncovered genomic changes not previously known to be associated with autism spectrum disorders, such as a copy number variation at 16p11.2 found in approximately 1% of the autism cohort but not in controls.
CNVs are becoming increasingly valuable in the clinical setting. Shelly R. Gunn, M.D., Ph.D., medical director at CombiMatrix Molecular Diagnostics, examines somatic and germline copy number changes using array-based comparative genomic hybridization (CGH).
Deciding whether a particular CNV causes a specific condition is, however, more challenging than it appears. “CNVs are sprinkled throughout everybody’s genome,” says Dr. Gunn. “Every time I see a genome, I have to decide, am I looking at a CNV that is pathogenic, one that is causing the phenotype, or am I looking at a CNV that was inherited and is benign?”
Many CNVs are not associated with disease, and databases in which investigators deposit variants they believe do not contribute to disease are a valuable resource for making this call. “Most times, if there is a CNV less than 1 megabase, then we find it in the parents, we add it to the databases, and this will become one additional change that is not associated with disease and will be present for future reference,” explains Dr. Gunn.
On the other hand, CNVs of three megabases or larger rarely represent benign inherited changes, and most are associated with disease. Very importantly, however, as Dr. Gunn points out, “it is difficult to predict, based on size, whether a duplication or deletion is benign or not.” A 500 kb microdeletion on chromosome 16p11.2, which is associated with autism despite its relatively small size, is perhaps one of the clearest illustrations of this point.
“Array-based CGH is an excellent test for breast cancer,” explains Dr. Gunn. The importance of examining copy number changes within a tumor has long been recognized, but they could not be examined in detail with traditional cytogenetic approaches. With CGH assays, exploring tumor genomes has now become a reality and, as Dr. Gunn notes, “this is probably the most exciting thing about copy number changes: they tell you so much about the tumor genome and about how the tumor is going to behave, and now we can look at it.”
Dr. Gunn’s presentation at “Comprehending Copy Number Variation” will focus on HER2 copy number changes in breast cancer and illustrate the advantages that array-based CGH offers over other techniques such as immunohistochemistry and fluorescence in situ hybridization.
Paul Dear, Ph.D., leader of the single molecule genomics group at the MRC Laboratory of Molecular Biology, will talk about a PCR-based genome analysis methodology that his group developed—molecular copy-number counting (MCC)—and its most recent modification—microdissection molecular copy-number counting (µMCC).
MCC relies on directly counting target DNA sequences in a series of genomic samples at limiting dilution. The method requires no hybridization steps, offers effectively unlimited resolution, and needs only small amounts of genomic DNA, Dr. Dear reports. µMCC offers further advantages, including applicability to formalin-fixed, paraffin-embedded biopsy samples and a wide dynamic range.
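The counting statistics behind MCC are not spelled out here. As a minimal sketch, assuming that DNA molecules are Poisson-distributed among the aliquots (a standard treatment of limiting-dilution counting, not necessarily the exact analysis used by Dr. Dear’s group), the mean number of copies of a marker per aliquot can be estimated from the fraction of aliquots in which PCR fails to detect it. A target marker’s copy number can then be expressed relative to a reference marker assumed to be present at two copies. The function names and the diploid reference are hypothetical choices made for illustration.

```python
import math


def copies_per_aliquot(positive: int, total: int) -> float:
    """Estimate the mean number of copies of a marker per aliquot.

    Assumes molecules are Poisson-distributed among aliquots, so the
    fraction of PCR-negative aliquots is exp(-lambda).
    """
    # If every aliquot is positive, lambda cannot be estimated (log(0)).
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total")
    negative_fraction = (total - positive) / total
    return -math.log(negative_fraction)


def relative_copy_number(target_pos: int, ref_pos: int, total: int,
                         ref_copies: float = 2.0) -> float:
    """Copy number of a target marker relative to a reference marker.

    Hypothetical helper: the reference is assumed to be present at
    ref_copies per genome (diploid by default).
    """
    lam_target = copies_per_aliquot(target_pos, total)
    lam_ref = copies_per_aliquot(ref_pos, total)
    if lam_ref == 0:
        raise ValueError("reference marker was never detected")
    # Scale the target's per-aliquot count by the reference's known copy number.
    return ref_copies * lam_target / lam_ref


if __name__ == "__main__":
    # 96 aliquots: reference marker positive in 30, target marker in 52.
    print(round(relative_copy_number(52, 30, 96), 2))
```

With these illustrative counts, the target comes out at roughly four copies against a two-copy reference, consistent with a copy number gain. Keeping the aliquots mostly empty (low occupancy) is what makes the Poisson correction small and the counts informative.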
“We are very interested in CNVs that arise during the early development of cancer, or even before a lesion becomes cancerous,” says Dr. Dear. A duplication several megabases long on the long arm of chromosome 3 has long been known to appear early in lung cancer development and is thought to harbor genes that drive it.
Many approaches struggle to examine CNVs in biopsies, particularly small or damaged samples, but MCC and µMCC can provide actual copy number measurements from as few as 60 cells, or from biopsies that are 10 to 20 years old, he adds.
“We can look at a small number of cells, even in old and degraded samples such as those in clinical archives,” explains Dr. Dear. The benefits extend not only to early cancers that harbor small numbers of transformed cells but also to well-developed tumors that are genomically heterogeneous, in which small samples from various places within the tumor must be examined to fully understand the cancer’s genomics.
Single-molecule genomics approaches can indicate whether precancerous lesions will progress to cancer, and they could profoundly change cancer diagnosis. “Our observation is that we can use the analysis of copy number variation to characterize lesions more precisely than by histology, and also predict which lesions will progress,” notes Dr. Dear.