December 1, 2013 (Vol. 33, No. 21)
Constantin Polychronakos, M.D., Professor of Pediatrics and Human Genetics, McGill University; Editor, Journal of Medical Genetics
The post-genomic era has already completed its first 10 years, a decade that saw drastic advances in our understanding of the genetic factors that determine living organisms. The emphasis, of course, has been on the determinants of how the human body is constructed and functions, how individuals differ from each other, and how such differences determine health.
This is a good time to sit back and reflect on what has been achieved, where the limits of what can be achieved may lie, and what challenges the next 10 years hold.
In terms of information content, the more than three billion nucleotides of the human genome sequence are the equivalent of 400 thick encyclopaedia volumes, written in a foreign language that we are just beginning to decipher and learn. The genetic code has been known for decades, but the protein-coding sequences, to which its relevance is confined, represent barely more than 1% of the total.
To create a full human body out of a single cell (and to keep it functioning for many decades) these protein-coding units need to be regulated with exquisite tissue-specificity, timing precision, and physiological responsiveness. Much of the genomic effort in the past decade has zeroed in on understanding the relevance of the noncoding 99% of the genome in these processes.
Crucial functional aspects of so-called “junk DNA” have been the focus of genome-wide computational and functional work that, in the past decade, has gone far beyond the “promoter-bashing” and transcription-factor sequence definition of previous decades. Evolutionary conservation, identification of sequence patterns linked to specific functions, and other computational approaches prioritize candidates for functional exploration.
Massively parallel sequencing (MPS) is now being used to examine immunoprecipitates for interaction of DNA with specific protein regulators at a genome-wide scale (ChIP-seq), leaving locus-specific or array-based (ChIP-chip) approaches behind. Conformation-capture technology has begun to reveal the intricate ways in which DNA folds upon itself and its protein interactors.
RNA-Seq has not only replaced arrays for the transcriptome-wide assessment of gene regulation but, perhaps more importantly, has surprised us by revealing a wealth of transcriptional activity from much of the noncoding genome.
The first post-genomic decade has also seen an explosion in knowledge and understanding of the molecular markings that make DNA function differently in various tissues and maintain the programming effect of environmental influences long after they have ceased to exist. Study of DNA methylation by specific arrays or MPS after bisulfite treatment has been only the beginning.
With genome-wide methods like ChIP-seq or DNase protection, we can now minutely define the control of chromatin organization and gene expression by elaborate modification of individual amino acid residues in histones. This cell-type-specific epigenetic landscape, superimposed on transcriptional profiling from the same cells, is beginning to reveal details of the very basic processes that create tissues and determine their function.
The importance of the above notwithstanding, the breakthroughs of most relevance to human disease in the past decade came from combining complete knowledge of the genome with powerful new technologies targeting human variation.
In 2003 it was already known that, at the single-nucleotide level, human genetic diversity corresponded to a difference of approximately one nucleotide per kb between any two unrelated individuals. It was, then, taken for granted that the vast phenotypic diversity within humans, including but not limited to disease susceptibility, could be explained by this 0.1% of the genome.
Against this background, researchers were in for a great surprise that came with the development of array-based comparative genomic hybridization (CGH). Starting with BAC arrays, which rapidly gave way to the much higher-resolution oligo-based ones, extensive work has made it obvious that the genome of healthy individuals contains megabase-sized deletions and/or duplications, sometimes encompassing entire genes or even families of genes.
Thought, until then, to be rare events causing esoteric syndromes with unpronounceable names, large copy number changes were found to be common enough in the general population to constitute a substantial contributor to genotypic and phenotypic diversity.
Another surprise came as a result of the ability to deeply probe human variation at the single-nucleotide level. Two key advances, one conceptual and one technical, led to it.
First, the genome-wide mapping of human linkage disequilibrium (LD) by the HapMap project precisely defined how adjacent variants in each chromosomal region can give information about each other, allowing inference about the effects of genomic variation by experimentally genotyping only a small subset of the variants involved.
But even this reduced subset entailed a number of SNPs still intractable by conventional methods. High-density, genome-wide SNP arrays solved this problem by enabling the genotyping of hundreds of thousands to well over a million polymorphisms in many thousands of individuals.
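The extent to which one variant gives information about a neighbor is commonly summarized by the squared correlation (r²) between the alleles at two sites. The following is a minimal Python sketch of that calculation, not the HapMap project's own procedure. The function name `ld_r_squared` and the toy haplotypes are hypothetical, and the sketch assumes phased haplotypes with the two alleles at each SNP coded as 0 and 1.

```python
def ld_r_squared(haplotypes):
    """Return r^2 between two biallelic SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) pairs, each allele 0 or 1.
    (Hypothetical input format, assumed for this illustration.)
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(a for a, _ in haplotypes) / n                       # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n                       # frequency of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                          # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denom


# Invented toy data: the two SNPs always travel together on the same haplotype
linked = [(0, 0)] * 6 + [(1, 1)] * 4
# Invented toy data: all four allele combinations are equally frequent
unlinked = [(0, 0), (0, 1), (1, 0), (1, 1)]

print(f"linked:   r^2 = {ld_r_squared(linked):.2f}")
print(f"unlinked: r^2 = {ld_r_squared(unlinked):.2f}")
```

When r² is close to 1, genotyping one of the two SNPs says nearly everything about the other, which is why a well-chosen subset of variants can stand in for many ungenotyped neighbors.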
Healthy control subjects were compared to patients with an increasingly large variety of diseases with detectable familial clustering but whose inheritance pattern suggested dependence on many different genetic loci. Most morbidity and mortality in contemporary humans can be ascribed to such diseases.
At the onset of such studies, it was anticipated that association with a dozen or so loci conferring a relative risk (RR) of substantial magnitude (e.g., two- or threefold) would explain most (e.g., >80%) of the heritability of each disease. It was a disappointment to see that the RR conferred by the vast majority of the variants was rather modest, in the 1.1 to 1.3 range.
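For readers less familiar with the measure, relative risk is the disease risk in carriers of a variant divided by the risk in non-carriers. A minimal Python sketch follows. The function name `relative_risk` and all counts are hypothetical, chosen only to contrast the effect sizes that were anticipated with the modest ones actually observed.

```python
def relative_risk(affected_carriers, total_carriers,
                  affected_noncarriers, total_noncarriers):
    """Disease risk in carriers divided by disease risk in non-carriers."""
    if total_carriers <= 0 or total_noncarriers <= 0:
        raise ValueError("group sizes must be positive")
    if affected_noncarriers <= 0:
        raise ValueError("risk in non-carriers must be above zero")
    risk_carriers = affected_carriers / total_carriers
    risk_noncarriers = affected_noncarriers / total_noncarriers
    return risk_carriers / risk_noncarriers


# Hypothetical counts, invented for illustration only.
anticipated = relative_risk(150, 1000, 50, 1000)  # 15% vs 5% risk: threefold
observed = relative_risk(60, 1000, 50, 1000)      # 6% vs 5% risk: about 1.2

print(f"anticipated-style effect: RR = {anticipated:.2f}")
print(f"observed-style effect:    RR = {observed:.2f}")
```

In the second, invented scenario the variant raises risk only from 5% to 6%, which gives a sense of why very large samples are needed to detect effects in the 1.1 to 1.3 range.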
Larger and larger sample sizes were deployed in these studies, to discover effects of smaller and smaller magnitude that still explain, in most cases, considerably less than half of the disease heritability. Sophisticated analysis suggests that thousands of increasingly weaker effects account for the heritability of most polygenic traits.
This, however, does not mean that knowledge of these loci is not useful. An effect can be weak because the gene is not important in the disease process, but it can also be weak because the causal allele has only a small functional effect or a very low population frequency, even though it lies in a gene that is crucial to the process and an excellent drug target.
Exomes and Mendelian Diseases
Yet another unexpected finding was revealed by applying the power of MPS to a large number of individual DNA samples (either genome-wide or after whole-exome capture): healthy individuals carry many mutations that are clearly gene-destroying.
In retrospect, this should not have come as a big surprise: the number of known monogenic diseases inherited recessively is currently in the thousands and more are being discovered with every monthly issue of the genetics journals, making it inevitable that every one of us is an asymptomatic carrier for a number of them.
The gene mutated in many of these diseases had already been known prior to the availability of the genome sequence, through “positional cloning.” This process, however, requires a certain number of related individuals with the disease phenotype, something that was not practical for the very rare diseases whose molecular basis is now being discovered by the use of exome sequencing. Exome sequencing is also rapidly becoming the obvious diagnostic test for a large number of patients whose diseases have a highly heterogeneous molecular etiology or who present with an atypical clinical picture.
What has been the impact of all these findings on improving human health? Not much, the scientists involved freely admit, and this is not a condemnation of the findings' importance. The lag time between a fundamental discovery and its practical application can be measured in decades, and every big medical breakthrough we hear about in the media is invariably based on 20 or 30 years’ worth of more fundamental previous work.
For-profit investment in discovery genomics, quite substantial at the beginning of the decade, has largely dried up following these realizations. For the time being, consumer genomics, an industry addressing human vanity a lot more than it does health promotion, is the one obvious money-making application of genomic discoveries.
After these achievements, surprises, and disappointments, what can we expect for the next 10 years? Fine-mapping and functional evaluation of the known complex-trait loci are definitely on the agenda and are currently being pursued with intensity.
Superimposing the genetic findings on the epigenomics and transcriptomics of the tissues and cell-types most important in each disease promises advanced insights. How these will be translated into diagnostics and therapeutics remains a big question mark.
Constantin Polychronakos, M.D., is a professor in the departments of pediatrics and human genetics at the McGill University Health Center in Montreal. He is also editor of the Journal of Medical Genetics.