The first post-genomic decade has also seen an explosion in knowledge and understanding of the molecular markings that make DNA function differently in various tissues and maintain the programming effect of environmental influences long after they have ceased to exist. Study of DNA methylation by specific arrays or MPS after bisulfite treatment has been only the beginning.
With genome-wide methods like ChIP-Seq or DNase protection, we can now minutely define the control of chromatin organization and gene expression by elaborate modification of individual amino acid residues in histones. This cell-type-specific epigenetic landscape, superimposed on transcriptional profiling from the same cells, is beginning to reveal details of the very basic processes that create tissues and determine their function.
The importance of the above notwithstanding, the breakthroughs of most relevance to human disease in the past decade came from combining complete knowledge of the genome with powerful new technologies targeting human variation.
In 2003 it was already known that, at the single-nucleotide level, human genetic diversity corresponded to a difference of approximately one nucleotide per kb between any two unrelated individuals. It was, then, taken for granted that the vast phenotypic diversity within humans, including but not limited to disease susceptibility, could be explained by this 0.1% of the genome.
Against this background, researchers were in for a great surprise that came with the development of array-based comparative genomic hybridization (CGH). Starting with BAC arrays, which rapidly gave way to the much higher-resolution oligo-based ones, extensive work has made it obvious that the genome of healthy individuals contains megabase-sized deletions and/or duplications, sometimes encompassing entire genes or even families of genes.
Thought, until then, to be rare events causing esoteric syndromes with unpronounceable names, large copy number changes were found to be common enough in the general population to constitute a substantial contributor to genotypic and phenotypic diversity.
Another surprise came as a result of the ability to deeply probe human variation at the single-nucleotide level. Two key advances, one conceptual and one technical, led to it.
First, the genome-wide mapping of human linkage disequilibrium (LD) by the HapMap project precisely defined how adjacent variants in each chromosomal region can give information about each other, allowing inference about the effects of genomic variation by experimentally genotyping only a small subset of the variants involved.
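How much one variant "gives information" about its neighbor is commonly summarized by the LD coefficient D and the squared correlation r². The sketch below is only an illustration, assuming phased haplotypes at two biallelic sites with alleles coded 0 and 1; the function name `ld_r2` and the toy data are hypothetical and are not taken from HapMap.

```python
def ld_r2(haplotypes):
    """Return (D, r^2) for two biallelic sites.

    haplotypes: list of (a, b) tuples, one per phased chromosome,
    with alleles coded 0/1 (an assumption of this sketch).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")

    # Frequencies of allele 1 at each site, and of the 1-1 haplotype.
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    # D measures the excess of the 1-1 haplotype over independence.
    d = p_ab - p_a * p_b

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a site is monomorphic")
    return d, d * d / denom


# Toy data: the two sites always travel together, so typing one
# predicts the other completely.
linked = [(1, 1)] * 5 + [(0, 0)] * 5
# Toy data: all four haplotypes equally common, so the sites are independent.
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]

print(ld_r2(linked))
print(ld_r2(unlinked))
```

When r² between two variants is close to 1, genotyping either one is nearly as informative as genotyping both, which is the reasoning behind typing only a subset of variants.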
But even this reduced subset entailed a number of SNPs still intractable by conventional methods. High-density, genome-wide SNP arrays solved this problem by enabling the genotyping of hundreds of thousands to well over a million polymorphisms in many thousands of individuals.
Healthy control subjects were compared to patients with an increasingly large variety of diseases that show detectable familial clustering but whose inheritance pattern suggests dependence on many different genetic loci. Most morbidity and mortality in contemporary humans can be ascribed to such diseases.
At the outset of such studies, it was anticipated that association with a dozen or so loci conferring a relative risk (RR) of substantial magnitude (e.g., two- or threefold) would explain most (e.g., >80%) of the heritability of each disease. It was a disappointment to see that the RR conferred by the vast majority of the variants was rather modest, in the 1.1 to 1.3 range.
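To make the scale of such effects concrete, the following sketch runs a basic allelic case-control test on a 2x2 table of allele counts. It is a simplified illustration, not the analysis pipeline of any particular study: the function name `allelic_association` and all counts are invented, and the odds ratio is used as the usual approximation to RR, which holds when the disease is uncommon.

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Return (odds_ratio, chi2, p_value) for a 2x2 allele-count table.

    Uses the Pearson chi-square statistic with 1 degree of freedom and
    no continuity correction (a simplifying assumption of this sketch).
    """
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    if min(a, b, c, d) <= 0:
        raise ValueError("all four counts must be positive")

    odds_ratio = (a * d) / (b * c)

    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# Invented counts: 1000 cases and 1000 controls (2000 alleles each),
# risk-allele frequency 0.30 in cases versus 0.25 in controls.
odds_ratio, chi2, p_value = allelic_association(600, 1400, 500, 1500)
print(f"OR = {odds_ratio:.2f}, chi2 = {chi2:.2f}, p = {p_value:.1e}")
```

With these made-up counts the odds ratio is about 1.29, at the upper end of the range quoted above. Because a genome-wide scan tests a very large number of variants, the significance threshold applied to each one is far stricter than the conventional 0.05, so an effect of this size calls for much larger samples than the 1000 cases and 1000 controls used here.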
Larger and larger sample sizes were deployed in these studies, to discover effects of smaller and smaller magnitude that still explain, in most cases, considerably less than half of the disease heritability. Sophisticated analysis suggests that thousands of increasingly weaker effects account for the heritability of most polygenic traits.
This does not mean, however, that knowledge of these loci is not useful. An effect can be weak because the gene is not important in the disease process, but it can also be weak because the causal allele has only a slight functional effect, or a very low population frequency, in a gene that is crucial to the process and an excellent drug target.