What will be the influence of ENCODE [see "The Encyclopedia of DNA Elements (ENCODE) Consortium", below] on NGS efforts and what does it mean for exome sequencing in cancer research?
I am a fan of whole-genome sequencing in cancer research because it gives the whole picture, including sites now better defined by ENCODE. Although exome sequencing of cancers saves money, I find it very short-sighted given the complexity of cancer genomes, which can really only be examined by sequencing whole genomes.
I find the ENCODE results phenomenally exciting because, very early on in the sequencing of cancer genomes, we came up with a way to annotate the genome based on what was known. We broke our annotation down into 4 tiers: Tier 1 corresponded to all the known genes; Tier 2 corresponded to the so-called known regulatory sequences, those that were highly conserved throughout evolutionary time, which suggests that they are probably important for something; Tier 3 was everything else that was not annotated but did not fall into the repetitive category; and Tier 4 was everything that was annotated as a repetitive element in the genome.
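The four-tier scheme described above can be sketched in a few lines of Python. This is only an illustration, not the institute's actual pipeline: the function name `assign_tier` and its three boolean flags are hypothetical, and the order in which overlapping annotations are resolved (gene first, then conserved regulatory sequence, then repeat) is an assumption that the description above does not spell out.

```python
def assign_tier(in_known_gene: bool,
                in_conserved_regulatory: bool,
                in_repeat: bool) -> int:
    """Return the annotation tier (1-4) for a genomic position.

    The three flags are hypothetical inputs; in practice they would come
    from overlapping the position with gene, conservation, and repeat
    annotation tracks. The precedence order is an assumption.
    """
    if in_known_gene:
        return 1  # Tier 1: known genes
    if in_conserved_regulatory:
        return 2  # Tier 2: highly conserved, putatively regulatory sequence
    if in_repeat:
        return 4  # Tier 4: annotated repetitive elements
    return 3      # Tier 3: everything else (unannotated, non-repetitive)


# Example: an unannotated, non-repetitive position falls into Tier 3.
print(assign_tier(False, False, False))
```

Under these assumptions, ENCODE-style annotation would mainly change which positions count as annotated, and so which positions in Tiers 2 and 3 can be interpreted.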
Now that we have approximately 1,000 whole-genome sequences from cancer patients (tumor and normal, so nearly 2,000 whole-genome sequences in all) sequenced at our institute, we are excited about layering the ENCODE data onto the genome annotation to determine how it enhances our knowledge of what is in Tiers 2 and 3. Having sequenced that many whole genomes, we can already identify recurrent mutations in Tier 2 and 3 regions of the genome. Until now, however, we have had no context in which to interpret those recurrent mutations.
In cancer sequencing, recurrence is an important measure of whether a region might be involved in the development of the disease. Gene involvement can be interpreted rather easily, but for regions that have little annotation in terms of their function, interpretation is almost impossible. ENCODE enriches our understanding of Tiers 2 and 3 and reinforces how important those regions are to the biology of the cell in which they occur, especially if it is a cancer cell. It also gives us the ability to interpret our data across those regions much better than we have been able to in the past, and I think that should continue.
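To make the idea of recurrence concrete, here is a minimal sketch that counts how many distinct patients carry a mutation in each region. It is an illustration only: the function name `recurrent_regions`, the `(patient_id, region_id)` input format, and the `min_patients` threshold are all hypothetical, and real analyses would also need to account for factors such as background mutation rate.

```python
from collections import defaultdict


def recurrent_regions(mutations, min_patients=2):
    """Count distinct patients mutated in each region.

    `mutations` is an iterable of (patient_id, region_id) pairs (a
    hypothetical input format). Regions mutated in fewer than
    `min_patients` distinct patients are dropped.
    """
    patients_by_region = defaultdict(set)
    for patient_id, region_id in mutations:
        # A set ensures several mutations in one patient count only once.
        patients_by_region[region_id].add(patient_id)
    return {
        region: len(patients)
        for region, patients in patients_by_region.items()
        if len(patients) >= min_patients
    }


# Made-up example data: region "r1" is hit in two different patients.
example = [("P1", "r1"), ("P2", "r1"), ("P1", "r1"), ("P3", "r2")]
print(recurrent_regions(example))
```

Counting patients rather than raw mutations keeps a single heavily mutated tumor from making a region look recurrent.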
Hopefully, there will be additional ENCODE-like efforts. These can now be done in individual labs, of course, because of the reduced cost and the genome coverage of NGS methods. It is tremendously exciting and gives us a better understanding of the genome overall, which will be important for medical applications as well.