The technological landscape has been dominated during the past 50 years or so by electronics in various forms (digital computing, wired and wireless communication, miniaturization of components, and the like), all at costs that decrease according to Moore’s law: a reduction by 50% every 18 months or so.
But now it is widely believed that the next 50 years will belong to biology. The publication of the draft human genome in 2001 captured the imagination of the public, and there was great anticipation that post-genomic biology would be radically different from what went before and would lead to rapid advances in diagnostics and therapy.
In reality, much of that promise remains just that: a promise. However, the dramatic reductions in cost that were earlier witnessed in the world of electronics are now manifesting themselves in the world of biology. Whereas the human genome project took about 10 years and cost around $3.5 billion to generate a rough draft that was only about 98% accurate, it is now possible to sequence individual human genomes at far higher accuracy for less than $5,000 each.
If one does not insist on sequencing an entire genome, but focuses on detecting mutations at specific locations in the DNA, the cost is even lower. This has encouraged the re-sequencing of a great many diseased tissues, especially in cancer.
For example, TCGA (The Cancer Genome Atlas) is an ambitious project to achieve comprehensive molecular characterization of every single cancerous tissue that is currently preserved, including exome sequence, DNA copy number, promoter methylation, and expression analysis of messenger RNA and microRNA. The project has already resulted in a massive amount of information that the research community can use to fine-tune its diagnostic and therapeutic tools.
All of these advances have resulted in a subtle shift in the balance between data generation and data analysis. In earlier years biology was primarily viewed as an experimental science. Today, however, biology is just as much a computational science as an experimental one: data must be turned into information, and information into actionable knowledge and experimentally testable hypotheses.
Data analysis is a natural activity for the engineering community and affords an opportunity for engineers to work hand in hand with biologists to develop new insights into disease mechanisms, to identify biomarkers that can predict which patients will respond to which therapy, and to provide mechanistic (cause and effect) explanations as to why these are biomarkers.
Going forward, the landscape of cancer patients will resemble a mosaic consisting of groups that are highly coherent internally but substantially different from one another, and treatments will be customized to each coherent group. It would be appropriate to refer to this approach as targeted medicine, though it is often mislabeled as personalized medicine.
The analysis of massive datasets poses as much of a challenge to engineers as to biologists, because many of the currently popular methods in engineering will simply not work when applied to biological datasets. One important difference is that many engineering datasets are characterized by a very large number of samples and a far smaller number of features.