October 1, 2011 (Vol. 31, No. 17)
Leroy Hood, M.D., Ph.D.
A Personal View of Systems Biology and the Coming of “Big” Science
This is a truly remarkable time in the biological sciences. Biology now has the opportunity to effectively attack some of the most fundamental problems of society, including healthcare, agriculture, bio-energy, a sustainable environment, and nutrition.
These opportunities are a result of systems strategies for probing biological complexity, emerging technologies that allow us to explore new dimensions of data space, and novel analytical tools for analyzing, integrating, and modeling large datasets.
Over the past 40 years, I have done a lot of thinking about biological complexity, and these thoughts have led me, directly or indirectly, to participate in a series of paradigm changes that have transformed how we think about and decipher the complexities of biology and medicine.
There is an active, fascinating, and rapidly growing field that focuses on complex adaptive systems in living organisms, including their emergent properties, robustness, criticality, stochasticity, and determinism.
My thinking about complexity initially centered not on these conceptualizations but rather on the appreciation that biological complexity emerges from the principles of Darwinian evolution, and that enormous amounts of bottom-up data are needed to decipher this complexity.
In this article I will discuss biological complexity and the five paradigm changes in which I have had the good fortune to participate. These changes have led to new strategies and technologies for attacking the complex biological and medical challenges of the 21st century. Successful implementation of these new strategies and technologies will require a balanced national portfolio of big and small science.
Early Thoughts about Complexity
Biological complexity is inherent in Darwinian evolution. The ongoing selection of infrequent genetic mutations continually reshapes the phenotypes of all organisms to respond to new environmental imperatives. These mutations are random and not directed.
Darwinian evolution also proceeds in a modular fashion, e.g., employing highly conserved basic subnetwork patterns to build complex developmental networks. Further, while evolution builds on past successful phenotypes, the criteria for selection are constantly changing with the evolving environment (and genomes for that matter).
For these reasons, the evolutionary path is often extremely convoluted. Stick insects, for example, have evolved and lost wings three different times in the course of evolution, and this capricious history is, in part, recorded in their genomes. Accordingly, in biology a search for the simplest solution is rarely the most productive approach, because it does not help us understand life’s complexity.
This simple observation has enormous implications for how biology is practiced. In fact, it forces biology to focus on the development of new tools for generating and analyzing huge masses of biological data.
The only way to deconvolute this biological complexity is to perturb the systems of the organism (genetically and/or environmentally), record detailed molecular, cellular, and phenotypic responses, and then separate the signal from significant noise to construct explanations or predictive models. It is this data that reflects the reality of the organism’s complexity and from which that complexity can be deciphered.
In short, complexity requires biology to become an information science, because information science gives us some of the most fundamental concepts for thinking about biological complexity.
Immune System Research
I encountered the limits of a reductive approach based on Occam’s razor as a graduate student at Caltech in the early 1960s, studying the complexities of the mouse and human immune systems.1-3
I was initially interested in how B cells generate the diversity of antibodies required to defend a vertebrate organism against viruses, bacteria, and, perhaps, even cancer. I extended these studies to include T-cell receptors and molecules of the major histocompatibility locus over the first 30 or so years of my career.4-6
Increasingly, I came to appreciate the incredible complexity of the immune system. Indeed, we came to understand many of the details of the molecular basis of antibody and T-cell receptor diversity.
However, the deeper mysteries of the immune response itself, both adaptive and innate, immunological tolerance, and autoimmunity did not yield their mechanisms to simple molecular, cellular, and biochemical approaches. I gradually came to realize that one needed to take a more holistic or systems approach to studying these complexities.
Max Delbrück, also at Caltech in the 1970s, argued that these immune mechanisms could only be revealed by a more global systems approach, similar to his own attempts to understand the complexities of the fungus Phycomyces. I came to realize that new tools and novel strategies were necessary to deal in a more comprehensive and quantitative manner with the complexities of biology in general and immunity specifically.
So how could one go about creating the holistic system strategies and measurement (or visualization) tools for generating global or comprehensive datasets? These thoughts led to my participating in a series of paradigm changes that paved the way for dealing with biological complexity.
Paradigm Changes in Biology
Thomas Kuhn’s The Structure of Scientific Revolutions7 described how paradigm changes arise from observations that do not fit preconceived dogma and how, as these observations become accepted, they catalyze new explanations or paradigm changes in science.
Kuhn also made the point that most scientists are extremely conservative and reluctant to accept reformulations of the current dogmas that lead to paradigm changes (a failure of our educational system is that we do not teach most scientists how to think outside the box). Creating paradigm changes is exciting and often appreciated only in retrospect.
But how does one go about catalyzing change?
Bill Dreyer, my Ph.D. mentor at Caltech, left me with two dictums that provide insights into this question:8
- Always work at the leading edge of biology. It is far more interesting and exciting, and it provides the opportunity to discover something truly new.
- If you really want to change a discipline, invent a new tool for generating more data and/or new data types in relevant areas of data space.
Early in my career, I realized that biology is an informational science, and new technologies should aid in deciphering biological information. This informational view is central to deciphering biological complexity.
When I arrived at Caltech as an assistant professor in 1970, I divided my lab into two areas: molecular immunology (leading-edge biology) and the development of technologies to more effectively decipher biological information. Combining technology development and leading-edge biology enabled me to participate in a significant manner in five paradigm changes that have created a powerful framework and infrastructure for dealing with biological complexity:9-10
- bringing engineering to biology to create automated and high-throughput technologies for gathering biological information;
- participating in the initiation and implementation of the Human Genome Project that democratized all genes (made them readily accessible to all biologists), created the parts list that enabled systems biology, and transformed many areas of biology;
- creating the first cross-disciplinary biology department to demonstrate the power of using biology to drive the development of relevant technologies and analytic tools in the context of an infrastructure with cross-disciplinary scientists speaking one another’s languages and working together in teams;
- creating the first systems biology institute that brought transformational systems approaches to complex problems of biology and medicine; and
- pioneering the emergence of proactive systems or P4 (predictive, preventive, personalized, and participatory) medicine that will decipher the complexities of disease by gathering and analyzing enormous amounts of data for each individual patient.
The first three of these changes provided the infrastructural framework necessary for the emergence of systems biology and systems medicine. Collectively, these initiatives established powerful new approaches to dealing with the complexities of both biology and disease.
Bringing engineering to biology,11 I developed five instruments from the 1970s through the early 1990s: the automated DNA and protein sequencers, the automated DNA and peptide synthesizers, and the ink-jet array technology for synthesizing large numbers of oligonucleotides (in arrays).
I founded Applied Biosystems to commercialize the first four instruments developed at Caltech; Agilent commercialized the ink-jet technology. These instruments allowed the automation and integration of relevant chemistries so that information could be generated more rapidly and reproducibly.
For example, the ink-jet printer led to the ideas of automation, integration of chemistries, parallelization of measurements, miniaturization, and high-throughput synthesis of DNA. These features are fundamental tenets of many of today’s emerging technologies (mediated, in part, through microfluidics and nanotechnology). These instruments provided a powerful infrastructure for the rapidly emerging disciplines of molecular biology, genomics, and proteomics.
My laboratory also developed a series of powerful strategies for attacking diverse biological problems, e.g., the oligonucleotide ligase strategy for genetic mapping, the BAC-end sequencing strategy for genome-sequence assembly, and the STS strategy for physical mapping.
Human Genome Project
I was invited to the first meeting of the Human Genome Project in the spring of 1985.12-16 I was fascinated with the genome project because it was the only avenue toward producing the first complete parts list of human genes (and, by inference, proteins). It represented a necessary component of systems approaches that I was pursuing.
I was an early advocate for the genome project when most biologists and NIH were opposed (1985–1990). I also directed one of the 16 U.S. genome sequencing centers—we did portions of human chromosomes 14 and 15—and co-founded one of the first genomics companies (Darwin Molecular).
The genome project, as most realize, transformed many aspects of biology and medicine (Table 2). In developing the automated DNA sequencer, I needed to bring together four different disciplines: molecular biology, chemistry, engineering, and computer science. In doing so I came to realize the power of cross-disciplinary biology.
Initial efforts at developing an automated DNA sequencer (1978–1981) failed because the single biologist working on the problem had little knowledge of engineering or chemistry. After three years with very little success, I assembled a team composed of a chemist (Lloyd Smith), an engineer/chemist (Mike Hunkapiller), a biologist turned computer scientist (Tim Hunkapiller), and myself.
Within a few weeks we had conceptualized the four-color chemistry approach to DNA sequencing and three years later had a prototype automated DNA sequencer.
In thinking about this experience and that of my own lab, which pioneered a variety of technologies, I came to the conclusion that there should be a new type of cross-disciplinary biology department where biologists, chemists, computer scientists, engineers, mathematicians, and physicists are assembled to attack hard biological problems through developing the technologies and analytical tools necessary to solve them.
The imperative is that the needs of frontier biology should dictate which technologies are developed and these, in turn, would specify the nature of the analytical tools required (biology drives technology drives analytical tool development).
With the help of Bill Gates, I moved in 1992 from Caltech to the University of Washington to establish the first cross-disciplinary biology department—molecular biotechnology. We recruited cross-disciplinary scientists—and over the next eight years achieved a number of successes.
Ruedi Aebersold and John Yates developed some of the first fundamental techniques in the emerging field of proteomics. Ger van den Engh pioneered a multiparameter, high-speed cell sorter. My group developed the ink-jet DNA synthesizer for DNA arrays, and Phil Green developed the key assembly and quality-assessment software for the Human Genome Project. We also had two of the 16 human genome sequencing centers.
This success was a remarkable testament to the power of cross-disciplinary biology. I had planned to use the department of molecular biotechnology as a cross-disciplinary foundation for building a systems biology institute. Unfortunately, the bureaucracy of a state university hindered the development of some of the fundamental new requirements for creating a systems biology institute. I resigned from the university in 2000 to co-found the independent Institute for Systems Biology.
Creating the Institute
Along with Alan Aderem and Ruedi Aebersold, I created the Institute for Systems Biology (ISB) in Seattle in 2000. This institute brought together a group of like-minded scientists intent on inventing the field of systems biology, which takes a holistic rather than an atomistic view of analyzing biological systems.17
One takes information about the biological system of interest and from that data formulates a model that may be descriptive, graphical, or mathematical, depending on the amount of available information. Hypotheses are then formulated to test this model.
The hypotheses are tested experimentally by either genetic and/or environmental perturbations of the system. New data is gathered and reintegrated back into the model with appropriate model changes. This iterative process is repeated until theory and experimental data are brought into conjunction with one another (Figure). The models should be predictive.
This process is both hypothesis driven and hypothesis generating. Systems approaches require that data, where possible, be global or comprehensive so that all informational changes can be recorded; that measurements capture the dynamics of the system, both temporal and spatial; and that data be integrated across the multiscale and hierarchical range of biological information (DNA, RNA, protein, interactions, networks, cells, organs, individuals, populations, and ecologies).
This integration requirement is necessary to account both for the digital information derived from the genome and the environmental signals emerging from outside the genome that operate on every level of the information hierarchy.
Systems approaches require a cross-disciplinary environment that relies on comprehensive and diverse technologies and a significant computational infrastructure. Systems approaches are one of the central tools for dealing with biological complexity.18
Data-driven, information-based, proactive P4 medicine is a systems approach to disease that focuses on understanding the dynamics of disease-perturbed networks in the disease-relevant organ(s).19-24 These insights provide powerful new approaches for understanding disease mechanisms, pioneering new diagnostic techniques, and rethinking how drug targets should be chosen.
The vision is that in 10 years each patient will be surrounded by a virtual cloud of billions of data points and that we will have the information technology to reduce this enormous data dimensionality to simple hypotheses about health and disease. The ultimate outcome is to create for the individual patient disease models that are predictive and actionable. P4 medicine is a major scientific thrust at ISB.
The challenge for all scientific and engineering disciplines in the 21st century is complexity. Biology has uniquely powerful systems approaches, emerging technologies, and new analytical tools for deciphering this complexity. Each scientific discipline has its own types of complexity although there are certainly unifying principles that interconnect them.
There are four fundamental pillars for dealing with biology’s complexity:
- Biology is an informational science. This view is essential to deconvolving biological complexity.
- A systems approach to dealing with complex biological systems needs to be holistic rather than atomistic.
- Evolving and emerging technologies must let new dimensions of an organism’s (and patient’s) data space be explored—in a holistic, high-throughput, integrative, and ultimately, quantitative manner.
- New analytic tools, both mathematical and computational, can capture, validate, store, mine, integrate, and, ultimately, model various types of biological data.
With these four pillars, biology is in a position to attack some of society’s most fundamental (or big) problems—healthcare, global health, agriculture, nutrition, environment, bioenergy, animal health, etc. We have had a glimpse at how these principles might be applied to healthcare and what we have learned about how to effectively attack the medical challenges can readily be extended to the other biological areas.
Systems approaches are also being used to attack fundamental problems in many other areas, including economics, sociology, and even politics. Certain types of big science will be essential for solving these societal problems.
Big and Small Science
Since the inception of the Human Genome Project, there has been a conflict in the minds of biologists between big and small science. In the mid-1980s, it was clear that most biologists and the NIH were opposed to the big science of the Human Genome Project. They viewed big science as wasteful and inefficient, and as centered on a trivial problem (most of the genome was “junk”). They considered it an endeavor that could be done by technicians and that real biologists would not find interesting.
The fear also was that big science would take away financial support from small science—the single investigator-initiated science focused on very discrete hypotheses and biological problems.
Instead, the Human Genome Project, the discovery form of big science, has transformed the field of biology and medicine and brought with it new resources.
There is, however, a natural synergy between big and small science. Big science can lay out the general context of the challenges, whereas small labs are wonderful at attacking and deciphering the more subtle details.
The cry against big science continues today and is exacerbated each time research funding becomes compromised. Yet small science can easily get lost in the maze of Darwinian complexity, and big science can provide powerful roadmaps for the most fruitful directions forward.
A critical point is that these big problems can most effectively be attacked by several forms of big science including the type we have developed at ISB.
Types of Big Science
There are many types of big science and they each can be useful for different types of problems.
First, the focused, systems-driven, cross-disciplinary, milestone-driven, and integrative systems biology of ISB has been quite effective in attacking both complex biological and medical problems such as P4 medicine.
Second, the Human Genome Project represents a wonderful example of discovery-based big science where the objective is to define all of the elements in a biological object (the human genome) without consideration of hypothesis-driven questions.
Third, another type of big science is a loose federation of scientists that cluster around a central problem such as breast cancer. These groups often do not have sharply focused objectives, are not cross-disciplinary, and usually are not integrative.
A fourth approach is that of a single large laboratory that is directed by one individual and often focused on one or more big problems. These efforts can be very productive but are often not systems-driven, cross-disciplinary, or integrative.
Finally, the DOE National Laboratories represent another type of big science that has access to incredible technologies and computational and mathematical resources. Clearly different types of big science can be used to approach different types of big problems.
The type of big science that is most effectively poised to attack many large problems of society is the focused, milestone-driven, cross-disciplinary, systems-driven, and integrative biology approach. The key to this type of big science is that it is embedded in a cross-disciplinary environment that allows scientists to learn the languages of the other disciplines and permits them to operate in teams to attack the technical and analytical challenges of big problems.
Moreover, the cross-disciplinary infrastructure is readily accessible to individual scientists for doing small science when necessary and appropriate. Fundamental to this type of big science is the creation of milestones to drive the process. Organized leadership is also important.
There are serious societal problems that could benefit from big science. Realizing P4 medicine is one such problem: it requires delineating dynamically changing disease-perturbed networks to gain insights into fundamental disease mechanisms, as well as new approaches to diagnosis, therapy, and, ultimately, prevention.
P4 medicine also requires the development of many new clinical assays for exploring new dimensions of data space. It requires new approaches to data analysis as well: the capacity to handle billions of data points for each individual patient will require novel computational and mathematical approaches.
For NIH, a balanced portfolio between big and small science is essential (and I would say the same for the other federal research funding agencies). At least 10–20% of the NIH research dollars should go into big science initially. The capacity to attack big science problems will be fundamental to our future national and international competitiveness.
In addition, for academia, there is an opportunity to make investments in the cross-disciplinary infrastructure that will make it possible for scientists to attack the big problems of society as well as the small problems of individual investigator-initiated efforts. The U.S. must seriously consider what is key for staying competitive at a national level and in the world forum of science.
1 Hood L, Gray WR, Sanders BG, Dreyer WJ. (1967) Light Chain Evolution. Cold Spring Harbor Symposia on Quantitative Biology 32:133-146.
2 Dreyer WJ, Gray WR, Hood LE. (1967) The Genetic, Molecular, and Cellular Basis of Antibody Formation: Some Facts and a Unifying Hypothesis. Cold Spring Harbor Symposia on Quantitative Biology 32:353-367.
3 Steinmetz M, Frelinger JG, Fisher D, Hunkapiller T, Pereira D, Weissman SM, Uehara H, Nathenson S, Hood L. (1981). Cell 24:125-134.
4 Steinmetz M, Winoto A, Minard K, Hood L. (1982). Cell 28:489-498.
5 Clark SP, Yoshikai Y, Taylor S, Siu G, Hood L, Mak TW. (1984). Nature 311:387-389.
6 Kuhn T. (1962) The Philosophy of Science, the Nature and Necessity of Scientific Revolutions, 148-158. MIT Press.
7 Hood L. (2002) My Life and Adventures Integrating Biology and Technology. A Commemorative Lecture for the 2002 Kyoto Prize in Advanced Technologies. 2002 Kyoto Prizes and Inamori Grants, 111-165.
8 Hood L. (2008) A personal journey of discovery: developing technology and changing biology. Annu Rev Anal Chem (Palo Alto Calif) 1:1-43.
9 Hood L. (2011) Acceptance Remarks for Fritz J. and Delores H. Russ Prize. NAE Journal The Bridge, Summer 2011, 41(2):46-49.
10 Hood L. (2002). Journal of Proteome Research 1:399-409.
11 Rowen L, Koop BF, Hood L. (1996). Science 272:1755-1762.
12 Lander ES, et al. (2001). Nature 409:860-921.
13 Lander ES, et al. (2004). Nature 431:931-945.
14 Zody MC, et al. (2006). Nature 440:671-675.
15 Heilig R, et al. (2003). Nature 421:601-607.
16 Smith LM, Sanders JZ, Kaiser RJ, Hughes P, Dodd C, Connell CR, Heiner C, Kent SBH, Hood LE. (1986). Nature 321:674-679.
17 Hood L, Rowen L, Galas DJ, Aitchison JD. (2008). Briefings in Functional Genomics and Proteomics 7(4):239-248.
18 Weston AD, Hood L. (2004). Journal of Proteome Research 3:179-196.
19 Hood L, Heath JR, Phelps ME, Lin B. (2004). Science 306:640-643.
20 Hood L. (2008) A Systems Approach to Medicine Will Transform Healthcare. In Physical Biology: From Atoms to Medicine, Ahmed H. Zewail (ed.), p. 337. Imperial College Press, London.
21 Hood LE, Galas D (2009). IBC 2009, vol. 1, no. 2, article no. 6, pp. 1-4.
22 Hwang D, Lee IY, Yoo H, Gehlenborg N, Cho J-H, Petritis B, Baxter D, Pitstick R, Young R, Spicer D, Price ND, Hohmann JG, Stephen J, DeArmond SJ, Carlson GA, Hood LE. (2009). Molecular Systems Biology 5:252.
23 Price ND, Edelman LB, Lee I, Yoo H, Hwang D, Carlson G, Galas DJ, Heath JR, Hood L. (2009) Systems Biology and the Emergence of Systems Medicine. In Genomic and Personalized Medicine: From Principles to Practice (Ginsburg G and Willard H, eds.), Vol. 1, pp. 131-141. Elsevier.
24 Hood L, Friend SH. (2011). Nat Rev Clin Oncol. Mar;8(3):184-7.
Leroy Hood, M.D., Ph.D., is president and co-founder of the Institute for Systems Biology in Seattle. He would like to thank Mauricio Flores, Gustavo Glusman, and Burak Kutlu for thoughtful comments on this paper. This paper is supported by the Center for Systems Biology P50 grant GM076547 and the Luxembourg Strategic Partnership with the Luxembourg Centre for Systems Biomedicine and the University of Luxembourg.