Kevin Mayer, Senior Editor, Genetic Engineering & Biotechnology News
Once Inaccessible Proteome Regions Can Be Reached with New Experimental and Computational Approaches
If we hear that something of scientific interest is “dark,” we have certain expectations, and they go well beyond actual darkness. That which is dark must be pervasive, and it must have profound effects. For these expectations, we can thank dark matter and dark energy. Dark matter, though invisible, is more common than ordinary matter, and it allows galaxies to keep spinning majestically without flying apart. Even more widely distributed than dark matter is dark energy, an intrinsic property of space. It is, in a sense, everywhere. And it accounts for the accelerating expansion of the universe.
Less prodigious but no less mysterious universes may be found in biology. For example, the dark matter of the human genome includes noncoding DNA, which is sometimes called junk DNA, and noncoding RNA. Roughly 95% of the human genome consists of noncoding DNA, so the dark genome is certainly pervasive. But it is unclear how much of it is functional.
Another biological universe may have an even stronger claim to having a dark side. This is the universe of proteins, the proteome. It includes proteins that largely or entirely consist of regions having indeterminate or fluctuating structure. These proteins don’t adopt crisp folds. Instead, they flop around, at least until they interact with the right binding partners. These proteins, in defiance of all the structure-function dogma still promulgated by the textbooks, fulfill a range of functions. They are particularly well represented among regulatory proteins associated with transcription and translation, the cell cycle, signal transduction, and protein phosphorylation.
Moreover, dark proteome proteins with floppy stretches of at least 40 amino acids are abundant. They account for 6–33% of bacterial proteins, 9–37% of archaeal proteins, and 35–50% of eukaryotic proteins. They are especially prominent in the signaling proteome, where they account for 60–70% of all proteins.
Finally, as of last month, the dark proteome started satisfying one last expectation raised by the word “dark.” Whatever we call dark should be something that is, at long last, being brought out of the shadows and into light.
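To make the 40-residue criterion mentioned above concrete, here is a minimal Python sketch. It assumes that each protein has already been annotated, by some predictor or experiment, with a per-residue label string in which "D" marks a disordered residue and "O" an ordered one. That encoding, and the function names, are hypothetical and chosen only for illustration. They are not taken from any particular tool or from the studies behind the percentages quoted here.

```python
from typing import List

MIN_RUN = 40  # stretch length quoted in the text; everything else here is an assumption


def longest_disordered_run(labels: str) -> int:
    """Return the length of the longest consecutive run of 'D' labels."""
    longest = 0
    current = 0
    for label in labels:
        # Extend the current run on a disordered residue, otherwise reset it.
        current = current + 1 if label == "D" else 0
        longest = max(longest, current)
    return longest


def fraction_with_long_disorder(proteome: List[str], min_run: int = MIN_RUN) -> float:
    """Fraction of proteins that contain at least one disordered run of min_run residues."""
    if not proteome:
        return 0.0
    hits = sum(longest_disordered_run(labels) >= min_run for labels in proteome)
    return hits / len(proteome)


# Toy "proteome" of four hypothetical proteins; two of them pass the threshold.
toy = ["D" * 40, "O" * 100, "D" * 39 + "O" + "D" * 10, "D" * 50]
print(fraction_with_long_disorder(toy))
```

Note that the third toy protein has 49 disordered residues in total but no single stretch of 40, so it is not counted. The criterion concerns contiguous stretches, not overall disorder content.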
The Human Dark Proteome Initiative
In hopes of raising dark proteome awareness, inspiring dark proteome educational programs, and expanding investment in dark proteome research, scientists at St. Jude Children’s Research Hospital, The Scripps Research Institute (TSRI), and other institutions announced the launch of the Human Dark Proteome Initiative (HDPI). The HDPI is especially interested in promoting basic and translational research into connections between the dark proteome and disease processes.
“Intrinsically disordered proteins are involved in heart disease, infectious disease, type 2 diabetes, cancer, and many neurodegenerative diseases, such as Parkinson’s disease and Alzheimer’s disease,” said Peter Wright, Ph.D., who is Cecil H. and Ida M. Green Investigator at TSRI and chair of the HDPI. “We need to advance our understanding of the functions and molecular mechanisms of these proteins so we can work toward better therapies for these debilitating diseases.”
As this comment indicates, the HDPI is focusing on that part of the dark proteome that consists of proteins that are entirely disordered—intrinsically disordered proteins (IDPs)—or proteins that incorporate both ordered regions and intrinsically disordered regions (IDRs). To date, IDPs and IDRs have been poorly characterized because they are not accessible via traditional methods of structural biology such as X-ray crystallography. Also, IDPs and IDRs have resisted analysis by means of homology modeling, a computational approach that relies on comparisons between known and unknown structures. When IDPs and IDRs are the unknown structures, reasonably similar known structures can be hard to find.
IDPs and IDRs are becoming more tractable, however, thanks to recent developments in technology, including advances in nuclear magnetic resonance (NMR) spectroscopy, single-molecule fluorescence resonance energy transfer (FRET) analysis, stopped-flow spectroscopy, and in-cell particle tracking. Also, high-throughput computational modeling is making better use of protein databases.
With development of better technologies, and the introduction of outreach efforts such as the HDPI, the dark proteome field may lose its esoteric reputation. The field retains its nonconformist character even though the dark proteome emerged 20 years ago. At that time, researchers began noticing that DNA-binding proteins could delay assuming definite shapes until they actually formed protein-DNA complexes.
Much has been learned since then. “In the ensuing years, we have seen an explosion of experimental and genome-annotation studies that have mapped the extent of the intrinsic disorder phenomenon and explored the possible biological rationales for its widespread occurrence,” wrote H. Jane Dyson, Ph.D., a TSRI researcher and a member of the HDPI’s national advisory committee, in a 2011 paper in the Quarterly Reviews of Biophysics. “Answers to the question ‘why would a particular domain need to be unstructured?’ are as varied as the systems where such domains are found.”
Nonetheless, a year later, HDPI executive committee member Rohit Pappu, Ph.D., a professor of biomedical engineering at Washington University in St. Louis, recalled how his introduction to the field owed much to serendipity. In an interview conducted by Washington University’s news director, Dr. Pappu mentioned a conversation he had with Keith Dunker, Ph.D., a dark proteome pioneer based at Indiana University. “Every time you talk to people in the back alleys of protein science,” said Dr. Dunker, “they tell you their proteins are very flexible or highly dynamic, and this dynamism is important for function.”
In Dr. Pappu’s telling, Dr. Dunker explained that he had synthesized all of the information then known about these flexible, highly disordered proteins. And, together with his colleague Vladimir Uversky, Ph.D., Dr. Dunker explored whether it would be possible to identify sequences incapable of folding autonomously. “With the help of computer scientists who taught him how to look for patterns in high-dimensional spaces,” said Dr. Pappu, “[Dr. Dunker] learned that 11 out of the 20 amino acids predispose sequences toward being disordered. Today there are about 20 predictors of disorder.”
After hearing Dr. Dunker’s story, Dr. Pappu thought to himself, “OK, either this is absolutely crackers or it is going to be transformative. I’m going to take a bet on transformative because I find what he’s saying compelling.”
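The idea that amino acid composition alone can hint at disorder can be illustrated with a toy Python sketch. It scores each sliding window of a sequence by the fraction of residues drawn from an assumed "disorder-promoting" set. The eight-letter set used here (A, R, G, Q, S, P, E, K) is one that is often cited in the disorder literature, but it is included purely as an illustrative assumption. It is not the set of 11 residues Dr. Pappu refers to, and this sketch is far simpler than any real disorder predictor.

```python
from typing import List

# Illustrative assumption: residues often described as disorder-promoting.
DISORDER_PROMOTING = set("ARGQSPEK")


def disorder_fraction(window: str) -> float:
    """Fraction of residues in the window that belong to the assumed set."""
    if not window:
        return 0.0
    return sum(residue in DISORDER_PROMOTING for residue in window) / len(window)


def sliding_scores(sequence: str, window_size: int = 5) -> List[float]:
    """Score every full-length window along the sequence, left to right."""
    seq = sequence.upper()
    if len(seq) <= window_size:
        # Sequence too short for sliding: score it as a single window.
        return [disorder_fraction(seq)]
    return [
        disorder_fraction(seq[i:i + window_size])
        for i in range(len(seq) - window_size + 1)
    ]


# Hypothetical sequence: a run of assumed disorder-promoting residues
# followed by bulky hydrophobic and aromatic residues.
print(sliding_scores("SPEKSGQWFIVLY", window_size=5))
```

Windows that score near 1.0 are enriched in the assumed set, and windows near 0.0 are depleted. Real predictors combine many more features, such as charge, hydrophobicity, and sequence complexity, and are trained on experimentally characterized proteins.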
Known Unknowns and Unknown Unknowns
Years ago, scientists grappling with IDPs and IDRs seemed to reenact the story of the blind men and the elephant. In this story, a group of blind men touch an elephant to learn what it is like. Because each of the blind men touches only one part of the elephant—one of its sides, a leg, the trunk, the tail—the members of the group come to different conclusions. They say that the elephant is like a wall, a tree trunk, a tree branch, or a rope.
Similarly, the first scientists to confront IDPs and IDRs introduced diverse terms to describe what they found. Over the years, these proteins, which display marked conformational heterogeneity and constitute a significant part of the protein kingdom, have been described in the literature by a plethora of different names. “[These names] had been proposed before it was established that this class of proteins constitutes a separate and important extension to the protein kingdom,” wrote Dr. Dunker in 2013, in the first issue of the journal Intrinsically Disordered Proteins. “Indeed, these highly dynamic proteins with important biological functions were independently discovered multiple times, with the authors frequently inventing new terms to describe their protein of interest.”
And so Dr. Dunker and his colleagues commenced an exercise in disambiguation. But reaching a consensus on the dark proteome may prove even harder than Dr. Dunker imagined. According to a new study completed by scientists at Australia’s Commonwealth Scientific and Industrial Research Organization (CSIRO) and the Garvan Institute, in collaboration with scientists at the Technical University of Munich, there may be more than one elephant in the room. These scientists, led by CSIRO’s Seán O’Donoghue, Ph.D., say that IDPs and IDRs, along with hard-to-study transmembrane proteins, represent only the dark proteome’s “known unknowns.” The dark proteome, they insist, also includes “unknown unknowns.”
The scientists studied the properties of the dark proteome bioinformatically, filtering out pieces of information from various databases, linking them with each other, and evaluating the data. They reported their results November 26 in the Proceedings of the National Academy of Sciences, in an article entitled, “Unexpected features of the dark proteome.”
“Surprisingly, most of the dark proteome could not be accounted for by conventional explanations, such as intrinsic disorder or transmembrane regions,” wrote the article’s authors. “Nearly half of the dark proteome comprised dark proteins, in which the entire sequence lacked similarity to any known structure.”
“Dark proteins fulfill a wide variety of functions, but a subset showed distinct and largely unexpected features, such as association with secretion, specific tissues, the endoplasmic reticulum, disulfide bonding, and proteolytic cleavage,” they continued. “Dark proteins also had short sequence length, low evolutionary reuse, and few known interactions with other proteins.”
These results, the authors emphasized, should help clarify the distinction between disorder and darkness in the proteome. In addition, they concluded that their work suggests new research directions in structural and computational biology.
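The distinction between a partly dark protein and a dark protein, one whose entire sequence lacks similarity to any known structure, can be sketched in a few lines of Python. The sketch assumes that a separate sequence-to-structure search has already produced, for each protein, a list of residue intervals that match a known structure. The interval format (0-based, half-open) and the function names are hypothetical, and this is not the pipeline used in the PNAS study.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # assumed format: 0-based start, exclusive end


def dark_fraction(length: int, matched: List[Interval]) -> float:
    """Fraction of residues not covered by any structure-matched interval."""
    if length <= 0:
        raise ValueError("protein length must be positive")
    covered = [False] * length
    for start, end in matched:
        # Clip each interval to the sequence and mark its residues as covered.
        for i in range(max(start, 0), min(end, length)):
            covered[i] = True
    return covered.count(False) / length


def is_dark_protein(length: int, matched: List[Interval]) -> bool:
    """True only when no residue at all matches a known structure."""
    return dark_fraction(length, matched) == 1.0


# Hypothetical 10-residue protein with two overlapping structural matches.
print(dark_fraction(10, [(0, 5), (3, 8)]))
# Hypothetical protein with no matches anywhere: a dark protein.
print(is_dark_protein(10, []))
```

Overlapping matches are counted once, so the first example has eight covered residues and two dark ones. Only a protein with no covered residues qualifies as a dark protein under this simplified definition.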