September 1, 2015 (Vol. 35, No. 15)

Richard A. A. Stein, M.D., Ph.D.

To Capture Fleeting Expressions, Go High-Throughput

One of the unexpected findings of the Human Genome Project was that human chromosomes contain only 20,000–25,000 protein-encoding genes, fewer than had been anticipated, given the size of the genome. Subsequently, the Human Proteome Project was launched to explore the abundance and cellular distribution of the proteins and the interactions they orchestrate.

Technological advances have significantly improved our ability to characterize protein structures and explore protein functions. Much of this work, however, still depends on time-consuming traditional approaches. To accelerate progress, high-throughput, rapid, and cost-effective strategies will have to be developed. Such strategies will enable investigators to characterize the protein repertoire of different cell types and to capture protein distribution and dynamics under various conditions, tasks that rank among the most challenging, yet most rewarding, in the life sciences.


The open (upper right) and closed (middle) conformations of the DNA repair helicase UvrD were assessed by researchers based at the University of Illinois at Urbana-Champaign. Conformational changes and unwinding activity were measured by combining optical traps and single-molecule confocal microscopy. [Image courtesy of Matthew Comstock, Ph.D., Michigan State University]

Protein-Quality Solutions

“The main bottleneck for the high-throughput structure determination of low- and medium-size soluble proteins that we work on is to obtain structure-quality protein preparations,” says Kurt Wüthrich, Ph.D., Cecil H. and Ida M. Green Professor of Structural Biology at the Scripps Research Institute, professor of biophysics at ETH Zurich, and co-recipient of the 2002 Nobel Prize in Chemistry. “This limitation applies equally to NMR spectroscopy and to crystallography.

“As a tool for preventing unnecessary work on the structure determination side, we introduced the ‘NMR Profile’ for identifying, from among the many proteins that we screen, those that are available as a structure-quality solution and thus amenable for NMR structure determination.”

The National Institute of General Medical Sciences established the Protein Structure Initiative (PSI) in 2000 to solve three-dimensional protein structures in a high-throughput manner. The initiative unfolded in three five-year phases and came to an end in June 2015.

The structures of about 90% of the proteins in this effort were solved by X-ray crystallography, and those of the remaining 10% by NMR spectroscopy.

With the support of the Joint Center for Structural Genomics (JCSG), a multi-institutional consortium, the PSI was able to obtain backbone NMR assignments for globular proteins containing up to about 200 amino acids. The JCSG established a standard approach based on a set of three automated projection spectroscopy (APSY) experiments. APSY combines experimental and computational procedures so that the resonance assignments of a protein's polypeptide backbone can be obtained in a fully automated fashion.

APSY presents two major advantages. First, APSY generates results faster than standard triple-resonance NMR experiments. Second, the experimental results are easy to validate with the additional NMR data that are needed for the structure determination.
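
The geometric idea behind projection spectroscopy can be illustrated with a toy calculation. When two indirect dimensions are sampled jointly at a projection angle α, a peak with frequencies (ω2, ω3) appears in the resulting 2D projection at ω2·cos α + ω3·sin α, so a few projections at different angles are enough to recover the peak coordinates by least squares. The sketch below assumes a single, noise-free peak with made-up frequencies in arbitrary units; it is not the peak-matching algorithm actually used by APSY, which must also pair peaks across many noisy projections.

```python
import numpy as np

def project(peak, angle_deg):
    """Position of a peak (w2, w3) in a projection recorded at angle_deg."""
    a = np.deg2rad(angle_deg)
    return peak[0] * np.cos(a) + peak[1] * np.sin(a)

def reconstruct(angles_deg, projected):
    """Least-squares estimate of (w2, w3) from peak positions in several projections."""
    a = np.deg2rad(np.asarray(angles_deg, dtype=float))
    design = np.column_stack([np.cos(a), np.sin(a)])  # one row per projection
    solution, *_ = np.linalg.lstsq(design, np.asarray(projected, dtype=float), rcond=None)
    return solution

# Hypothetical peak frequencies (arbitrary units) and projection angles.
true_peak = (1200.0, 450.0)
angles = [0, 30, 60, 90]
observed = [project(true_peak, ang) for ang in angles]
estimate = reconstruct(angles, observed)
print(np.round(estimate, 3))
```

Using more angles than the two strictly required overdetermines the system, which is what makes the least-squares fit robust once real, noisy peak positions are involved.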

On a set of 30 JCSG target proteins, Dr. Wüthrich and colleagues characterized the structure-quality protein solutions that were used for the APSY-NMR measurements and subsequently used the UNIO-MATCH-2014 software to perform automated data analysis. This strategy provided high-quality structural data for all the proteins included in the analysis, and for 22 of the 30 proteins, the experimental time required for the APSY data acquisition was less than 30 hours.

Across the size range of the proteins in the study, from 62 to 152 amino acids, the results did not depend significantly on protein size. This suggested that the approach could also be applied to proteins of up to 200 amino acids, as additional publications have since confirmed. “My hope for the coming years,” avows Dr. Wüthrich, “is that our robust methodology will be used by the scientific community for addressing important biological problems with the use of structural biology.”


For a set of JCSG target proteins, structure-quality solutions were prepared and subjected to NMR profiling and computational analysis, yielding protein structures such as the ones shown here.

Antibody Screens

“We identified rare antibodies that protect virus-infected cells from apoptosis,” says Richard A. Lerner, M.D., professor of immunochemistry, Scripps Research Institute. “The key point of our approach was that we performed selection for function.”

A major initiative in Dr. Lerner’s laboratory focuses on developing screening strategies that can identify very rare individual receptor agonists from pools of hundreds of millions of unique antibodies, based on phenotypes of interest. This experimental strategy led to the identification of integrin-binding agonist antibodies that converted human stem cells into dendritic cells.

Another effort in Dr. Lerner’s lab involved probing the entire repertoire of antibodies generated by an individual, which the lab’s investigators describe as the “fossil record” of the immune response. Screening this repertoire in cancer patients for antibodies that could recognize metastatic cells, Dr. Lerner and colleagues identified antibodies specific for the active conformation of the integrin αvβ3 receptor and showed, in a mouse model, that some of these antibodies interfered with lung colonization by human breast cancer cells.

“The success of this approach depends on having a robust selection system and a selectable phenotype,” asserts Dr. Lerner.

In one of their more recent efforts to evaluate this approach, Dr. Lerner and colleagues asked whether inhibition of cell death could serve as a phenotypic screen to identify, from unbiased combinatorial antibody libraries, rare molecules that allow cells to survive rhinovirus infection. This strategy identified rare antibodies that, when expressed in the cytoplasm, protected cells from virus-induced death.

HeLa cells were first infected with an antibody library and then with a rhinovirus. Cells expressing functional antibodies were protected from apoptosis and remained adherent to the bottom of the culture flask as a monolayer, whereas cells that underwent apoptosis detached.

The experimental design involved five rounds of selection. In each round, the antibody genes recovered from cells that survived the previous round were used to build a new library for re-infecting cells, progressively enriching the pool for antibodies that conferred protection.
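
Why several rounds are needed can be seen from a simple expected-value model of iterative selection. The sketch assumes that each round, protective clones survive with one fixed probability and all other clones with another, and that the next library reproduces the survivors' frequencies. The starting frequency of two clones in one hundred million comes from the text; the survival probabilities are hypothetical, and this is an idealized model rather than the laboratory's protocol.

```python
def enrich(fraction, survival_protective, survival_other, rounds):
    """Expected fraction of protective clones after each selection round.

    Assumes survivors are re-expressed at their surviving frequencies and that
    the (hypothetical) survival probabilities stay constant between rounds.
    """
    history = [fraction]
    for _ in range(rounds):
        kept_protective = fraction * survival_protective
        kept_other = (1 - fraction) * survival_other
        fraction = kept_protective / (kept_protective + kept_other)
        history.append(fraction)
    return history

# Two protective clones among one hundred million (from the text);
# survival probabilities are illustrative assumptions.
trajectory = enrich(2 / 100_000_000, survival_protective=0.9,
                    survival_other=0.01, rounds=5)
for rnd, frac in enumerate(trajectory):
    print(f"round {rnd}: protective fraction = {frac:.2e}")
```

With a 90-fold survival advantage per round, the odds of picking a protective clone grow roughly 90-fold each round, so the protective fraction climbs from about 2 × 10⁻⁸ to close to 1 only after the fifth round.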

From a pool of one hundred million antibodies, this selection process identified two that were protective. Mass spectrometry, together with biochemical and molecular biology approaches, was used to identify the target of these rare antibodies and revealed that they inhibit the rhinovirus-encoded 3C protease, which cleaves a virally encoded polyprotein and is critical for viral maturation.

“This strategy is very powerful,” states Dr. Lerner. “During these experiments we also learned a lot about signal transduction during infection.”

Proteomic Dynamics of XEN Cells

After fertilization, mammalian zygotes develop into blastocysts, which consist of an outer trophoblast epithelium that forms extra-embryonic tissues and an inner cell mass. During the early stages of pre-implantation development, cells of the inner cell mass differentiate into the epiblast, also known as the primitive ectoderm, and the hypoblast, also known as the primitive endoderm. The epiblast persists only transiently, for several days, and forms all the somatic and germline cells of the embryo; the hypoblast gives rise to some of the extra-embryonic tissues. The epiblast is also the source of embryonic stem cells.

Specification of the epiblast and hypoblast fates is accompanied by changes in the expression of specific transcription factors. For example, in the mouse blastocyst, cells that adopt the hypoblast fate downregulate OCT4, SOX2, and NANOG, the transcription factors that define the epiblast, and upregulate GATA4 and other hypoblast-specific transcription factors.

“We attempted to perform the quantitation of as many proteins as possible during the early stages of extra-embryonic endoderm (XEN) differentiation at the blastocyst stage,” says Kathryn S. Lilley, Ph.D., professor of biochemistry, University of Cambridge. “We explored changes in protein abundance, and used informatics tools that we developed to also look at how proteins behave in a coordinated fashion over the narrow time window of our experiment.”

Using tandem mass tag (TMT) differential labeling and mass spectrometry, Dr. Lilley and colleagues profiled the proteome during the narrow time window in which the extra-embryonic endoderm differentiates in vitro from mouse embryonic stem cells. The approach exploits the fact that inducing GATA transcription factors drives embryonic stem cells toward hypoblast-like cells, which makes it possible to capture the very transient transition from the pluripotent to the differentiated, extra-embryonic state.
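
In a TMT experiment, each sample (here, each time point) is labeled with a different isobaric tag, the samples are pooled, and on fragmentation each tag releases a reporter ion of distinct mass whose intensity reflects how much of the peptide came from that sample. A minimal sketch of the downstream arithmetic, assuming made-up reporter intensities, hypothetical time-point channels, equal protein loading per channel, and simple sum-based peptide-to-protein roll-up (real pipelines add isotope-impurity correction and more robust normalization):

```python
import numpy as np

# Hypothetical reporter-ion intensities: rows = peptides, columns = TMT channels
# (one channel per differentiation time point). All values are made up.
channels = ["0h", "12h", "24h", "48h"]
peptide_protein = ["P1", "P1", "P2", "P2", "P2"]
intensities = np.array([
    [1000.0, 1500.0, 2100.0, 3000.0],
    [ 800.0, 1100.0, 1700.0, 2500.0],
    [5000.0, 4800.0, 5100.0, 4900.0],
    [3000.0, 2900.0, 3100.0, 3050.0],
    [2000.0, 1900.0, 2050.0, 2000.0],
])

# 1. Scale each channel so all channels have the same total signal
#    (assumes equal amounts of protein were loaded per channel).
totals = intensities.sum(axis=0)
normalized = intensities * (totals.mean() / totals)

# 2. Roll peptides up to proteins by summing their normalized intensities.
proteins = sorted(set(peptide_protein))
protein_matrix = np.array([
    normalized[[i for i, p in enumerate(peptide_protein) if p == prot]].sum(axis=0)
    for prot in proteins
])

# 3. Express each protein relative to the first channel, on a log2 scale.
log2_ratios = np.log2(protein_matrix / protein_matrix[:, [0]])
print("channels:", channels)
for prot, row in zip(proteins, log2_ratios):
    print(prot, np.round(row, 2))
```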

During this work, Dr. Lilley’s team quantified over 2,000 proteins and identified several clusters of proteins that share similar temporal abundance profiles. These clusters include proteins involved in cellular metabolic processes, chromatin remodeling factors, and proteins that orchestrate the reorganization of the extracellular matrix.
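
Grouping proteins by the shape of their abundance profiles over time can be sketched with k-means clustering on z-scored profiles, so that proteins cluster by how they change rather than by how abundant they are. This is a swapped-in, generic technique with hypothetical protein names and values, not the informatics tools developed by Dr. Lilley's team; the initial centers are fixed rows so the result is reproducible.

```python
import numpy as np

def zscore_rows(x):
    """Center and scale each profile so clustering reflects shape, not level."""
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

def kmeans(x, init_idx, n_iter=50):
    """Plain k-means (Lloyd's algorithm) starting from the rows in init_idx."""
    centers = x[list(init_idx)].copy()
    for _ in range(n_iter):
        # Assign each profile to its nearest center (squared Euclidean distance).
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([x[labels == k].mean(axis=0) for k in range(len(centers))])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical abundance profiles at four time points (arbitrary units).
names = ["protA", "protB", "protC", "protD", "protE", "protF"]
profiles = np.array([
    [1.0, 2.0, 4.0, 8.0],   # rising
    [2.0, 3.0, 5.0, 9.0],   # rising
    [1.5, 2.5, 4.5, 7.0],   # rising
    [9.0, 6.0, 3.0, 1.0],   # falling
    [8.0, 7.0, 4.0, 2.0],   # falling
    [7.0, 5.0, 2.0, 1.5],   # falling
])
labels, _ = kmeans(zscore_rows(profiles), init_idx=(0, 3))
for name, lab in zip(names, labels):
    print(name, "cluster", lab)
```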

“A challenge in the field is that rather than measuring protein level information, we convert a lot of proteins to peptides and then use peptide measurement as a surrogate for the proteins from which they have been derived,” notes Dr. Lilley. When the proteome is characterized with mass spectrometry-based approaches, the peptides used to collect information about proteins are often uninformative about the specific protein isoforms from which they originate. “Given that there are likely to be hundreds of thousands of protein isoforms in the cell,” cautions Dr. Lilley, “we have only scratched the surface [in] proteomics studies.”
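
The isoform problem can be made concrete with a small bookkeeping exercise: a peptide found in only one isoform is evidence for that isoform, whereas a peptide shared by several isoforms cannot say which of them is present. The peptide and isoform names below are hypothetical placeholders, and the sketch ignores the further complications of real protein inference.

```python
from collections import defaultdict

# Hypothetical mapping from observed peptides to the isoforms that contain them.
peptide_to_isoforms = {
    "PEPTIDE_A": {"ISO-1", "ISO-2", "ISO-3"},  # shared by all three isoforms
    "PEPTIDE_B": {"ISO-1", "ISO-2"},           # shared by two isoforms
    "PEPTIDE_C": {"ISO-3"},                    # found in only one isoform
}

def isoform_evidence(mapping):
    """Split peptides into isoform-specific and shared (ambiguous) evidence."""
    unique = defaultdict(list)
    shared = {}
    for peptide, isoforms in mapping.items():
        if len(isoforms) == 1:
            (isoform,) = isoforms          # unpack the single isoform
            unique[isoform].append(peptide)
        else:
            shared[peptide] = sorted(isoforms)
    return dict(unique), shared

unique, shared = isoform_evidence(peptide_to_isoforms)
print("isoform-specific:", unique)
print("shared:", shared)
```

Here only ISO-3 has a peptide of its own; ISO-1 and ISO-2 are supported solely by shared peptides, so these data cannot tell whether one, the other, or both are present.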

Single-Molecule Nanometry

“Structural tools usually only give a snapshot and do not usually reveal details about function, while functional tools generally do not provide insights about the structure,” says Taekjip Ha, Ph.D., professor of physics, University of Illinois at Urbana-Champaign. “One of our goals is to measure protein structural changes while monitoring the function in real time.”

To survey protein structure and function simultaneously, Dr. Ha’s lab combines single-molecule fluorescence measurements, which capture conformational changes in real time, with functional tools such as optical tweezers. Dr. Ha and colleagues illustrated the power of this approach in a study of the DNA repair helicase UvrD. Previous work had shown that UvrD adopts two conformational states, but the experimental approaches used could not reveal the biological significance of each conformation.

“We used FRET to measure the structural changes in helicase, and in parallel we used optical tweezers to simultaneously measure the function of the same molecule,” explains Dr. Ha. This work was performed in collaboration with Yann R. Chemla, Ph.D., associate professor of physics and biophysics, University of Illinois at Urbana-Champaign.
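
FRET efficiency falls off with the sixth power of the donor-acceptor distance, E = 1 / (1 + (r/R0)^6), and in single-molecule work an apparent efficiency is usually computed frame by frame from the acceptor and donor intensities. A minimal sketch, assuming hypothetical photon counts, a detection-correction factor gamma of 1, a hypothetical Förster radius R0 of 5.6 nm, and a hypothetical 0.5 threshold between two states. Which FRET level corresponds to the open or the closed conformation depends on where the dyes are attached, so the sketch labels states only as high- or low-FRET.

```python
import numpy as np

def apparent_fret(donor, acceptor, gamma=1.0):
    """Apparent FRET efficiency E = I_A / (I_A + gamma * I_D) for each frame."""
    donor = np.asarray(donor, dtype=float)
    acceptor = np.asarray(acceptor, dtype=float)
    return acceptor / (acceptor + gamma * donor)

def distance_from_fret(e, r0_nm):
    """Invert E = 1 / (1 + (r / R0)^6) to estimate the donor-acceptor distance."""
    return r0_nm * (1.0 / e - 1.0) ** (1.0 / 6.0)

# Hypothetical donor/acceptor photon counts from one molecule over six frames.
donor = [800, 780, 300, 290, 820, 310]
acceptor = [200, 220, 700, 710, 180, 690]
e = apparent_fret(donor, acceptor)

# Hypothetical threshold separating two conformational states.
states = np.where(e > 0.5, "high-FRET", "low-FRET")
print(np.round(e, 2), states)
print(np.round(distance_from_fret(e, r0_nm=5.6), 2))
```

In the combined instrument described above, a state trace like this one would be time-aligned with the optical-tweezers readout of unwinding from the same molecule.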

The investigators found that UvrD displays two different types of unwinding activity. One, called “frustrated,” involves the repetitive, bidirectional unwinding and rezipping of fewer than 20 base pairs of DNA. The other, called “long distance,” is less repetitive and involves the unwinding of more than 20 base pairs.
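
The two unwinding modes can be illustrated by a toy classifier applied to traces of base pairs unwound over time. The 20-base-pair cutoff follows the text; the reversal-count criterion, its threshold, and the example traces are hypothetical, and this is not the investigators' analysis pipeline.

```python
import numpy as np

def classify_event(unwound_bp, threshold_bp=20, min_reversals=2):
    """Label one unwinding event from a trace of base pairs unwound over time.

    'long distance': the trace exceeds threshold_bp.
    'frustrated': stays at or below threshold_bp but reverses direction
    (unwinding, then rezipping) at least min_reversals times.
    """
    trace = np.asarray(unwound_bp, dtype=float)
    steps = np.sign(np.diff(trace))
    steps = steps[steps != 0]                      # drop pauses
    reversals = int(np.sum(steps[1:] != steps[:-1]))
    if trace.max() > threshold_bp:
        return "long distance", reversals
    if reversals >= min_reversals:
        return "frustrated", reversals
    return "unclassified", reversals

# Hypothetical traces (base pairs unwound, sampled at regular intervals).
frustrated_trace = [0, 5, 10, 15, 10, 5, 10, 15, 12, 6, 0]
long_trace = [0, 8, 16, 24, 32, 40, 48]
print(classify_event(frustrated_trace))
print(classify_event(long_trace))
```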

The investigators also showed that the two previously described conformational states, “closed” and “open,” correlate with movement of the helicase toward or away from the replication fork.

One of the challenges in exploring the structure-function relationship of proteins is that in vivo, proteins establish dynamic and often transient interactions that modulate their structure and control their function. Capturing these interactions and characterizing the conformational changes that occur during this process require an interdisciplinary approach.

“This requires the convergence of many different fields, including single-molecule methods, computational tools, and molecular modeling,” advises Dr. Ha. “To manipulate the system, biological or chemical tagging methods have to be used, and they cannot be allowed to interfere with each other.”

Protein-Protein Interactions

“We are looking at how interactomes differ or change in various tissues, and how they respond to external factors, such as drug treatment,” says Leonard J. Foster, Ph.D., professor of biochemistry and molecular biology, University of British Columbia. The approaches most commonly used to capture interactomes have involved protein tagging, yeast two-hybrid, and protein fragment complementation assays. Although powerful, these experimental strategies have several shortcomings, including limited scalability.

To capture protein interactomes in a high-throughput fashion, Dr. Foster and colleagues developed a bioinformatics tool that improves the analysis of protein-protein interaction networks generated with the protein correlation profiling SILAC (PCP-SILAC) approach. PCP-SILAC generates chromatographic elution profiles, under native conditions, for all the proteins separated from a complex mixture. Protein-protein interactions are then inferred from similarities between specific features of the individual proteins’ elution profiles.
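
The co-elution principle can be sketched by comparing elution profiles pairwise: proteins that travel together through the separation as part of one complex should have similarly shaped profiles. The sketch uses plain Pearson correlation with a hypothetical 0.9 cutoff on made-up profiles; the actual tool relies on specific chromatogram features, and this simplified version ignores SILAC ratios entirely. Co-elution alone is evidence of, not proof of, a physical interaction.

```python
import numpy as np
from itertools import combinations

# Hypothetical elution profiles: relative abundance of each protein in ten
# consecutive fractions collected under native conditions (made-up numbers).
profiles = {
    "A": [0, 1, 4, 9, 4, 1, 0, 0, 0, 0],
    "B": [0, 1, 5, 8, 5, 1, 0, 0, 0, 0],
    "C": [0, 0, 0, 0, 1, 3, 8, 3, 1, 0],
    "D": [0, 0, 0, 1, 2, 4, 9, 4, 1, 0],
}

def coelution_pairs(profiles, min_r=0.9):
    """Return protein pairs whose elution profiles have Pearson r >= min_r."""
    hits = []
    for a, b in combinations(sorted(profiles), 2):
        r = np.corrcoef(profiles[a], profiles[b])[0, 1]
        if r >= min_r:
            hits.append((a, b, round(float(r), 3)))
    return hits

hits = coelution_pairs(profiles)
print(hits)
```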

The set of tools that Dr. Foster and colleagues developed allows faster analysis and characterization of experimental protein-protein interaction datasets. The new bioinformatics workflow also improves the quality of the data obtained across experimental replicates.

“We need to ensure that we are controlling as stringently as possible the data quality,” insists Dr. Foster. “Ensuring adequate control has been challenging in the protein interaction studies conducted so far.”

As a case study to validate this new software, Dr. Foster compared existing datasets generated from HeLa cells infected with Salmonella typhimurium and uninfected cells, and identified specific sets of protein-protein interactions that are modulated by the infection. “Protein interactions are at the core for this method,” notes Dr. Foster. “They provide a way to test the response of the network to various biological conditions.”
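
Comparing conditions in this way can be sketched as set arithmetic over interaction calls, assuming interactions have already been called in each replicate. The sketch keeps only pairs reproduced in at least two replicates, reflecting the emphasis on replicate-level data quality, and then takes set differences between uninfected and infected samples. The protein names, replicate calls, and two-replicate cutoff are hypothetical; this is not Dr. Foster's software.

```python
from collections import Counter

def reproducible(replicates, min_reps=2):
    """Interactions (unordered protein pairs) called in at least min_reps replicates."""
    counts = Counter()
    for rep in replicates:
        counts.update({frozenset(pair) for pair in rep})  # count each pair once per replicate
    return {pair for pair, n in counts.items() if n >= min_reps}

# Hypothetical interaction calls from three replicates per condition.
uninfected = [
    [("A", "B"), ("C", "D"), ("E", "F")],
    [("B", "A"), ("C", "D")],
    [("A", "B"), ("E", "F")],
]
infected = [
    [("A", "B"), ("C", "G")],
    [("A", "B"), ("G", "C"), ("C", "D")],
    [("C", "G")],
]

base = reproducible(uninfected)
treated = reproducible(infected)
gained = treated - base
lost = base - treated
print("gained on infection:", [sorted(p) for p in gained])
print("lost on infection:", [sorted(p) for p in lost])
```

Treating each pair as a frozenset makes ("A", "B") and ("B", "A") the same interaction, and counting a pair at most once per replicate keeps duplicate calls within one run from inflating its reproducibility.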

Although the new software extracts more data, improving the bioinformatics side of data analysis remains one of the most significant challenges. “Over the last year, we were able to extract more data from existing datasets,” confides Dr. Foster. “But there is a lot of information that we still have not been able to pull out. Improving the informatics side to extract more data will be essential.”


At the University of British Columbia, Leonard J. Foster leads the Center for High-Throughput Biology, which has developed bioinformatics tools for analyzing and characterizing protein-protein interactions.
