Charles Cooney, Ph.D., Robert T. Haslam professor of chemical engineering at Massachusetts Institute of Technology (MIT), opened the recent “Label-Free Quantification and Identification for Proteomics” symposium by defining the “proteomics gameplan”: determining how a cell responds to its environment and developing strategies to alter cell physiology by modifying metabolic pathways.
Dr. Cooney told participants at the meeting, which was sponsored by Waters, that in addition to relative protein quantification, “absolute quantification is now possible.” But the field of proteomics, which represents a convergence of multiple technologies, is still in its infancy: existing technologies are being refined, and novel methods for protein identification and quantification continue to emerge.
A series of academic researchers presented various platforms and approaches for protein characterization, differential protein expression analysis, and biomarker discovery. John R. Engen, Ph.D., associate professor at Northeastern University, led off the discussion by describing the use of hydrogen/deuterium (H/D) exchange and ultrahigh pressure liquid chromatography (UPLC) to study protein conformation, protein folding pathways, and protein structure and dynamics.
This method is particularly useful for studying proteins that are difficult to purify or crystallize, or that are too big for nuclear magnetic resonance analysis. Dr. Engen’s group first labels proteins in solution with deuterium and then cuts them enzymatically into peptides. After UPLC separation, electrospray mass spectrometry is used to determine where deuterium has exchanged into the protein.
Conformation experiments begin with the protein in its native state in a physiologic buffer, followed by dilution into D2O at the same pH and temperature. At various time points, aliquots are transferred to a quench buffer to stop the labeling reaction. Because HPLC exposes the sample to H2O, some of the label is lost through back-exchange. This loss can be minimized by lowering the pH of the buffer from 7 to 2.5 and submerging the chromatography column in an ice bath to bring the temperature down from 25 °C to 0 °C.
The goal is to complete the chromatography as quickly as possible to minimize the amount of label lost. These low-pH, low-temperature conditions rule out the use of trypsin, but other enzymes, such as pepsin, can be used.
The final step uses MSE to link ions with peptide fragments. Because deuterium is heavier than hydrogen, the mass of each peptide ion increases with the length of time the protein is exposed to D2O. The different conformations a protein may adopt (active or inactive, for example) can be distinguished by determining which residues are protected from H/D exchange. To locate the exchange sites, the peptides produced during the digestion step are identified and deuterium uptake is monitored in each of them.
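To make the per-peptide uptake measurement concrete, here is a minimal Python sketch. It assumes centroid masses have already been extracted for each peptide at each labeling time, and that a fully deuterated control, processed through the same quench and chromatography, is available to correct for back-exchange. The function names, masses, and peptide sequence are hypothetical and are not taken from Dr. Engen's work.

```python
# Minimal sketch of per-peptide deuterium uptake in an H/D exchange experiment.
# All masses and sequences below are hypothetical illustrations, not measured data.

def exchangeable_amides(sequence: str) -> int:
    """Count backbone amide hydrogens in a peptide.

    A peptide of n residues has n - 1 peptide bonds; prolines (other than the
    N-terminal residue) have no amide hydrogen. Many labs also exclude the second
    residue because it back-exchanges very quickly; that convention is not applied here.
    """
    return len(sequence) - 1 - sequence[1:].count("P")


def corrected_uptake(m_t: float, m_0: float, m_full: float, n_amides: int) -> float:
    """Deuterons incorporated at labeling time t, corrected for back-exchange.

    m_t    : centroid mass of the peptide after labeling for time t
    m_0    : centroid mass of the undeuterated peptide
    m_full : centroid mass of a fully deuterated control that lost label
             during quench and chromatography in the same way
    """
    if m_full <= m_0:
        raise ValueError("fully deuterated control must be heavier than undeuterated peptide")
    return n_amides * (m_t - m_0) / (m_full - m_0)


if __name__ == "__main__":
    peptide = "ALPEVKDW"  # hypothetical peptic peptide
    n = exchangeable_amides(peptide)
    m0, m_full = 1500.00, 1504.80  # hypothetical centroid masses (Da)
    # Hypothetical time course: (labeling time in seconds, observed centroid mass)
    for t_sec, m_t in [(10, 1501.20), (60, 1502.40), (600, 1503.60)]:
        print(t_sec, round(corrected_uptake(m_t, m0, m_full, n), 2))
```

Plotting corrected uptake against labeling time for each peptide gives its exchange curve; a region whose uptake stays low in one conformation but rises in another is the kind of protection difference described above.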
The key to improving H/D exchange MS analysis of proteins is to minimize the time it takes to do the chromatographic separation. “UPLC could be the answer,” said Dr. Engen, whose group collaborated with Waters on the development of the HD-Exchange nanoAcquity system. Analyses that previously took more than five days using HPLC and tandem MS now require only 30 minutes with UPLC and MSE, according to Dr. Engen.
PEPPeR (platform for experimental proteomic pattern recognition) was developed by D.R. Mani, Ph.D., together with Jake Jaffe, Ph.D., in Steven Carr’s group at The Broad Institute of MIT and Harvard University. The technique is useful for discovering proteins or peptides associated with disease and can extend beyond traditional tandem MS-based protein identification by also analyzing MS peaks that were not selected for MS/MS.
Dr. Mani, senior computational biologist at the Institute, described the “LC-MS/MS bottleneck, in which a limit of how many fragments can be identified in a run makes it difficult to get to the low-abundance peptides.”
In pattern-based biomarker discovery, the LC-MS peaks represent “features” that enable “the rescue of lost information” and make it possible to derive data from the full MS spectra. The pattern of features, based on relative (comparative) quantification, signifies a biomarker fingerprint. For maximum effectiveness, this technique requires instrumentation capable of high resolution and mass accuracy.
In PEPPeR, MS/MS is used to identify a few peptides, and these then serve as “landmarks” to guide peak alignment and matching across samples based on relative elution order, accurate mass, and retention time. The landmarks calibrate clustering tolerances for m/z and retention time, which are then used to cluster unidentified MS peaks. Machine-learning algorithms identify differentially expressed patterns in disease versus normal samples. Peaks represented in these patterns can be sequenced using accurate mass-based, targeted MS/MS protein identification.
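The landmark idea can be illustrated with a simplified Python sketch. This is not the published PEPPeR algorithm: it assumes a linear retention-time drift between two runs, fits that drift from MS/MS-identified landmark peptides, sets m/z (in ppm) and retention-time tolerances from the spread of the landmark residuals, and then matches unidentified features within those tolerances. The `Feature` class, the function names, the factor `k`, the tolerance floors, and all example values are hypothetical choices made for this illustration.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Feature:
    mz: float  # mass-to-charge ratio of the LC-MS peak
    rt: float  # retention time in minutes


def fit_rt_map(land_a, land_b):
    """Least-squares line mapping run-B retention times onto run A (rt_a = slope*rt_b + intercept).

    land_a[i] and land_b[i] are the same MS/MS-identified landmark peptide seen in each run.
    """
    xs = [f.rt for f in land_b]
    ys = [f.rt for f in land_a]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def calibrate_tolerances(land_a, land_b, slope, intercept, k=3.0, ppm_floor=5.0, rt_floor=0.5):
    """Derive m/z (ppm) and RT (min) tolerances from the landmarks' residual errors."""
    ppm_err = [(b.mz - a.mz) / a.mz * 1e6 for a, b in zip(land_a, land_b)]
    rt_err = [a.rt - (slope * b.rt + intercept) for a, b in zip(land_a, land_b)]
    ppm_tol = max(abs(mean(ppm_err)) + k * pstdev(ppm_err), ppm_floor)
    rt_tol = max(abs(mean(rt_err)) + k * pstdev(rt_err), rt_floor)
    return ppm_tol, rt_tol


def match_features(feats_a, feats_b, slope, intercept, ppm_tol, rt_tol):
    """Pair each unidentified run-B feature with the closest run-A feature inside both tolerances."""
    pairs = []
    for j, fb in enumerate(feats_b):
        rt_on_a = slope * fb.rt + intercept
        best, best_score = None, None
        for i, fa in enumerate(feats_a):
            ppm = abs(fb.mz - fa.mz) / fa.mz * 1e6
            drt = abs(rt_on_a - fa.rt)
            if ppm <= ppm_tol and drt <= rt_tol:
                score = ppm / ppm_tol + drt / rt_tol  # normalized distance
                if best_score is None or score < best_score:
                    best, best_score = i, score
        if best is not None:
            pairs.append((best, j))
    return pairs


# Hypothetical landmarks: run B elutes about 1 minute later than run A.
land_a = [Feature(500.0, 10.0), Feature(600.0, 20.0), Feature(700.0, 30.0)]
land_b = [Feature(500.0005, 11.0), Feature(600.0, 21.0), Feature(700.0, 31.0)]
slope, intercept = fit_rt_map(land_a, land_b)
ppm_tol, rt_tol = calibrate_tolerances(land_a, land_b, slope, intercept)

# Hypothetical unidentified features to be clustered across the two runs.
feats_a = [Feature(812.40, 25.0), Feature(812.45, 25.2)]
feats_b = [Feature(812.401, 26.1)]
print(match_features(feats_a, feats_b, slope, intercept, ppm_tol, rt_tol))  # [(0, 0)]
```

The design choice worth noting is that the tolerances come from the landmarks themselves rather than being fixed in advance, which mirrors the described role of landmarks in calibrating clustering tolerances; a real implementation would need at least two landmarks with distinct retention times and would likely use a nonlinear alignment.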
The goal of the platform, said Dr. Mani, is to “cast your net wide in the beginning and then go back and quickly identify proteins of interest.” One advantage of this approach is speed. Identity-based biomarker discovery requires extensive sample fractionation and an estimated 280 hours of instrument time per sample pair (about 64 fractions per sample, 130 minutes of LC per fraction). PEPPeR analysis of unfractionated samples, however, requires about 15–25 hours of instrument time per sample pair (150 minutes of LC, 3–5 replicates per sample). Minimal or no fractionation contributes to higher throughput and enables analysis without pooling samples.
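The instrument-time figures quoted for the two workflows can be reproduced with simple arithmetic. The Python sketch below assumes that a "sample pair" means two samples, each run through the full workflow, and it ignores overhead such as column re-equilibration and blanks; the function names are illustrative only.

```python
# Back-of-the-envelope instrument time per sample pair, using the figures quoted above.
# Assumes both samples in the pair are run in full and ignores overhead.

def fractionated_hours(fractions: int = 64, lc_min_per_fraction: int = 130, samples: int = 2) -> float:
    """Identity-based workflow: every fraction of every sample gets its own LC run."""
    return fractions * lc_min_per_fraction * samples / 60


def unfractionated_hours(lc_min: int = 150, replicates: int = 3, samples: int = 2) -> float:
    """PEPPeR-style workflow: replicate LC-MS runs of each unfractionated sample."""
    return lc_min * replicates * samples / 60


deep = fractionated_hours()
low, high = unfractionated_hours(replicates=3), unfractionated_hours(replicates=5)
print(f"fractionated: {deep:.0f} h; unfractionated: {low:.0f}-{high:.0f} h")
print(f"speed-up: {deep / high:.0f}x to {deep / low:.0f}x")
```

Under these assumptions, 64 fractions × 130 minutes × 2 samples comes to roughly 277 hours, consistent with the estimated 280 hours, and 150 minutes × 3–5 replicates × 2 samples gives the quoted 15–25 hours, roughly an 11- to 18-fold reduction.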
Proteomic research in plants faces several challenges, including incomplete genomic and protein databases and the added complexity of unique cell components required for photosynthesis and carbon fixation. Protein characterization in plants, as in other organisms, must overcome the challenges associated with proteome coverage and dynamic range.
In a presentation entitled “Applications of MSE in Plant Proteomics and Protein Characterization,” Kevin Blackburn, mass spec lab supervisor in molecular and structural biochemistry at North Carolina State University, emphasized the need for a “shotgun sequencing approach to the proteome” similar to genomic shotgun sequencing. This would yield more comprehensive proteome coverage than conventional MS/MS methods, which are serial, discontinuous, and biased, according to Blackburn.
With MSE, peptides are fragmented and analyzed in parallel. Blackburn’s group compared the performance of MS/MS-based data-dependent analysis (DDA) with data-independent MSE using a simple four-protein mixture spanning a 16-fold concentration range. DDA yielded 32 unique peptide matches to the four proteins, 70% of which were identified in only one of three replicate analyses; MSE yielded 74 unique peptide matches, 80% of which were present in all three replicate analyses. The improvements with MSE were most pronounced for lower-abundance sample components: for the lowest-level component of the mixture, sequence coverage increased by 3,000% with MSE compared to DDA.
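Because several of these comparisons are expressed as changes in sequence coverage, it may help to see how coverage is typically computed. The Python sketch below assumes peptide sequences have already been matched to a known protein sequence; the protein and peptides are hypothetical, and real search tools also handle details, such as isoleucine/leucine ambiguity, that are ignored here.

```python
# Sequence coverage: the percentage of residues in a protein covered by at least one
# matched peptide. The protein and peptides below are hypothetical.

def sequence_coverage(protein: str, peptides: list[str]) -> float:
    covered = [False] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:  # mark every occurrence, not just the first
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein)


def percent_increase(new: float, old: float) -> float:
    """Relative change; a 3,000% increase means the new value is 31 times the old one."""
    return 100.0 * (new - old) / old


protein = "MKTAYIAKQRQISFVKSHFS"
dda = sequence_coverage(protein, ["QISFVK"])
mse = sequence_coverage(protein, ["MKTAYIAK", "QISFVK", "SHFS"])
print(dda, mse, percent_increase(mse, dda))  # 30.0 90.0 200.0
```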
In another example, Blackburn reported that MSE was found to be superior to targeted DDA in identifying a protein of interest in a cell lysate, with an increase in sequence coverage of 200%. “With targeted DDA, you must make assumptions about charge states and peptide modifications, which may sometimes be incorrect,” he added, whereas “with MSE the data acquired is independent of a hypothesis.”
Blackburn presented real-world examples of plant proteomic efforts including his group’s tomato nematode project aimed at identifying the host proteins that comprise signaling pathways involved in parasite destruction of the tomato-plant root. Blackburn’s group conducted a qualitative survey of the protein composition in four tomato tissues and found that whereas DDA matched 52 peptides to 39 proteins in the plant flower and 128 peptides to 98 proteins in leaf tissue (1.3 peptides/protein), MSE was able to match 738 flower peptides to 195 proteins (3.8 peptides/protein) and 1,597 peptides to 380 leaf proteins (4.2 peptides/protein).
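The peptides-per-protein ratios follow directly from the reported counts. The short Python check below recomputes them using only the numbers given for the flower and leaf tissues; nothing else is assumed.

```python
# Peptides-per-protein ratios from the tomato tissue survey counts quoted above.
survey = {
    ("DDA", "flower"): (52, 39),
    ("DDA", "leaf"): (128, 98),
    ("MSE", "flower"): (738, 195),
    ("MSE", "leaf"): (1597, 380),
}

ratios = {key: round(peptides / proteins, 1) for key, (peptides, proteins) in survey.items()}
for (method, tissue), ratio in ratios.items():
    print(f"{method} {tissue}: {ratio} peptides/protein")
```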
These results led to the conclusion that LC-MSE is a viable alternative to LC-MS/MS for protein and proteome characterization, enabling quantification as well as identification, with improved sequence and proteome coverage and lower false-positive rates.
“These advantages are most dramatic for the lowest abundance components,” noted Blackburn. Furthermore, enhanced sequence coverage afforded by LC-MSE could lead to more thorough characterization of post-translational modifications.
Taking a systems biology approach to proteome analysis, Brian Mickus, an MIT graduate student, analyzed E. coli K12 cells engineered to make lycopene, a potent antioxidant and valuable nutraceutical present in some fruits, particularly tomatoes. High lycopene content is a value-added trait.
Using genomic-expression microarrays, Mickus demonstrated differential expression of some genes potentially correlated with lycopene overproduction in some mutant E. coli strains. Proteomic-expression analysis to reveal why certain strains can overproduce lycopene could lead to the development of metabolic-engineering strategies to manipulate E. coli’s phenotype and enhance lycopene production.
Mickus’ group first employed genomic differential-expression analysis between the mutant and engineered strains to determine whether one or more genes contribute to the phenotype. He discovered a potential link to the adenosine diphosphate synthetic pathway and then used in silico tools to map the observed gene expression onto the metabolic pathways of E. coli.
“Gene-expression analysis gives you a roster of players (genes) but does not tell you what’s going on in the game,” said Mickus. To gain a global perspective of the proteins differentially expressed in the mutant versus engineered strains, Mickus and Jeff Silva, senior R&D scientist at Waters, used LC-MSE to obtain proteomic-expression data and identified more than 500 unique proteins from all strains. By focusing on proteins that show the greatest degree of differential expression, Mickus hopes to identify metabolic engineering targets that might be associated with lycopene overproduction.
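Focusing on the proteins with the greatest differential expression usually starts with a ranking by fold change. The Python sketch below is a simplified illustration, not the workflow Mickus and Silva used: it assumes one normalized abundance value per protein per strain, ranks shared proteins by absolute log2 ratio, and applies a hypothetical two-fold cutoff. The protein names and values are invented placeholders.

```python
import math


def rank_by_fold_change(mutant: dict[str, float], engineered: dict[str, float],
                        min_abs_log2: float = 1.0) -> list[tuple[str, float]]:
    """Return (protein, log2 mutant/engineered) for shared proteins, largest changes first.

    min_abs_log2 = 1.0 keeps only proteins that differ at least two-fold in either direction.
    """
    shared = mutant.keys() & engineered.keys()
    scored = [(p, math.log2(mutant[p] / engineered[p])) for p in shared]
    hits = [(p, lfc) for p, lfc in scored if abs(lfc) >= min_abs_log2]
    return sorted(hits, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical normalized abundances for four proteins in each strain.
mutant = {"protA": 800.0, "protB": 100.0, "protC": 50.0, "protD": 300.0}
engineered = {"protA": 100.0, "protB": 110.0, "protC": 200.0, "protD": 280.0}
print(rank_by_fold_change(mutant, engineered))  # [('protA', 3.0), ('protC', -2.0)]
```

Working on the log2 scale treats a two-fold increase and a two-fold decrease symmetrically (+1 and -1); a real analysis would also use replicate measurements and a statistical test rather than a single ratio.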
Switching the focus to human disease, Richard Sprenger, Ph.D., from the clinical proteomics group at the University of Amsterdam, spoke on the “Analysis and Quantification of Diagnostic Plasma Markers and Protein Signatures for Gaucher’s Disease.”
Characterized by a spleen and liver that are enlarged and inflamed, Gaucher’s is a lipid storage disorder in which glucosyl ceramide accumulates in macrophages due to a deficiency in the lysosomal enzyme glucocerebrosidase. In addition to hepatosplenomegaly, clinical symptoms include bone deformities, pancytopenia (reduced numbers of red cells, white cells, and platelets), and neurologic abnormalities. Treatment is based on enzyme-replacement therapy in which patients receive infusions of recombinant glucocerebrosidase 1–2 times per week for life.
Chitotriosidase, an enzyme that breaks down chitin and is produced by macrophages in the spleen, is a marker for Gaucher’s disease. Measuring chitotriosidase levels is a good way to monitor the disease and response to treatment. One in 20 patients, however, will be chitotriosidase deficient. Enzyme-replacement therapy is expensive, and patient response is variable and, at present, difficult to monitor. Targeted biomarkers could help predict which patients are not likely to respond to treatment.
Dr. Sprenger devised experiments to search for disease markers in plasma, comparing samples before and after treatment and looking at depleted versus undepleted plasma. Compared to DDA protein identification techniques, LC-MSE offers an unbiased approach that can give greater coverage of the proteome and is better able to reveal lower-abundance proteins, according to Dr. Sprenger.
LC-MSE correctly detected the known biomarker chitotriosidase only in the before-treatment samples. The identification of chitotriosidases with different specific activities in various individuals showed that the enzyme can be polymorphic. Dr. Sprenger’s group is now validating its findings in larger numbers of patients with Gaucher’s.
Differential Protein Expression
To date, strategies for determining differential protein expression in healthy and diseased human cell lines have relied on relative quantification. Label-free detection analyses using MSE enable absolute quantification, which makes it easier to compare results across individual studies.
Arthur Moseley, Ph.D., director of proteomics for the proteomics core facility at the Duke University School of Medicine, described a study of breast cancer cell lines exposed to four different treatment conditions: no treatment, cells grown in the presence of a chemotherapeutic drug, a drug-resistant strain grown in the presence of drug, and a drug-resistant strain grown in the presence of drug, which is then removed. After chemotherapy, some breast cancer patients suffer a relapse with a more aggressive, drug-resistant tumor, which suggests that the residual cancer cells change after exposure to chemotherapy, explained Dr. Moseley.
In a qualitative comparison of triplicate DDA and MSE analyses of the native cell phenotype, a total of 474 proteins were identified in at least one of the analyses. Of these, 276 were identified by both DDA and MSE, five were uniquely identified by DDA, and 196 were uniquely identified by MSE. For the neuroblast differentiation-associated protein, seven peptides were identified by both methods; no peptides were uniquely identified by DDA, whereas 86 were uniquely identified by MSE.
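Counts such as "identified by both," "uniquely identified by DDA," and "uniquely identified by MSE" come from set operations on each method's pooled identifications. A minimal Python sketch, assuming a protein counts as identified by a method if it appears in at least one of that method's replicate runs; the protein accessions are hypothetical.

```python
def overlap_summary(dda_runs: list[set[str]], mse_runs: list[set[str]]) -> dict[str, int]:
    """Pool each method's replicate runs, then compare the two pooled sets."""
    dda = set().union(*dda_runs)  # identified in at least one DDA run
    mse = set().union(*mse_runs)  # identified in at least one MSE run
    return {
        "total": len(dda | mse),
        "both": len(dda & mse),
        "dda_only": len(dda - mse),
        "mse_only": len(mse - dda),
    }


# Hypothetical identifications from three replicate runs per method.
dda_runs = [{"P1", "P2"}, {"P2", "P3"}, {"P1"}]
mse_runs = [{"P1", "P2", "P4"}, {"P2", "P5"}, {"P4"}]
print(overlap_summary(dda_runs, mse_runs))
# {'total': 5, 'both': 2, 'dda_only': 1, 'mse_only': 2}
```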
For quantitative analysis by MSE, proteins were required to be present in at least two of three analyses. Dr. Moseley found that 209 proteins were detected in all three analyses, with a quantitative coefficient of variation (CV) of 13%. Using the absolute quantitation methods of the IdentityE software, these 209 proteins comprised 87% of the total protein loaded onto the LC column. An additional 68 proteins detected in two of three analyses, with a CV of 21.6%, accounted for only 4% of the total protein load.
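These quantitative figures rest on two ideas: replicate precision, reported as a coefficient of variation, and label-free absolute quantification. The Python sketch below illustrates both under stated assumptions. For absolute amounts it uses the "Top3" (or "Hi3") idea described in the label-free literature, in which the mean signal of a protein's three most intense peptides is taken as roughly proportional to its molar amount and is calibrated against a spiked standard of known amount; whether IdentityE computes its values exactly this way is an assumption here. All function names and numbers are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values: list[float]) -> float:
    """Sample CV in percent: standard deviation divided by the mean."""
    return 100.0 * stdev(values) / mean(values)


def top3_signal(peptide_intensities: list[float]) -> float:
    """Mean intensity of the three most intense peptides assigned to a protein."""
    return mean(sorted(peptide_intensities, reverse=True)[:3])


def estimate_fmol(protein_peptides: list[float], standard_peptides: list[float],
                  standard_fmol: float) -> float:
    """Scale a spiked standard's known amount by the ratio of Top3 signals."""
    return standard_fmol * top3_signal(protein_peptides) / top3_signal(standard_peptides)


def fmol_to_ng(fmol: float, mw_kda: float) -> float:
    """Convert a molar amount to mass: fmol x kDa / 1000 gives nanograms."""
    return fmol * mw_kda / 1000.0


def percent_of_load(amounts_ng: dict[str, float], subset: set[str], total_loaded_ng: float) -> float:
    """Share of the total protein loaded on the column accounted for by the proteins in subset."""
    return 100.0 * sum(amounts_ng[p] for p in subset) / total_loaded_ng


# Hypothetical intensities for one protein across three replicate injections.
print(round(coefficient_of_variation([90.0, 100.0, 110.0]), 1))  # 10.0
# Hypothetical peptide intensities and a standard spiked at 50 fmol.
print(estimate_fmol([5000, 3000, 1000, 4000], [2000, 2000, 2000, 500], 50.0))  # 100.0
print(fmol_to_ng(100.0, 50.0))  # 5.0
print(percent_of_load({"A": 5.0, "B": 3.0, "C": 1.0}, {"A", "B"}, 10.0))  # 80.0
```

The load percentage divides by the total amount loaded rather than by the sum over quantified proteins, which is why the two reported shares (87% and 4%) need not add up to 100%.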
Dr. Moseley concluded that “MSE provides significantly increased proteome and peptide coverage” and enables relative and absolute quantification in the same experiment. The results were encouraging regarding the potential “to create databases based on absolute quantification,” making it possible “to mine the data across experiments,” said Dr. Moseley.
For polygenic diseases such as cancer, a single biomarker will likely not be sufficient to function as a reliable indicator. At the Proteome Center of the University of Rostock in Germany, Michael Glocker, Ph.D., head of the center, and colleagues are performing tissue profiling of clinical samples and animal models to identify diagnostic and therapeutic target signatures.
In his presentation entitled “Breast Cancer Tissue Profiling,” Dr. Glocker spoke of ongoing biomarker profiling studies in invasive ductal carcinoma in which he is using LC-MSE to develop protein signatures capable of differentiating between control and tumor tissue and defining invasive ductal carcinoma tissue profiles that might help to tailor treatment regimens and guide disease-management decisions.
The results are validated using a gel-based strategy for differential proteome analysis. So far 1,203 protein spots have been analyzed in a differential manner. The group is now correlating the findings with histological results and is investigating various post-translational modifications, mostly phosphorylation-dependent pathways, that could explain the differential protein expression. The ultimate goal is to find protein signatures that might be of prognostic value.