March 1, 2015 (Vol. 35, No. 5)
Better Protein Quantification Can Produce More Loose Ends, Necessitating Better Data Analysis
Protein quantification is more complex than protein identification and requires extra attention to detail. Mass spectrometers, now in routine use, will continue to get faster, more sensitive, and more accurate, and will eventually converge on a single acquisition method for both qualitative and quantitative information.
Data analysis and data consistency between laboratories remain high-priority hurdles that need to be cleared before the intricacies of organisms’ proteomes can be explored.
Thought leaders continually gather to discuss advances. The next gathering will occur this April at the Keystone Symposia on Molecular and Cellular Biology, an event dedicated to the discussion of protein quantification challenges.
Typically, before translation into a clinical setting, biomarkers will progress through discovery, verification, and validation phases. Used predominantly in the first two phases, mass spectrometry can facilitate discovery and verification of quantitative differences between cohorts, and lead to a better biological understanding of processes within organisms or samples. As instruments improve, the conundrum becomes how to efficiently analyze the increased amounts of captured data.
Specifically designed for metabolomics, lipidomics, and proteomics work in the early phases, vendor-neutral Progenesis QI software provides large-scale, label-free quantitative analysis of LC-MS datasets. The software, which Waters picked up when it acquired Nonlinear Dynamics, may be used with any type of experimental design to accomplish the statistical analysis of multidimensional data. It supports data types such as retention time, mass, signal intensity, and collision cross-section (a physical property determined by the shape and size of an analyte, as measured by ion mobility).
“When you are dealing with proteins, lipids, or metabolites in the discovery stage and need to analyze these types of data, you could be looking at a few hundred liquid chromatography–mass spectrometry (LC-MS) data files, containing tens of thousands of biological analytes,” explains Hans Vissers, Ph.D., senior manager science operations, health sciences, Waters. “This is one of the downstream hurdles of large-scale quantitative experiments.
“About two decades ago, scientists were mainly looking at tens of proteins within a single experiment, and now we are trying to identify and quantify close to 10,000 proteins in 24 hours, facilitated for instance by modern mass spectrometers such as the Waters Synapt G2-Si. In large-scale experiments, the biological variation between samples makes it challenging to confidently identify quantitative differences.
“The more analyte properties that can be measured (such as ion mobility drift time, mass, and retention time), the more accurate the description of the analyte. Informatics is becoming more important. Our partners are involved in numerous ventures to advance quantitative omics research.”
The National Phenome Center in the United Kingdom conducts metabolic phenotyping across different populations at a very large scale. The Center analyzes both patient- and population-based samples for biomarker discovery and validation, improved patient stratification through development of robust diagnostic and prognostic markers, and early identification of drug efficacy and safety and other responses to treatments. (The Center is funded by the Medical Research Council and the National Institute for Health Research, and it is led by Imperial College London and King’s College London.)
One of the Center’s goals is to develop standardized methods for consistent identification and comparison across different populations. Several phenotyping centers in various countries currently use the exact same protocols, growing biological knowledge across geographic boundaries.
Another initiative is the Dutch Biomarker Development Center (BDC). To date, thousands of biomarkers have been identified and reported. Yet few have made it into clinical translation. The goal of this public-private initiative is not to identify new biomarkers but to validate, using LC-MS technology, already identified biomarkers for chronic obstructive pulmonary disease, Alzheimer’s disease, and type 2 diabetes in a standardized way, addressing the technology gaps emerging from progression through the discovery, verification, and validation phases.
Measuring the Rate of Translation
Mass spectrometry approaches answer questions about protein amount but do not address the rate of protein synthesis. Since different messages are translated with varying efficiencies, even mRNA quantification does not determine how much of a protein is being made, and cannot be used as a quantitative proteomics measurement.
Five years ago, the Weissman laboratory developed a ribosome-profiling approach to quantitative proteomics. The methodology is based on deep sequencing of ribosome-protected mRNA fragments and makes it possible to determine the rate of translation, that is, to watch directly how much protein is being made over time.
When translating a protein from mRNA, the ribosome protects about 30 nucleotides of that mRNA from nuclease digestion. Ribosomes are “tricked” into stopping so that the protected segments can be extracted and then sequenced by means of next-generation sequencing (NGS), producing a picture of exactly where each ribosome was at the time translation was halted.
An experimental technique, ribosomal profiling does not rely on the assumptions typically used in computation-based approaches to identify protein production.
“By using the ribosome to connect the protein to its message and using NGS for massive scale sequencing, we now have the opportunity to watch protein translation at an unprecedented scale,” comments Jonathan S. Weissman, Ph.D., professor, University of California, San Francisco, School of Medicine.
“We count the density of ribosomes on every message to infer how many protein molecules are being made within a timeframe. These are very accurate measurements of the rates of protein synthesis,” Dr. Weissman continues. “Once the protein leaves the ribosome, however, we can no longer watch what happens to it. Ribosomal profiling complements MS-based approaches that can ask how much protein is accumulated.”
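The counting step Dr. Weissman describes can be pictured with a short sketch. It assumes the footprint reads have already been aligned and assigned to genes. The gene names, coding-sequence lengths, and read counts are invented for illustration, and the normalization used here (reads per kilobase per million mapped reads) is one common convention, not necessarily the one the laboratory uses.

```python
from collections import Counter


def ribosome_density(footprint_genes, cds_lengths_nt):
    """Return footprint density per gene in reads per kilobase per million reads.

    footprint_genes: iterable of gene names, one entry per aligned footprint read.
    cds_lengths_nt: dict mapping gene name -> coding-sequence length in nucleotides.
    """
    counts = Counter(footprint_genes)
    total_reads = sum(counts.values())
    if total_reads == 0:
        raise ValueError("no footprint reads supplied")
    density = {}
    for gene, length_nt in cds_lengths_nt.items():
        reads = counts.get(gene, 0)
        # Normalize by message length (kb) and sequencing depth (millions of reads).
        density[gene] = reads / (length_nt / 1_000) / (total_reads / 1_000_000)
    return density


# Hypothetical data: three genes, 1,000 footprint reads in total.
reads = ["geneA"] * 600 + ["geneB"] * 300 + ["geneC"] * 100
lengths = {"geneA": 3000, "geneB": 1500, "geneC": 1000}
density = ribosome_density(reads, lengths)
```

In this made-up example, geneA collects twice as many reads as geneB but is also twice as long, so the two end up with the same density. That is why counts are normalized by message length before they are read as relative synthesis rates.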
The proteome is turning out to be more complicated than previously thought. Whole new classes of synthesized proteins or short peptides are being discovered.
Protein production is the most energetically costly cellular process. The mechanisms that regulate translation are also being revealed, including the principles a cell uses to decide which proteins to make. For example, in bacteria, the principle of proportional synthesis applies when complex machines with multiprotein components are synthesized: each protein is made in proportion to its amount in the machine.
Improving Acquisition Workflows
Overall, data processing must be made easier, faster, and more shareable going forward. Such improvements have marked the development of SWATH™ Acquisition, a protein quantitation system for AB Sciex’s TripleTOF MS instruments. The system embodies a data-independent acquisition strategy, repeatedly cycling through consecutive mass-range windows (swaths).
Since it was introduced in 2010, SWATH Acquisition has undergone workflow improvements including the use of variable windows and different chromatographic strategies. Currently, up to 5,000 proteins in a complex sample can be quantified in a 2–4 hour time frame with data completeness and quantitative accuracy measures approaching those of the gold-standard MRM (multiple reaction monitoring) approach.
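The window-cycling idea can be sketched in a few lines. The 400 to 1,200 m/z range, the 25 m/z window width, and the precursor list below are assumptions chosen for illustration, not values from the vendor's methods. The variable-window function simply places window edges so that each window holds about the same number of precursors, which is one plausible reading of "variable windows" and not a description of the actual implementation.

```python
from itertools import cycle, islice


def fixed_windows(start_mz, end_mz, width):
    """Tile [start_mz, end_mz] with consecutive isolation windows of equal width."""
    if width <= 0 or end_mz <= start_mz:
        raise ValueError("need a positive width and end_mz > start_mz")
    windows = []
    low = start_mz
    while low < end_mz:
        high = min(low + width, end_mz)
        windows.append((low, high))
        low = high
    return windows


def variable_windows(precursor_mzs, n_windows):
    """Place window edges so each window holds roughly the same number of precursors."""
    mzs = sorted(precursor_mzs)
    if n_windows < 1 or n_windows > len(mzs):
        raise ValueError("n_windows must be between 1 and the number of precursors")
    edges = [mzs[0]]
    for i in range(1, n_windows):
        edges.append(mzs[i * len(mzs) // n_windows])
    edges.append(mzs[-1])
    return list(zip(edges[:-1], edges[1:]))


# Assumed acquisition range and width (illustrative only).
swaths = fixed_windows(400.0, 1200.0, 25.0)

# The instrument steps through the windows in order and then starts over.
scan_order = list(islice(cycle(swaths), len(swaths) + 1))

# Hypothetical precursors: crowded at low m/z, sparse at high m/z.
variable = variable_windows([410, 420, 430, 440, 450, 460, 700, 900], 4)
```

With these made-up precursors, the variable scheme yields narrow windows where the precursors are crowded and wide windows where they are sparse.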
“The two main challenges in proteomics today are how to generate reproducible, comprehensive, quantitative proteomics datasets, and how to handle the large amounts of data and get to a biological conclusion,” says Christie Hunter, Ph.D., director, omics applications, AB Sciex. “SWATH Acquisition is game changing in terms of how we collect high-value datasets, and the data we are seeing so far suggest that this technique will provide solid consistency across samples and labs.
“We can now routinely quantify thousands of proteins with excellent reproducibility across large datasets with very high data completeness, making SWATH Acquisition the technique for next-generation proteomics. We hope to increase adoption across a broader range of researchers, perhaps starting to replace some of the work done today by protein microarrays or other multiplexed protein-analysis techniques.”
In biomarker research and systems biology, more research teams are applying multiple omics techniques (genomics, proteomics, metabolomics, etc.) to their biological problems, providing richer information and understanding. This expertise may reside in different groups or even different institutions. Facilitating multisite collaboration will become increasingly important.
AB Sciex has partnered with Illumina to develop SWATH-based proteomic applications within BaseSpace, the Illumina Cloud environment. Proteomics and genomics data are stored in the same environment, with data being processed quickly, thanks to the capabilities of cloud-based computing. Both data and results can be easily shared with collaborators.
In five years, Dr. Hunter projects, solutions will become increasingly plug-and-play, shifting the focus from today’s analytics to the underlying biology.
Another option is Mascot Server, a search engine from Matrix Science used to identify and characterize proteins by matching mass spectra against a database of protein sequences. Usually the approach is to digest the sample, then analyze the resulting peptide mixture by tandem mass spectrometry. The search engine compares the experimental mass values with values calculated from the entries in a database of either amino acid or nucleic acid sequences. An appropriate scoring algorithm identifies the closest matches.
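The compare-against-calculated-values step can be illustrated with a deliberately simplified sketch. It is not Mascot's algorithm. It matches intact peptide masses only, ignores the fragment-ion (MS/MS) information a real search uses, allows no missed cleavages or modifications, and replaces probability-based scoring with a plain list of matches. The two protein sequences, the observed masses, and the 10 ppm tolerance are hypothetical.

```python
import re

# Monoisotopic residue masses in daltons.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # added once per peptide for the terminal H and OH


def tryptic_peptides(protein):
    """In-silico trypsin digest: cleave after K or R unless the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def match_masses(observed, database, tol_ppm=10.0):
    """Return {protein name: [peptides whose calculated mass matches an observed mass]}."""
    hits = {}
    for name, sequence in database.items():
        for peptide in tryptic_peptides(sequence):
            calculated = peptide_mass(peptide)
            for mass in observed:
                if abs(mass - calculated) / calculated * 1e6 <= tol_ppm:
                    hits.setdefault(name, []).append(peptide)
    return hits


# Hypothetical two-entry database and observed neutral peptide masses.
database = {
    "protein_1": "MKWVTFISLLLLFSSAYSRGVFRR",
    "protein_2": "MAGSKLLPR",
}
observed = [477.2700, 277.1460]
hits = match_masses(observed, database)
```

Here both observed masses fall within tolerance of peptides from the first entry (MK and GVFR) and none from the second, so the first entry would rank as the closest match.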
The search engine also implements methods of protein quantitation that use only the information in a standard LC-MS/MS peak list.
A complement to the search engine, Mascot Distiller is a vendor-neutral application for viewing and processing MS data files in the discovery phase. It supports native file formats and the mzML and mzXML interchange formats. Peak picking is achieved by fitting a calculated complete isotope distribution to the experimental data, not just the 12C peak. The charge state is automatically determined, and the peak list contains only monoisotopic masses, even when the signal-to-noise ratio is poor or the isotopic distribution is not fully resolved.
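Why an isotope pattern reveals the charge state can be shown with a much simpler stand-in for full distribution fitting. Neighboring isotope peaks of an ion with charge z sit about 1.00335/z apart on the m/z axis, so the spacing gives z, and z converts the monoisotopic m/z into a neutral mass. The sketch assumes clean, consecutive isotope peaks from a protonated ion. The peak values are invented, and this is not how Mascot Distiller itself performs the calculation.

```python
C13_C12_DIFF = 1.00335  # mass difference between 13C and 12C, in daltons
PROTON = 1.007276       # mass of a proton, in daltons


def charge_from_isotopes(peak_mzs):
    """Estimate the charge state from the mean spacing of consecutive isotope peaks."""
    if len(peak_mzs) < 2:
        raise ValueError("need at least two isotope peaks")
    peaks = sorted(peak_mzs)
    mean_spacing = (peaks[-1] - peaks[0]) / (len(peaks) - 1)
    return round(C13_C12_DIFF / mean_spacing)


def neutral_monoisotopic_mass(mono_mz, charge):
    """Convert the m/z of a protonated monoisotopic peak to a neutral mass."""
    return (mono_mz - PROTON) * charge


# Hypothetical isotope cluster for a doubly charged peptide.
cluster = [500.7500, 501.2517, 501.7533]
charge = charge_from_isotopes(cluster)
mass = neutral_monoisotopic_mass(min(cluster), charge)
```

Fitting a complete calculated distribution, as the application does, is more robust than this spacing estimate when peaks are noisy or overlap.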
The peak lists can be submitted to the search engine for a database search, and the results can be returned for viewing and used as the starting point for quantitation. Mascot Distiller implements protein quantitation based on MS1 (survey scan) intensity, the relative intensities of extracted ion chromatograms (XICs) for peptide precursors. The system is compatible with both label-free and label-based analyses. It can accommodate a large number of stable isotope labels such as SILAC, metabolic, ICAT, 18O, and dimethyl.
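As a rough illustration of XIC-based relative quantitation, the sketch below integrates two extracted ion chromatograms with the trapezoidal rule and reports their ratio. The retention times and intensities are invented, the light and heavy labels stand for any label pair, and real software adds peak detection, baseline handling, and quality checks that are omitted here.

```python
def xic_area(times, intensities):
    """Trapezoidal area under an extracted ion chromatogram."""
    if len(times) != len(intensities) or len(times) < 2:
        raise ValueError("need at least two matched (time, intensity) points")
    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        area += dt * (intensities[i] + intensities[i - 1]) / 2
    return area


# Hypothetical co-eluting light and heavy forms of one peptide precursor.
rt_min = [10.0, 10.1, 10.2, 10.3, 10.4]
light = [0.0, 100.0, 200.0, 100.0, 0.0]
heavy = [0.0, 50.0, 100.0, 50.0, 0.0]

ratio = xic_area(rt_min, heavy) / xic_area(rt_min, light)
```

For these made-up traces the heavy form has half the area of the light form, so the reported heavy-to-light ratio is about 0.5.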
Displays of chromatographic and spectral profiles provide visual confirmation as to whether the data processing is working as intended. Three quality metrics are calculated for each ratio, and user-defined thresholds are applied to these metrics to automatically reject low-quality values.
According to John Cottrell, director, Matrix Science, most labs are still at the stage of exploring which methods best suit their data. For most users, current software is too complex and needs to become more of a black box.
Profiling Dynamic Responses of Cancer Drugs
Most molecularly targeted drugs today are co-approved with a companion diagnostic, but these tests don’t directly measure whether a therapy will work. For instance, when BRAF-mutant melanoma becomes resistant to BRAF inhibitors, the cancerous cells will continue expressing the mutant protein. Therefore, these patients would still be considered test-positive.
To improve on these indirect tests, a startup, BioMarker Strategies, is developing PathMAP® functional profiling technology, which measures how live cancer cells respond to drugs through analysis of proteins within their key signaling pathways. To help do that, the company uses Bio-Rad Laboratories’ Bio-Plex assays to analyze the phosphoprotein levels in cell lysates, according to Brett Houser, a product manager at Bio-Rad.
“Bio-Rad’s Bio-Plex multiplex immunoassay system enables highly multiplexed analysis of samples, which is critical when samples are both precious and scant, as in the case of some of BioMarker Strategies’ samples,” says Houser.
BioMarker Strategies’ SnapPath® instrument processes tissue from a solid tumor biopsy, distributes the sample among different wells for treatment, and then lyses the cells to stabilize the biomarkers for further analysis. This automation and standardization of the process is key to ensuring reproducible and reliable data.
The company recently published a study demonstrating that the SnapPath system can accurately assess treatment response to a BRAF inhibitor in both cell models and clinical tumor samples. The firm has begun placing its instruments at research institutions to develop the technology as a guide in drug development and treatment selection for solid tumors.