By Daniel Hornburg, PhD, and José Castro-Perez
Improvements in mass spectrometry (MS) technology are allowing surveys of the proteome to achieve greater depth and breadth—as well as greater speed, sensitivity, and specificity. Old obstacles, such as physicochemical complexities and dynamic range challenges, are being overcome by sophisticated MS instruments and novel workflows. It is no wonder, then, that in a widening array of life sciences disciplines, MS is driving scientific progress.
What proteomics can teach us
Proteomics is expanding and enriching the biological information space. To date, this space has been dominated by genomics and transcriptomics, disciplines that have taken advantage of technological advances over the past 15 years to support large-scale studies, enabling progress in diagnostics and therapeutics development. Nonetheless, genomics and transcriptomics provide an incomplete picture of human health and disease.
Genomics and transcriptomics are a little removed from the functional activities of cells and tissues. In contrast, proteomics is devoted to the entities that embody these activities. Proteomics is, in fact, the study of protein abundances, modifications, and localizations.
Despite the relevance of proteins for biomedical research, proteomics has not progressed as rapidly as genomics and transcriptomics because the proteome is functionally and chemically complex (Figure 1). The proteome presents multiple layers of information (including information about post-translational modifications [PTMs], protein locations, and protein-protein interactions), and it is exceedingly dynamic, with abundances of individual proteins ranging across more than 10 orders of magnitude.
In addition, although the human genome contains only about 20,000 protein-coding genes, estimates suggest that there may be over one million distinct protein variants, called proteoforms,1 in each cell type. These proteoforms emerge from a plethora of combinatorial PTMs that modulate their structure and function, the understanding of which can provide a wide range of biological insights.2–4
Consider that caspase-1 is activated by an inflammasome and processes pro-interleukin-1β into an active, pro-inflammatory interleukin-1β.5 This complexity illustrates how the information landscape of proteomics requires a powerful, high-throughput methodology to capture meaningful, information-rich data to deepen our understanding of the underlying biology.
MS is a powerful analytical tool that creates gas-phase ions from the molecules present in a liquid sample. The ions (charged molecules or molecular fragments) are directed to an analyzer that separates them by mass-to-charge ratio (m/z) and generates a mass spectrum, which is used to determine the ion masses, the molecular masses, and ultimately the structures of components of the sample.
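As a rough illustration of how a measured spectrum relates to the ion and molecular masses mentioned above, the following Python sketch converts between a neutral monoisotopic mass and the mass-to-charge ratio (m/z) of a protonated positive ion. It assumes ionization by proton addition only (no adducts or neutral losses). The function names and the example peptide mass are hypothetical and are not taken from any particular instrument software.

```python
PROTON_MASS = 1.007276466  # mass of a proton in daltons (Da)


def mz_from_neutral_mass(neutral_mass: float, charge: int) -> float:
    """Return m/z for an ion [M + zH]^z+ (assumes protonation only)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def neutral_mass_from_mz(mz: float, charge: int) -> float:
    """Invert the relation above to recover the neutral mass M."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (mz - PROTON_MASS)


if __name__ == "__main__":
    peptide_mass = 1000.0  # hypothetical neutral monoisotopic mass in Da
    for z in (1, 2, 3):
        mz = mz_from_neutral_mass(peptide_mass, z)
        print(f"z={z}: m/z = {mz:.4f}")
```

The same peptide appears at a different m/z value for each charge state, which is why the charge must be assigned before a molecular mass can be read from the spectrum.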
Significant advancements in MS are of intense interest in many areas of life sciences research. Improving the ability of MS to identify, characterize, and quantify peptides and proteins promises to illuminate many unanswered questions in biology.
Current approaches and unmet needs
There are multiple approaches to proteomics today, including targeted non-MS-based strategies and MS-based strategies. The latter can be targeted as well as untargeted (unbiased).
Targeted non-MS-based approaches typically use analyte-specific reagents, such as antibodies or aptamers, to screen for specific predetermined protein epitopes or peptides. For these assays, the maximum number of proteins that can be interrogated is fixed, and the information is largely limited to differences in the quantity of that set of proteins. The assays scale well with the number of samples, but because they target predetermined protein molecules, they are limited to what is already known.
Moreover, for non-MS-based workflows, accuracy of the identification depends on the specificity of the affinity reagents. This constraint makes these techniques impractical to scale to tens of thousands of structurally related proteoforms, and limits the capability of these techniques for deep characterization and quantification of changes in the proteome.
Unbiased MS-based proteomics enables the global interpretation of biological systems by comprehensively analyzing proteins and their functionally relevant changes, including PTMs. Currently, these approaches, also called hypothesis-free approaches, either provide only a shallow look at proteomes with high dynamic range (in samples such as plasma) or go deep only with complex upfront workflows. The conventional deep methods pair time-consuming protein depletion and peptide fractionation with liquid chromatography–tandem mass spectrometry (LC-MS/MS).
Although these methods are considered unbiased, they do not effectively scale to larger numbers of samples and are not standardized across labs. The proteomic analysis of human blood and blood-derived products, for example, offers a vast wealth of information to translate research from the lab to the clinic. To comprehensively survey and analyze the full diversity of the plasma proteome across thousands of samples and labs, scalable solutions are required that are not limited by what is known. For this reason, researchers look toward new analytical methods and enhanced MS workflows to meet the growing demands of large-scale, clinically translatable biological research.
Advances in MS
In recent years, improvements in MS technology have drastically changed the landscape of MS-based proteomics. Where instruments were once limited in speed, sensitivity, and specificity, the scientific community is now racing to provide new instruments with state-of-the-art technology that overcome these performance challenges and offer new methods to proteomics researchers.
Applications such as large-scale, high-throughput proteomic analyses in clinical research can benefit from improvements in data-independent acquisition (DIA) workflows such as the Zeno SWATH DIA approach, which provides faster scan rates while increasing MS/MS sensitivity for the identification and quantification of key targets. In addition to sensitive DIA for high-throughput proteomics, an important area of innovation is the rise of unique fragmentation technologies such as electron-activated dissociation (EAD), which allows for rich characterization and quantification of PTM states such as protein phosphorylation.
Sample delivery via microflow LC, coupled with these MS innovations, provides excellent sensitivity with the added advantage of greater robustness compared with the more common nanoflow LC workflows. This leads to more uptime, less troubleshooting, and higher throughput. Furthermore, the continued growth in MS-based proteomics has compelled the scientific community to upgrade or develop algorithms, tools, and repository databases for the field.
The technological bottleneck of reproducibly accessing proteomics information across the large dynamic range of blood plasma samples at scale has recently been addressed by combining nanoparticle (NP) engineering with proteomics workflows. Synergistic panels of NPs can be engineered to reproducibly interact with thousands of proteoforms via their inherent ability to form complex protein coronas.6
NPs can be engineered to provide a combination of generic interaction domains that attract a broad range of proteins, and this does not require prior knowledge of specific epitopes. For instance, charge properties can be combined with hydrogen donors to provide high affinity to a large portion of proteins and render them more visible to any downstream detector. As such, a panel of NPs can be designed to perform deep sampling of the entire proteome. Importantly, the formation of protein coronas on the NPs is a function of the concentrations of the proteins in the biosample and their affinities for the NPs, enabling a quantitative survey of proteomics content.7,8
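One simple way to picture how corona composition depends on both concentration and affinity is a competitive Langmuir binding model. The Python sketch below is an illustrative assumption, not the model used in the cited studies: it treats the NP surface as a pool of identical binding sites at equilibrium, and the concentrations and affinity constants are made-up values chosen only to show the effect.

```python
def corona_occupancy(concentrations, affinities):
    """Fractional surface occupancy per protein under competitive Langmuir binding.

    theta_i = K_i * c_i / (1 + sum_j K_j * c_j)
    concentrations: free protein concentrations in mol/L (assumed values)
    affinities: association constants K in L/mol (assumed values)
    """
    if len(concentrations) != len(affinities):
        raise ValueError("need one affinity per concentration")
    weights = [k * c for k, c in zip(affinities, concentrations)]
    denominator = 1.0 + sum(weights)
    return [w / denominator for w in weights]


if __name__ == "__main__":
    # Hypothetical two-protein sample: one abundant, weakly binding protein
    # and one rare, strongly binding protein.
    concentrations = [1e-3, 1e-9]  # mol/L, a 10^6-fold abundance gap
    affinities = [1e3, 1e8]        # L/mol
    abundant, rare = corona_occupancy(concentrations, affinities)
    print(f"ratio in sample: {concentrations[0] / concentrations[1]:.0e}")
    print(f"ratio on corona: {abundant / rare:.1f}")
```

In this toy case, a million-fold abundance gap in the sample shrinks to roughly ten-fold on the particle surface, while each protein's share of the corona still tracks its concentration. Real coronas involve many more proteins and protein-protein interactions, so this should be read only as intuition for the concentration-and-affinity dependence described above.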
NPs are an attractive tool for proteomics because they have been used in nanomedicine and drug delivery for decades, providing a wealth of experience and knowledge for various designs and reproducible engineering.9 Research has shown that an automated NP-protein corona-based proteomics workflow enables more precise quantification and deeper sampling of the proteome compared with a conventional deep workflow. Moreover, the NP design and engineering process can be guided by machine learning both to diversify the proteomics content captured by a panel of NPs and to survey specific families of proteins.10
Genomics is an easily accessible indicator of disease risk but does not directly reflect the phenotype. Only by identifying and characterizing proteins associated with disease states in large, well-powered cohort studies will researchers gain essential mechanistic insights and discover new biomarker signatures that reflect both environmental factors and genetic predispositions (Figure 2). This will enable earlier and more precise patient diagnoses, help clinical researchers stratify patient groups, and speed development of new tailored therapeutics.
The complexity of biology in general, and of protein biochemistry in particular, requires multidisciplinary solutions: sample preparation, MS detector technology, and data science must all work hand in hand. By first compressing the extraordinarily large dynamic range of the proteome captured within blood plasma samples into a more manageable abundance distribution, and then employing unbiased cutting-edge MS for accurate and deep detection, proteomics information can now be accessed broadly at scale.
References
1. Aebersold R, Agar JN, Amster IJ, et al. How many human proteoforms are there? Nat. Chem. Biol. 2018; 14(3): 206–214.
2. Crutchfield CA, Thomas SN, Sokoll LJ, et al. Advances in mass spectrometry-based clinical biomarker discovery. Clin. Proteomics 2016; 13: 1.
3. Geyer PE, Holdt LM, Teupser D, et al. Revisiting biomarker discovery by plasma proteomics. Mol. Syst. Biol. 2017; 13: 942.
4. Geyer PE, Kulak NA, Pichler G, et al. Plasma proteome profiling to assess human health and disease. Cell Syst. 2016; 2: 185–195.
5. Elliott JM, Rouge L, Wiesmann C, et al. Crystal structure of procaspase-1 zymogen domain reveals insight into inflammatory caspase autoactivation. J. Biol. Chem. 2009; 284(10): 6546–6553.
6. Donovan MKR, Huang Y, Blume JE, et al. Peptide-centric analyses of human plasma enable increased resolution of biological insights into non-small cell lung cancer relative to protein-centric analysis. bioRxiv 2022; DOI: 10.1101/2022.01.07.475393.
7. Blume J, Manning WC, Troiano G, et al. Rapid, deep and precise profiling of the plasma proteome with multi-nanoparticle protein corona. Nat. Commun. 2020; 11(1): 3662.
8. Hornburg D, Ferdosi S, Hasan M, et al. Enhanced competitive protein exchange at the nano-bio interface enables ultra-deep coverage of the human plasma proteome. bioRxiv 2022; DOI: 10.1101/2022.01.08.475439.
9. Liu Y, Wang J, Xiong Q, et al. Nano–bio interactions in cancer: from therapeutics delivery to early detection. Acc. Chem. Res. 2021; 54(2): 291–301.
10. Ferdosi S, Tangeysh B, Brown TR, et al. Engineered nanoparticles enable deep proteomics studies at scale by leveraging tunable nano-bio interactions. Proc. Natl. Acad. Sci. USA 2022; 119(11): e2106053119.