4-D separation system is expected to allow accurate mapping of proteomics data to phenotype.
Scientists say a new ‘top-down’ approach to identifying individual proteins at the proteome-wide scale can pick out isoforms of endogenous proteins that have escaped identification using more traditional techniques. The four-dimensional separation system developed by a University of Illinois at Urbana-Champaign-led team is claimed to provide a 20-fold higher separation power and proteome coverage than bottom-up approaches, and enabled the identification of proteins of up to 105 kDa in size, with up to 11 transmembrane helices.
Reporting on the technique in Nature, Neil L. Kelleher, Ph.D., and colleagues claim that top-down analysis will allow scientists to more accurately correlate proteomics data with phenotype for both basic biology and disease research. “We are dramatically changing the strategy for understanding protein molecules at the most basic level,” Dr. Kelleher states. “We weigh proteins precisely and identify them directly. The way everyone else is doing it is by digesting the proteins, cutting them up into smaller bits, and putting them back together again. I call it the Humpty Dumpty problem.”
The investigators describe their technique, and the results of whole proteome analyses, in a paper titled “Mapping intact protein isoforms in discovery mode using top-down proteomics.”
Current approaches to large-scale proteome analysis involve digesting intact proteins and then carrying out mass spectrometry to infer protein identification, the researchers report. However, this “bottom-up” approach is fraught with complications that arise because of the incomplete or ambiguous characterization of proteins resulting from alternative splicing, post-translational modifications, or endogenous protein cleavage, natural processes that can also combine to generate a myriad of protein isoforms and species.
A “top-down” approach to analyzing individual proteins can overcome these problems for single proteins, but isn’t suitable on a whole-proteome scale because there are no methods for intact protein fractionation that can be integrated with tandem mass spectrometry or that match the resolution of two-dimensional gel electrophoresis, the authors continue.
To resolve this issue, the team used a liquid-phase alternative to 2-D gels comprising solution isoelectric focusing (sIEF) followed by gel-eluted liquid fraction entrapment electrophoresis (GELFrEE), to fractionate proteins by isoelectric point and size, respectively. These techniques were combined with nanocapillary liquid chromatography and mass spectrometry (LC-MS) for both low and high molecular mass proteins. This approach essentially results in what the investigators describe as four-dimensional separation of whole protein molecules before ion fragmentation by tandem MS and protein identification.
The authors tested their technology by generating a quasi-two-dimensional gel perspective of the human proteome by analyzing nuclear and cytosolic extracts of HeLa S3 cells. They found that in discovery mode, and using 0.5–1 mg of input protein, the IEF–GELFrEE-nanocapillary liquid chromatography platform provided a peak capacity of well over 2,000 for separation of protein molecules in solution. “Considering the separation power of the mass spectrometer, the peak capacity of the four-dimensional system is greater than 100,000 for proteins below approximately 25 kDa,” they write. “This is 20-fold higher than the peak capacity for high-resolution two-dimensional gels (less than 5,000).”
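The 20-fold figure follows directly from the two peak capacities quoted above. As a minimal sketch of that arithmetic (the function and constant names are illustrative, not taken from the paper):

```python
# Figures quoted in the article; no other values are assumed.
FOUR_D_PEAK_CAPACITY = 100_000   # "greater than 100,000" for proteins below ~25 kDa
TWO_D_GEL_PEAK_CAPACITY = 5_000  # "less than 5,000" for high-resolution 2-D gels


def fold_increase(new_capacity: float, reference_capacity: float) -> float:
    """Return how many times larger new_capacity is than reference_capacity."""
    if reference_capacity <= 0:
        raise ValueError("reference_capacity must be positive")
    return new_capacity / reference_capacity


ratio = fold_increase(FOUR_D_PEAK_CAPACITY, TWO_D_GEL_PEAK_CAPACITY)
print(f"Fold increase in peak capacity: {ratio:.0f}x")
```

Because the article gives the four-dimensional capacity as a lower bound and the gel capacity as an upper bound, the ratio computed here is best read as a floor on the improvement rather than an exact value.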
The identification and characterization of isoforms was achieved using fragmentation data acquired with less than 10 parts-per-million mass accuracy for searching databases with highly annotated primary sequences. Customized software made it possible to overcome the protein inference problem, in which protein isoforms arising from native processes such as alternative splicing yield many identical tryptic peptides.
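For readers unfamiliar with the unit, mass accuracy in parts per million is the difference between the observed and theoretical mass, scaled by the theoretical mass. The sketch below applies that standard definition to a 10 ppm threshold; the function names and the example masses are hypothetical and are not drawn from the paper or its software.

```python
def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Signed mass error in parts per million (ppm)."""
    if theoretical_mass <= 0:
        raise ValueError("theoretical_mass must be positive")
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def within_ppm(observed_mass: float, theoretical_mass: float,
               tolerance_ppm: float = 10.0) -> bool:
    """True if the absolute mass error is below tolerance_ppm."""
    return abs(ppm_error(observed_mass, theoretical_mass)) < tolerance_ppm


# Hypothetical fragment ion masses in Da, chosen only for illustration.
theoretical = 1500.0000
observed = 1500.0120

error = ppm_error(observed, theoretical)
print(f"Mass error: {error:.1f} ppm, accepted: {within_ppm(observed, theoretical)}")
```

A fixed ppm threshold corresponds to a wider absolute window for heavier ions: 10 ppm is 0.015 Da at 1,500 Da but 0.15 Da at 15,000 Da.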
“The databases and search engine used here are fully compatible with the UniProt flat file format and enable a deep consideration of known post-translational modifications (PTMs), alternative splice variants, polymorphisms, endogenous proteolysis, and diverse combinations of all these sources of molecular variation at the protein level,” the developers add. “Together with the careful curation of the Swiss-Prot database, the result is an informatics framework that maps each given protein identification to a single gene.”
When the system was applied to evaluate the HeLa cell proteome, the team was able to identify a total of 1,043 proteins with unique Swiss-Prot accession numbers, which originated from 1,045 human genes. “This level of proteome coverage represents the most comprehensive implementation of top-down mass spectrometry so far, with an approximate 10-fold increase in identifications of intact proteins for any microbial system, and a greater than 20-fold increase over any previous work in mammalian cells,” they claim. In fact, the technique generated fragmentation evidence for 3,093 protein isoforms/species, including 645 phosphorylations, 538 lysine acetylations, 158 methylations, 19 lipid/terpenes, and 5 hypusines.
Importantly, of the 1,043 proteins identified, 431 and 331 were identified with intact mass information from either isotope spacings or deconvolution of charge states, respectively. Of the isotopically resolved proteins, 54% matched the species identified from the database within 2 Da. Of the 331 masses determined by deconvolution, 130 were manually judged to be of high quality, and 51% of these matched within 200 Da.
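The article reports these matches as percentages only. A rough sketch of what they imply in absolute terms is given below; the counts are back-calculated from rounded percentages, so they are approximations rather than figures from the paper, and the code assumes the 51% refers to the 130 high-quality deconvoluted masses. The helper names are hypothetical.

```python
def matches_within_da(observed_mass: float, database_mass: float,
                      tolerance_da: float) -> bool:
    """True if two intact masses agree within an absolute tolerance in Da."""
    return abs(observed_mass - database_mass) <= tolerance_da


def approx_count(total: int, percent: float) -> int:
    """Approximate number of items that a rounded percentage of total represents."""
    return round(total * percent / 100)


# Totals and percentages as quoted in the article.
isotopic_matches = approx_count(431, 54)      # isotopically resolved, within 2 Da
deconvoluted_matches = approx_count(130, 51)  # high-quality deconvoluted, within 200 Da

print(f"~{isotopic_matches} of 431 isotopically resolved proteins matched within 2 Da")
print(f"~{deconvoluted_matches} of 130 high-quality deconvoluted masses matched within 200 Da")
```

The two tolerances differ by a factor of 100, which reflects how much less precise an intact mass is when it is derived from charge-state deconvolution than when the isotopes are resolved.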
The accuracy of the approach was evidenced by the finding that nine of the approximately 15 isoforms of histone H2A could be fully characterized in an automated fashion, despite the fact that they display over 95% sequence identity, the authors remark. Also identified were nine S100 proteins, several α- and β-tubulins, seven unique isoforms of human keratin (a widely known contaminant in proteomics, they point out), MLC20, BTF3 and their related sequences (which are 97% and 81% identical, respectively), and over 100 isoforms/species from the high mobility group (HMG) family. Repeating their studies on HeLa cells that were treated with etoposide to elicit a DNA damage response enabled the detection of proteins involved in cell cycle regulation, apoptosis, and DNA repair.
A three-dimensional fractionation approach, i.e., GELFrEE-nanocapillary LC-MS, was subsequently used to accurately monitor 17 phosphoprotein targets across different time points and different concentrations of etoposide administration. The same method was also applied to track over 2,300 species from 690 proteins in H1299 cells treated with camptothecin, and 2,300 species from 708 proteins in B16F10 melanoma cells treated with etoposide.
“The sharp increase in proteome coverage demonstrated here provides a path ahead for interrogating the natural complexity of protein primary structures that exist within human cells and tissues,” the authors conclude. “With faithful mapping of intact isoforms on a proteomic scale, detecting co-variance in modification patterns will help lay bare the post-translational logic of intracellular signaling. Also, proper speciation of protein molecules offers the promise of increased efficiency for biomarker discovery through stronger correlations between measurements and organismal phenotype.”