If protein levels were to be viewed as so many peaks in a proteomic landscape, any fog could obscure the lesser hills and skew one’s ideas of how the landscape evolved. Such a fog has in fact obscured our view of protein abundances in mammals, assert scientists from the University of California and the Lawrence Berkeley National Laboratory.
According to these scientists, the fog consists of various types of measurement error and has led to dramatic underestimation of the levels of low-abundance proteins. As a consequence, system-wide protein and mRNA abundance measurements have appeared poorly correlated, leading scientists to erroneously conclude that protein abundances are mostly a matter of protein turnover and translation, as opposed to transcription.
This assertion departs from recent work that has attempted to quantify gene expression and protein abundance levels. One example is a frequently cited article by Schwanhausser et al., published in Nature in 2011. In this article, the authors wrote: “Using a quantitative model we have obtained the first genome-scale prediction of synthesis rates of mRNAs and proteins. We find that the cellular abundance of proteins is predominantly controlled at the level of translation.”
Taking issue with this finding, the scientists from the University of California and the Lawrence Berkeley National Laboratory point to what they call nonlinear errors in earlier work. They summarize these errors, and their attempts to correct for them, in an article (“System wide analyses have underestimated protein abundances and the importance of transcription in mammals”) published February 27 in PeerJ.
The article, by Li et al., rescales Schwanhausser et al.’s protein abundance estimates using data for housekeeping proteins. The rescaled estimates, report Li et al., correlate more closely with mRNA abundances. In addition, Li et al. describe using direct experimental data to estimate the impact of other sources of error on the mRNA and protein abundance measurements. This work, they say, allows these sources of error to be explicitly measured and modeled: “The resulting analysis suggests that mRNA levels explain at least 56% of the differences in protein abundance for the 4,212 genes detected by Schwanhausser et al., though because one major source of error could not be estimated the true percent contribution should be higher.”
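The effect of such a rescaling can be illustrated with a minimal sketch on synthetic data. Nothing below comes from Li et al. or Schwanhausser et al.: the simulated abundances, the shape of the distortion (low values pushed further down past a `knee`), the two-segment linear fit, and the helper names `r_squared` and `rescale` are all assumptions chosen only to show how a nonlinear scaling error can lower the apparent mRNA-protein correlation, and how calibrating against a trusted subset (standing in for housekeeping proteins) can restore it.

```python
import numpy as np

def r_squared(x, y):
    """Squared Pearson correlation between two 1-D arrays."""
    r = np.corrcoef(x, y)[0, 1]
    return r ** 2

def rescale(measured, cal_measured, cal_trusted, knee):
    """Two-segment linear rescaling (hypothetical helper).

    Fits one line below the knee and one at or above it, using calibration
    proteins whose trusted abundances are known, then applies each fit
    to all measured values in that segment.
    """
    out = np.empty_like(measured)
    for below in (True, False):
        cal_mask = (cal_measured < knee) == below
        coeffs = np.polyfit(cal_measured[cal_mask], cal_trusted[cal_mask], 1)
        mask = (measured < knee) == below
        out[mask] = np.polyval(coeffs, measured[mask])
    return out

# Synthetic log-scale abundances (assumed, not real data).
rng = np.random.default_rng(0)
n = 4000
log_mrna = rng.normal(0.0, 1.0, n)
true_log_protein = 4.0 + log_mrna + rng.normal(0.0, 0.5, n)

# Assumed nonlinear error: values below the knee are underestimated,
# increasingly so the lower the true abundance.
knee = 4.0
measured = np.where(
    true_log_protein < knee,
    knee + 3.0 * (true_log_protein - knee),
    true_log_protein,
)

# Treat the first 200 proteins as the trusted calibration set.
cal = np.arange(200)
rescaled = rescale(measured, measured[cal], true_log_protein[cal], knee)

r2_measured = r_squared(log_mrna, measured)
r2_rescaled = r_squared(log_mrna, rescaled)
print(f"R^2 before rescaling: {r2_measured:.2f}")
print(f"R^2 after rescaling:  {r2_rescaled:.2f}")
```

In this toy setting the calibration values are exact, so the rescaling recovers the simulated abundances almost perfectly. Real calibration data would be noisy, and the recovered correlation correspondingly less clean.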
Li et al. use a second, independent strategy to determine the contribution of mRNA levels to protein expression: they show that the variance in translation rates directly measured by ribosome profiling is dramatically lower than that inferred by Schwanhausser et al., and that the measured and inferred translation rates correlate poorly. Incorporating protein and mRNA turnover data in their analysis, Li et al. arrive at a new set of estimates: mRNA levels explain ~81% of the variance in protein levels; transcription, 71%; RNA degradation, 10%; translation, 11%; and protein degradation, 8%. These estimates differ dramatically from previous findings, which suggested that differences in mRNA levels accounted for 10–40% of the differences in protein levels.
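The percentages quoted above appear to be additive: the transcription and RNA degradation figures sum to the mRNA figure, and all four steps together account for the full variance. The short check below uses only the numbers reported in this article. Grouping the steps into an mRNA-level share and a protein-level share is an interpretation made here for illustration, not a calculation taken from the paper.

```python
# Percent of protein-level variance attributed to each step, as quoted above.
contributions = {
    "transcription": 71,
    "RNA degradation": 10,
    "translation": 11,
    "protein degradation": 8,
}

# Steps that set mRNA levels versus steps acting after the mRNA is made.
mrna_total = contributions["transcription"] + contributions["RNA degradation"]
protein_total = contributions["translation"] + contributions["protein degradation"]
total = sum(contributions.values())

print(f"mRNA-level share:    {mrna_total}%")
print(f"protein-level share: {protein_total}%")
print(f"all steps combined:  {total}%")
```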
Because it explicitly considers sources of error, Li et al.’s analysis may provide a more accurate framework for quantifying gene expression and protein abundance levels. This work, claim Li et al., highlights the importance of appropriate statistical analyses of the large quantitative data sets that experimentalists are increasingly producing and using to study fundamental cellular mechanisms.