Tutorials: Apr 1, 2008
Combined Correctness Can Enrich Proteomics
New Metrics Improve Potential in 2-D Gels
Proteomics plays a central role in drug discovery, molecular diagnostics, and the practice of medicine in the postgenomic era. Among all the proteomics technologies, 2-D gel electrophoresis is by far the most commonly used technique for protein separation. In most cases where 2-D gel electrophoresis is employed, the purpose is to find statistically significant differences between predefined conditions. These differences should be related to distinct protein spots that have been correctly detected, matched, and quantified.
To achieve these goals, sophisticated image-analysis algorithms are applied to extract relevant data from complex spot patterns. The acknowledged problems of reproducibility and resolution inherent in the technology present a challenge for any software. As a result, in nearly all cases, the extracted data is partly incorrect and incomplete.
As a consequence, researchers have to deal with two serious problems in image analysis: false positives and false negatives. Both are costly. They waste resources on downstream analysis of false hits and, perhaps more importantly, they impede a true understanding of the underlying biological system.
In nearly all 2-D gel-based proteomic studies, the data of interest come from the protein spots whose calculated intensities differ significantly between the predefined conditions. One could say that those are the desired hits. Upon visual inspection, however, some of these hits turn out to be image-analysis errors such as incorrectly detected or mismatched spots.
An example of a false positive caused by image analysis is shown in Figures 1A and 1B. Each panel shows the same zoomed-in area of a gel, with spots detected and matched by a different image-analysis software package. The red border in Figure 1A outlines a spot that, according to the plotted data, shows a clear change in signal intensity. A statistical test applied to these data yields a p-value low enough to indicate a significant difference.
In actuality, this significance results from unsuccessful detection and matching of the spots during image analysis. The matched spot border has been placed next to the spot in question, producing flawed data and a false positive. In Figure 1B, the image analysis of the same area has been performed correctly. The resulting data show no significant difference between the two groups. It is a true negative.
It is not unusual for 2-D gel-image analysis with conventional 2-D gel software to yield 40–50% false positives, even after significant manual adjustments. In some cases, the false-positive ratio can be much higher.
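The article does not define the false-positive ratio formally. One natural reading is the share of statistically significant spots that visual inspection traces back to detection or matching errors. The sketch below assumes that definition. `HitReview` and `false_positive_ratio` are hypothetical names, and the counts are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class HitReview:
    """Outcome of visually reviewing the statistically significant spots.

    Hypothetical structure: the article does not prescribe how reviews are recorded.
    """
    true_hits: int   # significant spots confirmed as real biological changes
    false_hits: int  # significant spots traced back to detection/matching errors


def false_positive_ratio(review: HitReview) -> float:
    """Assumed definition: fraction of reported hits that are image-analysis errors."""
    total = review.true_hits + review.false_hits
    if total == 0:
        raise ValueError("no significant spots to review")
    return review.false_hits / total


# Made-up counts: 55 of 100 significant spots confirmed, 45 rejected on inspection.
print(false_positive_ratio(HitReview(true_hits=55, false_hits=45)))  # 0.45
```

Under this reading, a ratio of 0.45 falls in the 40–50% range quoted above.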
False negatives in this context are true spot changes that are masked or simply not detected due to errors in image analysis.
Figure 2A shows a clear difference in protein spot intensity that the image-analysis software has correctly identified (a true positive). In Figure 2B, the same intensity change has been missed because of errors in spot detection. The red border encompasses several spots, effectively masking the difference. The result is a false negative and an undetected biological difference.
An internal study of 30 image-analysis projects, carried out by laboratories around the world using a range of image-analysis software packages, indicates that on average 75% of the true positives are missed because of errors in spot detection and matching. This represents a huge untapped potential in 2-D gel-based proteomics.
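The 75% figure can be read as a miss rate, that is, one minus sensitivity. It is the fraction of real differential spots that the analysis fails to report. The article does not describe how the study established which spots were truly differential. The sketch assumes a curated reference set is available. The function name and counts are hypothetical.

```python
def miss_rate(detected_true_positives: int, missed_true_positives: int) -> float:
    """Fraction of real differential spots that the image analysis failed to report.

    Assumes the true set of differential spots is known, e.g. from a carefully
    curated re-analysis of the same gels (a hypothetical reference standard).
    """
    total_true = detected_true_positives + missed_true_positives
    if total_true == 0:
        raise ValueError("reference contains no true differential spots")
    return missed_true_positives / total_true


# Hypothetical project: 80 real changes present, only 20 reported as significant.
print(miss_rate(detected_true_positives=20, missed_true_positives=60))  # 0.75
```

Both false-positive and miss-rate figures depend on how the reference is built, so the same analysis can score differently under different review protocols.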
The Combined Correctness Principle
To date, there have been no metrics for image-analysis quality that can reliably ensure a low ratio of both false positives and false negatives among the results. The only option for researchers is to painstakingly go through every single spot and edit the image-analysis results where necessary. Inevitably, this introduces unwanted bias into the analysis, not to mention the toll it takes on time and nerves.
Measuring Combined Correctness
The first step in measuring combined correctness is to define distinct categories that describe what constitutes correct or incorrect spot detection or matching. These categories need to be independent of the different approaches used for image analysis today. By applying these categories to the image-analysis data in a statistically relevant way, it is possible to measure and calculate the combined correctness for 2-D gel-image analysis.
For spot detection, all spots can be categorized into one of the four following classes as outlined in Figure 3: (A) correct, (B) false, (C) misshaped, and (D) missing. Using these four classes, it is possible to determine the overall spot-detection correctness in any 2-D gel-image analysis.
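The article does not give the formula for spot-detection correctness. The sketch below assumes it is the share of reviewed spots labelled "correct" among all four classes. It estimates this share from a randomly sampled subset of spots, which is one way to apply the categories "in a statistically relevant way". The Wilson score interval is an illustrative choice, not the article's method. The class comments give one plausible reading of each category, and `SpotClass` and `detection_correctness` are hypothetical names.

```python
from __future__ import annotations

from collections import Counter
from enum import Enum
from math import sqrt


class SpotClass(Enum):
    """The four spot-detection categories of Figure 3 (comments are one reading)."""
    CORRECT = "correct"
    FALSE = "false"          # detected where no real protein spot exists
    MISSHAPED = "misshaped"  # real spot, but its detected outline is wrong
    MISSING = "missing"      # real spot that was never detected


def detection_correctness(labels: list[SpotClass], z: float = 1.96) -> tuple[float, float, float]:
    """Estimate spot-detection correctness from a reviewed random sample of spots.

    Assumed definition (not given in the article): share of sampled spots labelled
    CORRECT. Returns (estimate, lower, upper), with the bounds forming a Wilson
    score interval (z = 1.96 gives roughly 95% coverage).
    """
    n = len(labels)
    if n == 0:
        raise ValueError("no reviewed spots")
    p = Counter(labels)[SpotClass.CORRECT] / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, centre - half, centre + half


# Invented review of 200 sampled spots.
sample = ([SpotClass.CORRECT] * 170 + [SpotClass.FALSE] * 10
          + [SpotClass.MISSHAPED] * 12 + [SpotClass.MISSING] * 8)
est, lo, hi = detection_correctness(sample)
print(f"{est:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Reporting an interval alongside the point estimate makes it visible when a sample is too small to distinguish two analyses.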
The number of ambiguously categorized spots or matches is usually low, so their effect on the overall correctness measurements is negligible. Using the estimated spot-detection correctness and matching correctness, the combined correctness of any 2-D gel-image analysis can be calculated using the following formula:
According to our hypothesis, the value for combined correctness should be inversely related to the number of false positives and false negatives in the image analysis, and thus indicative of the overall data quality. Indeed, when the ratios of false positives and false negatives in an analysis are estimated and plotted against the combined correctness, we see exactly this correlation. As combined correctness increases, the ratios of both false positives and false negatives decrease (data will be published this year).
Consequences for 2-D Gel-Based Proteomics
The introduction of metrics such as combined correctness can have a tremendous impact on 2-D gel-based proteomics. Missing on average 75% of the interesting differences is a sign that the current approaches and procedures are far from optimal. By rigorously implementing correctness measurements and quality checkpoints at each step of the image analysis, it is possible to dramatically increase the discovery potential in 2-D gel-based experiments.
© 2013 Genetic Engineering & Biotechnology News, All Rights Reserved