Not so long ago (~2002), finding new cancer biomarkers in serum was made to look easy by applying an astonishingly simple new proteomics platform to a few samples from diseased patients and a few from healthy controls. This approach, commonly referred to as SELDI (surface-enhanced laser desorption/ionization), combined three novel technology components, each of which is now known to be problematic.
The result was a general failure caused by biases in the data: in this case, machine drift between runs of cases and controls; in other studies, differences in sample processing and/or patient-group selection. As a consequence, when the analyses were repeated at other sites, the candidate disease patterns failed to replicate. Despite the efforts of dedicated biotech companies and two of the largest clinical reference laboratories, SELDI-based tests for cancer have still not gained FDA approval.
While the reasons for this debacle are now well understood, and useful elements of the approach have been redeveloped in more rigorous form, clinical proteomics is only now recovering from the “SELDI bubble” caused by the initial excitement over this approach. Fortunately, substantial parallel advances have been made (albeit with far less hype) in understanding critical sample requirements, in improving the performance of advanced mass spectrometry (MS) instrumentation, and in understanding the appropriate structure for a real biomarker pipeline.
It is difficult to overstate the importance of samples and experimental design in the operation of a biomarker pipeline. Two major factors arise: the quality of the samples and the number of samples.
High-quality samples are collected in such a way that there is no difference in collection or processing between groups: i.e., no bias. The typical number of samples required to convince diagnostics professionals that a biomarker is likely to have clinical utility (the Zolg number) is about 1,500, and technology platforms that cannot analyze samples at this scale are of limited use in the later stages of biomarker verification.
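A standard power calculation illustrates why verification-stage studies demand samples on this scale. The sketch below is not how the Zolg number was derived (that figure reflects broader clinical and regulatory experience); it simply shows, under assumed effect sizes, how quickly the required sample count grows as the difference between cases and controls shrinks. It uses the textbook normal-approximation formula for comparing two proportions (e.g., the fraction of "biomarker-positive" subjects in each group), with hypothetical proportions chosen for illustration.

```python
import math

def two_proportion_sample_size(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size to detect a difference between two proportions
    (two-sided test, normal approximation).

    p1, p2 -- assumed biomarker-positive fractions in cases and controls
              (hypothetical values, for illustration only).
    """
    z_alpha = 1.959964  # z-score for alpha/2 = 0.025
    z_beta = 0.841621   # z-score for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# A modest assumed difference (50% positive in cases vs. 40% in controls)
# already requires several hundred subjects per group:
n_per_group = two_proportion_sample_size(0.50, 0.40)
print(n_per_group)  # 388 per group, i.e. ~776 subjects total
```

Smaller effect sizes, multiple candidate markers (requiring multiplicity corrections), and subgroup analyses all push the total well past this, which is consistent with verification cohorts in the low thousands.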