Equally critical but highly variable is data quality, which begins in the collection process and extends to the storage and processing capabilities of databanks. Uniformity of data procedures through standardization is the key challenge identified in the NCI Best Practices report.
The most important factor in this equation is the staff gathering the samples and their level of training and experience. For example, BioServe’s sera and tissue samples are drawn from multiple sites around the world and are collected in stages with significant variations in collection controls and standard operating procedures.
Extensive demographic and clinical data is ideally required for every sample, including the donor’s detailed medical history, family history, and lifestyle choices. To illustrate the essential role of data quality and data retrieval for discovery programs, consider the case study from Phenomenome Discoveries in its search for a novel biomarker for colorectal cancer (CRC).
A targeted high-throughput screening method found that six novel molecules with formulae resembling gamma-tocopherol, an isoform of vitamin E, were deficient in serum from CRC patients but not in serum from controls. The resulting dataset of 900 accurate molecular masses was visualized using principal component analysis, which indicated a robust differentiation between controls and CRC cases.
The program rapidly achieved a milestone because its datasets could be searched against multiple, matched biological materials in BioServe’s global repository to validate findings. Demonstrating that the novel metabolites were absent in both tissue samples and serum of diseased patients but not in matched healthy controls allowed a convincing correlation to be made.
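The principal component analysis step described above can be sketched in a few lines. This is a minimal illustration on synthetic data, not Phenomenome's actual pipeline: the group sizes, the noise model, the size of the deficiency, and the function name `pca_scores` are all assumptions made for the example. Only the figures of 900 masses and six deficient molecules come from the text.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project samples (rows of X) onto the leading principal components.

    Returns the per-sample scores and the fraction of total variance
    explained by each retained component.
    """
    Xc = X - X.mean(axis=0)                      # center each feature (mass)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]

# Synthetic stand-in for the dataset: 900 mass intensities per sample.
# Group sizes and effect size are hypothetical.
rng = np.random.default_rng(0)
n_per_group, n_features = 40, 900
controls = rng.normal(0.0, 1.0, size=(n_per_group, n_features))
cases = rng.normal(0.0, 1.0, size=(n_per_group, n_features))
cases[:, :6] -= 5.0                              # six masses deficient in cases

X = np.vstack([controls, cases])
labels = np.array([0] * n_per_group + [1] * n_per_group)

scores, explained = pca_scores(X)

# Difference between group means along the first component; in practice
# the scores would be plotted and colored by `labels` for visual inspection.
pc1_gap = scores[labels == 1, 0].mean() - scores[labels == 0, 0].mean()
```

Plotting the first two columns of `scores` with points colored by `labels` gives the kind of view in which cases and controls either separate or do not. A visual separation is only suggestive, which is why validation against matched samples mattered in the program described here.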
Uniformity in Datasets
Uniform datasets accompanying the samples allowed the research team to tease out factors unrelated to the targeted disease state that contributed to the appearance of candidate biomarkers. Again using biological materials from BioServe, the team cross-referenced diseased patient samples against other disease groups and found that some of the markers also appeared in ovarian and breast cancer samples and were therefore not specific to CRC.
The excitement that built over one promising metabolite was tempered when the research team went back to the datasets and found that the patients in the identified subgroup were all taking the same over-the-counter medication. This ability to reach back into detailed records across all samples was highly valued: it not only averted the embarrassment of premature elation but ultimately built the research team’s confidence in the program’s findings, which did indeed result in the discovery of a novel biomarker set for CRC.
Based upon this data, Phenomenome was confident in going forward to develop a diagnostic test, which is undergoing clinical trials in Japan this year.
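The record check described above, in which a promising metabolite turned out to track a shared over-the-counter medication, amounts to asking which donor attributes are common to nearly all marker-positive samples and rare elsewhere. The sketch below shows one simple way to do that. The record layout, the field name `medications`, the sample IDs, the drug names, and the function `shared_attributes` are hypothetical and do not reflect BioServe's actual data model.

```python
from collections import Counter

def shared_attributes(records, flagged_ids, field, min_fraction=0.9):
    """Find values of `field` present in at least `min_fraction` of flagged samples.

    Returns {value: (fraction_in_flagged, fraction_in_others)} so that an
    attribute common in the flagged subgroup but rare elsewhere stands out
    as a possible confounder.
    """
    flagged = [r for r in records if r["sample_id"] in flagged_ids]
    others = [r for r in records if r["sample_id"] not in flagged_ids]
    if not flagged:
        return {}
    # set() ensures each value is counted at most once per sample.
    flagged_counts = Counter(v for r in flagged for v in set(r.get(field, [])))
    other_counts = Counter(v for r in others for v in set(r.get(field, [])))
    result = {}
    for value, count in flagged_counts.items():
        frac = count / len(flagged)
        if frac >= min_fraction:
            other_frac = other_counts[value] / len(others) if others else 0.0
            result[value] = (frac, other_frac)
    return result

# Hypothetical donor records; S1-S3 form the subgroup showing the metabolite.
records = [
    {"sample_id": "S1", "medications": ["antacid_x"]},
    {"sample_id": "S2", "medications": ["antacid_x", "statin_y"]},
    {"sample_id": "S3", "medications": ["antacid_x"]},
    {"sample_id": "S4", "medications": []},
    {"sample_id": "S5", "medications": ["statin_y"]},
    {"sample_id": "S6", "medications": []},
]
flagged = {"S1", "S2", "S3"}

confounders = shared_attributes(records, flagged, "medications")
print(confounders)  # {'antacid_x': (1.0, 0.0)}
```

A check like this works only when every sample carries the same fields recorded in the same way, which is the practical payoff of uniform datasets.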
The critical interplay of the three categories proposed in this article—ethical collection, specimen integrity, and data quality—for front-end evaluation of biorepositories can be seen converging during any investigation or discovery in the omics field. These three criteria come into play again as the research program reaches its conclusion with the final validation of findings. These criteria become critical determinants that can give a research team confidence in hitting project milestones and result in findings with powerful validation.
Kevin Krenitsky, M.D., is CEO of BioServe. Web: www.bioserve.com. E-mail: email@example.com.