June 15, 2008 (Vol. 28, No. 12)
Enrique A. Dalmasso, Ph.D.
Appropriate Proteomics Platform and Careful Study Design Can Improve Positive Results
Protein biomarkers have been used for many years for population screening, disease diagnosis, and prediction of therapeutic response. The increased focus on proteomics research over the last several years has led to advances in protein analytical technologies that are increasing the pace and scope of efforts to discover new biomarkers. There is growing awareness that single biomarkers are limited in their ability to provide clinically useful assays with high predictive value. Biological pathways that lead to disease are complex, and detecting and monitoring multiple biomarkers is required to achieve more robust, accurate, and predictive assays.
Proteomics technologies that enable the simultaneous analysis of hundreds of proteins hold the promise of biomarker panels that could be used to accurately detect and predict human disease states.
Human biological fluids, especially serum and plasma, contain many thousands of proteins and peptides, with concentrations spanning as many as 11 orders of magnitude. New proteomics technologies must be sensitive enough to detect large numbers of low-concentration proteins in the presence of a small number of proteins that can account for as much as 99% of the protein mass of the sample.
In addition, they must be highly reproducible while providing the throughput and robustness required to analyze thousands of samples rapidly, so that the data on biomarker candidates are statistically meaningful.
Despite intensified interest and investment, the rate at which novel protein biomarkers are introduced is falling: on average, only one per year is approved by the FDA. This trend reflects not only the long and difficult path from candidate discovery to clinical assay but also the frequent lack of a coherent, rigorous, and comprehensive process for biomarker development.
Five Phases of the Research Process
The success rate of biomarker development programs using any proteomics platform can be increased by first dividing biomarker research into five phases that address each of the key steps of the process: study design, discovery, validation, identification, and clinical assay implementation (Figure 1).
In the initial phase of study design, the objective is to define in detail the clinical question being asked and the types and number of samples, the experimental workflow, and the technologies to be used. This phase is particularly critical to successful biomarker discovery.
The purpose of the discovery phase is to find candidate biomarker proteins by screening a large number of conditions so as to detect as many proteins as possible, including low-abundance ones.
Samples must be carefully chosen and available in sufficient numbers to achieve statistical significance. Proteins that show significant group- or time-dependent differences are considered candidate biomarkers, which can be used alone (univariate analysis) or in combination (multivariate analysis) to build predictive models.
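To make the univariate/multivariate distinction concrete, here is a minimal Python sketch using invented toy intensities for two hypothetical peaks. It scores each peak alone by the area under the ROC curve (AUC, computed as the Mann-Whitney probability that a case outranks a control) and then scores an unweighted panel that averages the standardized intensities of both peaks. The panel score is a deliberately simple stand-in for the fitted multivariate models (such as logistic regression or decision trees) used in practice; the function names and data are illustrative assumptions, not part of any SELDI software.

```python
from statistics import mean, stdev


def auc(case_values, control_values):
    """Fraction of case/control pairs in which the case scores higher
    (ties count one half); equals Mann-Whitney U / (n_cases * n_controls)."""
    wins = 0.0
    for c in case_values:
        for k in control_values:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


def panel_scores(rows, directions):
    """Unweighted panel score per sample: the mean of signed z-scores.

    rows: one tuple of peak intensities per sample.
    directions: +1 if a peak is expected to be higher in cases, -1 if lower.
    """
    columns = list(zip(*rows))
    centers = [(mean(col), stdev(col)) for col in columns]
    return [
        mean(d * (x - m) / s for d, x, (m, s) in zip(directions, row, centers))
        for row in rows
    ]


# Toy intensities for two hypothetical peaks: (peak A, peak B) per sample.
cases = [(5, 5), (6, 3), (3, 6), (7, 7)]
controls = [(1, 1), (4, 2), (2, 4), (0, 0)]

auc_a = auc([r[0] for r in cases], [r[0] for r in controls])
auc_b = auc([r[1] for r in cases], [r[1] for r in controls])
scores = panel_scores(cases + controls, directions=[1, 1])
auc_panel = auc(scores[: len(cases)], scores[len(cases):])
print(auc_a, auc_b, auc_panel)  # 0.9375 0.9375 1.0
```

On these toy numbers, each peak alone misranks one case against one control, while the two-peak panel separates the groups completely. With real data, any weights or directions learned from the discovery set should be evaluated on independent samples, because performance measured on the same samples is optimistic.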
The validation phase tests a candidate biomarker against a larger, more heterogeneous population. The robustness of the candidate markers is challenged by a level of biological variability that more accurately represents that of the target population. This phase may be designed to confirm the findings from the discovery phase, or it may explore other variables that affect the validity of the markers in a large population.
In the identification phase, the most promising markers are first enriched and purified and then identified by tryptic digestion and sequencing by tandem mass spectrometry.
The clinical assay implementation phase entails the development and optimization of assays for the validated biomarkers that are robust, sensitive, and quantitative enough to be clinically useful. This phase can be performed at multiple points in a study, and the assays may be either chromatography- or antibody-based.
Understanding and managing sources of bias are also key to successful biomarker development during all five phases of the process. Current proteomics technologies can detect small changes in protein expression levels. Some of these changes reflect biological differences related to the disease or treatment under study; others may reflect the heterogeneity of patients across multiple sites, the inherent complexity and diversity of sample types, or even small differences in the sample collection, processing, and analysis techniques used. As a consequence, results may be site-, study-, population-, or sample-specific, and thus of no clinical use.
Preanalytical bias can arise from systematic differences in patient populations or sample characteristics as well as the procedures used for sample collection, handling, and storage. Differences in the manner in which samples are processed and analyzed can produce analytical bias, which can have profound effects on the outcome of a discovery study. Careful management of sources of variability and bias can help ensure reproducible results.
Preanalytical bias can be minimized by careful definition of the biological question and selection of appropriate samples, evaluation of patient and sample histories, establishment of rigorous criteria for sample inclusion and exclusion, development of standard operating procedures (SOPs) for sample collection, handling, and storage, and measurement and documentation of all potential sources of uncontrollable variation.
Analytical bias can be controlled through rigorous training, instrument qualification, and the use of SOPs, so that the differences detected reflect true biological differences. Best practices to minimize analytical bias include using sufficient numbers of replicates; processing all samples together under the same conditions, including reference and quality control samples; analyzing all data with consistent processing parameters; and maintaining detailed records of all sample-processing and data-analysis steps.
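One common way to use replicate and QC samples is to check peak-intensity reproducibility. The sketch below, assuming peaks have already been detected and aligned across spectra, computes the percent coefficient of variation (CV) of each peak across replicate spectra of a pooled QC sample and flags peaks above a limit. The 20% limit and the function names are hypothetical; an actual study would set its acceptance criteria in its SOPs.

```python
from statistics import mean, stdev


def peak_cvs(replicate_spectra):
    """Percent coefficient of variation (sample stdev / mean * 100) per peak.

    replicate_spectra: one list of peak intensities per replicate QC spectrum,
    with peaks in the same order in every list. Intensities must be positive.
    """
    return [100.0 * stdev(values) / mean(values) for values in zip(*replicate_spectra)]


def flag_irreproducible(replicate_spectra, max_cv=20.0):
    """Indices of peaks whose CV exceeds max_cv (a hypothetical acceptance limit)."""
    return [i for i, cv in enumerate(peak_cvs(replicate_spectra)) if cv > max_cv]


# Three replicate spectra of the same QC pool, three peaks each (invented values).
qc_replicates = [
    [10.0, 5.0, 2.0],
    [12.0, 5.0, 1.0],
    [11.0, 5.0, 3.0],
]
print([round(cv, 1) for cv in peak_cvs(qc_replicates)])  # [9.1, 0.0, 50.0]
print(flag_irreproducible(qc_replicates))  # [2]
```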
Successful Study Design
As stated earlier, the importance of proper study design cannot be overstated, so it is worth discussing some of the keys to successful design. Before designing a study, it is often advantageous to consult a group of specialists such as clinicians, proteomics researchers, mass spectrometrists, and biostatisticians.
In particular, biostatisticians should help plan data-analysis strategies by calculating the number of samples required for statistical relevance and by helping to avoid pitfalls such as overfitting a model to data that may not represent the broader population, and false discovery of biomarkers due to random chance.
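As an illustration of the kind of sample-size calculation a biostatistician might start from, the sketch below uses the standard normal-approximation formula for comparing one peak's mean intensity between two equal-sized groups, n = 2(z_{1-α/2} + z_{1-β})² / d², where d is the expected difference in means divided by the common standard deviation. Dividing α by the number of peaks screened (Bonferroni) is one simple, conservative way to account for testing many peaks. The effect sizes and peak count are assumptions, and a t-based calculation gives slightly larger numbers.

```python
import math
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.80, n_peaks=1):
    """Per-group sample size for a two-sided, two-sample comparison of means
    (normal approximation). effect_size is Cohen's d: difference in means
    divided by the common standard deviation. alpha is Bonferroni-divided by
    n_peaks when many peaks are tested."""
    per_test_alpha = alpha / n_peaks
    z_alpha = NormalDist().inv_cdf(1 - per_test_alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


print(samples_per_group(1.0))               # 16
print(samples_per_group(0.5))               # 63
print(samples_per_group(1.0, n_peaks=100))  # 38
```

Halving the expected effect size roughly quadruples the required sample size, which is one reason narrowly defined questions with large expected differences are easier to study.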
Successful biomarker-discovery studies start with a clear, narrowly defined clinical question. Broad questions can introduce more variables and thus be more complex to validate. The clinical question should specify a measurable result of clinical utility and aim to yield results that improve current diagnostic, prognostic, or therapeutic methods. Two types of studies are generally used.
Retrospective studies use banked samples for which the clinical outcome is already known and rely on information collected by questionnaire, from case records, or from the sample banks themselves. Prospective studies monitor the progress, symptoms, and disease development of a selected set of patients over time.
The expertise of a clinician should be used to determine the appropriate sample set and clinical performance requirements for accepting and adopting the findings of any resulting clinical assay.
The next step in study design is sample selection. While human patients ultimately represent the most accurate model for clinical studies, nonhuman models display less biological variability and allow for experimentation. Sample size should be determined by estimating the number of samples required to attain statistical relevance, and appropriate controls should be included in the sample sets. It is seldom effective simply to compare a group of diseased individuals with a group of healthy ones. Controls matched for characteristics such as age, as well as samples from patients with other diseases that have similar clinical profiles, can improve the relevance of the data.
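Matching controls to cases can be as simple as pairing on age. The sketch below is a greedy 1:1 nearest-age match without replacement, assuming each sample is recorded as a (sample_id, age) pair; the 5-year caliper and all identifiers are hypothetical. Greedy matching is easy to audit but not optimal, and real studies usually match on several characteristics at once.

```python
def match_controls_by_age(cases, controls, max_age_gap=5):
    """Greedy 1:1 nearest-age matching without replacement.

    cases, controls: lists of (sample_id, age). Cases are processed from
    youngest to oldest; a case with no unused control within max_age_gap
    years is left unmatched. Returns a list of (case_id, control_id).
    """
    available = list(controls)
    pairs = []
    for case_id, case_age in sorted(cases, key=lambda c: c[1]):
        candidates = [c for c in available if abs(c[1] - case_age) <= max_age_gap]
        if not candidates:
            continue
        best = min(candidates, key=lambda c: abs(c[1] - case_age))
        available.remove(best)
        pairs.append((case_id, best[0]))
    return pairs


cases = [("P1", 50), ("P2", 60), ("P3", 80)]
controls = [("C1", 49), ("C2", 58), ("C3", 61), ("C4", 70)]
print(match_controls_by_age(cases, controls))  # [('P1', 'C1'), ('P2', 'C3')]
```

Case P3 stays unmatched because no control lies within the caliper; reporting such cases, rather than forcing a poor match, keeps the comparison honest.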
Proper sample collection, handling, and storage are then required to produce robust biomarker discovery results. Multiple collection sites should be used to minimize systematic bias, and prolonged storage at 4°C or repeated freeze-thaw cycles should be avoided. Standardized protocols that maintain consistency in the timing of sample collection, equipment and reagents, and methods and timing of all processing steps are essential.
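Storage guidance like this is easiest to enforce when every sample's handling history is recorded and screened before analysis, which also puts the inclusion and exclusion criteria into practice. This sketch assumes hypothetical record fields and limits (at most one freeze-thaw cycle and at most 4 hours at 4°C); the actual limits belong in the study's SOPs.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical acceptance limits; a real study defines these in its SOPs.
MAX_FREEZE_THAW_CYCLES = 1
MAX_HOURS_AT_4C = 4.0


@dataclass
class SampleRecord:
    sample_id: str
    collection_site: str
    freeze_thaw_cycles: int
    hours_at_4c: float


def screen_samples(records):
    """Split records into included samples and (sample_id, reasons) exclusions."""
    included, excluded = [], []
    for rec in records:
        reasons = []
        if rec.freeze_thaw_cycles > MAX_FREEZE_THAW_CYCLES:
            reasons.append(f"{rec.freeze_thaw_cycles} freeze-thaw cycles")
        if rec.hours_at_4c > MAX_HOURS_AT_4C:
            reasons.append(f"{rec.hours_at_4c} h at 4 C")
        if reasons:
            excluded.append((rec.sample_id, reasons))
        else:
            included.append(rec)
    return included, excluded


records = [
    SampleRecord("S1", "site_A", 0, 1.0),
    SampleRecord("S2", "site_B", 3, 1.0),
    SampleRecord("S3", "site_A", 1, 10.0),
    SampleRecord("S4", "site_B", 1, 2.5),
]
included, excluded = screen_samples(records)
print(excluded)  # [('S2', ['3 freeze-thaw cycles']), ('S3', ['10.0 h at 4 C'])]
# Check that the retained samples still come from more than one site.
print(Counter(r.collection_site for r in included))
```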
Careful planning of the overall experimental design is the last component of study design and covers all aspects of the biomarker study, from selection of the appropriate proteomics platform to assay design, fractionation techniques (Figure 2), data collection (instrument settings), and analysis. An effective design ensures that the clinical question is answered, that all sources of analytical bias are minimized, and that the predictive value of the resulting biomarkers is tested. The workflow of each phase of the study must be defined, as well as the timing of the phases, which may require several rounds of iteration and optimization.
Experimental Design for SELDI
Successful experimental design for a biomarker study requires selection of the appropriate proteomics platform. One platform for biomarker discovery is surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS), which combines the separation power of chromatography with high-sensitivity mass spectrometry.
SELDI technology meets the biomarker discovery challenge by providing the sensitivity required to reproducibly detect low-abundance proteins in the presence of many high-abundance ones, while also providing the throughput required to analyze the enormous number of samples required to validate candidate biomarkers and ensure their clinical utility.
The ProteinChip® SELDI system from Bio-Rad Laboratories (www.bio-rad.com) utilizes arrays that selectively bind and retain whole classes of proteins from complex samples. The mass profile of the bound proteins is then determined directly from the arrays using TOF MS, creating protein profiles of molecular mass versus peak intensity. SELDI is one of the few platforms that can analyze hundreds of proteins in thousands of samples in a timeframe commensurate with the demands of a clinical proteomics biomarker study.
As with any technique applied to biomarker studies, successful application of the ProteinChip SELDI system for discovery requires careful experimental design. The design should include optimization of procedures used for sample preparation, selection and processing of arrays, data acquisition with a focus on optimizing laser energy, and data analysis. Different array types and wash conditions generate different profiles from the same sample, and combining these conditions yields a much broader picture of the proteome (Figure 3).
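Combining conditions can be illustrated by merging the peak lists obtained from the same sample under different array or wash conditions. The sketch below sorts all peaks by m/z and groups peaks whose m/z lies within a relative tolerance of a cluster's mean, reporting which conditions detected each cluster. The 0.3% tolerance and the condition names are hypothetical, and peaks at matching m/z under two conditions are not guaranteed to be the same protein.

```python
def merge_peak_lists(peaks_by_condition, rel_tol=0.003):
    """Merge per-condition peak lists into one list of distinct peaks.

    peaks_by_condition: dict mapping a condition label to a list of peak m/z
    values. Peaks (scanned in ascending m/z) join the current cluster if they
    lie within rel_tol of its mean m/z; otherwise they start a new cluster.
    Returns (mean m/z, set of conditions that detected the peak) per cluster.
    """
    observations = sorted(
        (mz, cond) for cond, peaks in peaks_by_condition.items() for mz in peaks
    )
    clusters = []  # each item: [list of m/z values, set of condition labels]
    for mz, cond in observations:
        if clusters:
            mzs, conds = clusters[-1]
            center = sum(mzs) / len(mzs)
            if abs(mz - center) / center <= rel_tol:
                mzs.append(mz)
                conds.add(cond)
                continue
        clusters.append([[mz], {cond}])
    return [(sum(mzs) / len(mzs), conds) for mzs, conds in clusters]


# Invented peak lists: each condition alone detects two peaks.
peaks = {
    "cation_exchange_pH4": [5000.0, 8000.0],
    "metal_affinity": [5010.0, 12000.0],
    "anion_exchange_pH9": [8015.0, 15000.0],
}
merged = merge_peak_lists(peaks)
for mz, conds in merged:
    print(f"{mz:.1f}", sorted(conds))
```

Here three conditions that each detect two peaks together yield four distinct peaks, a small-scale version of the broader coverage gained by combining conditions.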
Sample preparation is a potential source of analytical bias that is often overlooked. Consistent and appropriate liquid-handling techniques, as well as defined protocols for the initial processing of samples, are essential for obtaining reproducible results. Sample types such as serum and plasma are highly complex; although every added sample-handling step increases the chance of variability, fractionating these samples before SELDI analysis increases the number of protein peaks detected and improves detection of low-abundance proteins.
Array processing is also key to success with SELDI. Careful thought should go into the sample layout on each array, the optimization of sample dilution and buffer composition, and the standardization of methods for applying matrix to the arrays. Array preparation for a single condition (one combination of fraction, array chemistry, and matrix) and data collection on that condition should be completed before moving on to the next condition.
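Sample layout can be sketched as block randomization: shuffle cases and controls, alternate them so that each array receives a near-even mix, reserve one spot per array for a pooled QC sample, and randomize spot positions within each array. This keeps group membership from being confounded with array or spot position. The 8-spot array size, the QC convention, and the function name are assumptions for illustration; adjust them to the actual array format and the study's SOPs.

```python
import random
from itertools import zip_longest


def layout_arrays(case_ids, control_ids, spots_per_array=8, qc_id="QC", seed=2008):
    """Block-randomized assignment of samples to array spots.

    One spot per array holds a pooled QC sample; the remaining spots are
    filled from an alternating case/control sequence, then shuffled within
    the array. A fixed seed makes the layout reproducible and documentable.
    """
    rng = random.Random(seed)
    cases, controls = list(case_ids), list(control_ids)
    rng.shuffle(cases)
    rng.shuffle(controls)
    # Alternate groups so any run of consecutive spots is near-balanced.
    interleaved = [s for pair in zip_longest(cases, controls) for s in pair if s is not None]
    study_spots = spots_per_array - 1  # one spot reserved for QC
    arrays = []
    for start in range(0, len(interleaved), study_spots):
        spots = interleaved[start:start + study_spots] + [qc_id]
        rng.shuffle(spots)  # randomize position within the array
        arrays.append(spots)
    return arrays


cases = [f"case{i:02d}" for i in range(14)]
controls = [f"ctrl{i:02d}" for i in range(14)]
arrays = layout_arrays(cases, controls)
for n, spots in enumerate(arrays, start=1):
    print(f"array {n}: {spots}")
```

With unequal group sizes, the surplus samples of the larger group end up together on the last arrays, so a real layout would also need to spread them out.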
Proper data collection and analysis are essential to successful biomarker discovery with SELDI. Qualification and calibration of a SELDI-TOF MS system should be done regularly to ensure optimum performance; the manufacturer provides kits for this purpose. Data-acquisition parameters should be tested and optimized on a pool of experimental samples before collecting data from study samples.
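Calibration converts each spectrum's flight times into m/z values using calibrant peaks of known mass. In an idealized linear TOF analyzer, flight time grows with the square root of m/z, so a minimal sketch fits sqrt(m/z) = a * t + b to the calibrants by least squares. This is a simplified model, not the calibration equation used by the vendor's software; the calibrant masses, instrument constants, and time units below are invented so that the fit can be checked.

```python
import math


def fit_sqrt_mz_calibration(flight_times, known_mz):
    """Least-squares fit of sqrt(m/z) = a * t + b from calibrant peaks."""
    ys = [math.sqrt(m) for m in known_mz]
    n = len(flight_times)
    t_mean = sum(flight_times) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in flight_times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(flight_times, ys))
    a = sxy / sxx
    b = y_mean - a * t_mean
    return a, b


def time_to_mz(t, a, b):
    """Convert a flight time to m/z with the fitted constants."""
    return (a * t + b) ** 2


TRUE_A, TRUE_B = 0.002, 0.5  # hypothetical instrument constants for the demo
calibrant_mz = [5000.0, 10000.0, 20000.0]  # hypothetical calibrant masses
calibrant_t = [(math.sqrt(m) - TRUE_B) / TRUE_A for m in calibrant_mz]

a, b = fit_sqrt_mz_calibration(calibrant_t, calibrant_mz)
unknown_t = (math.sqrt(15000.0) - TRUE_B) / TRUE_A
print(round(time_to_mz(unknown_t, a, b), 3))  # 15000.0
```

Because the demo times are noise-free, the fit recovers the constants exactly; with real spectra, the residuals at the calibrant peaks give a quick check on calibration quality.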
The collected data can first be processed using the system's default processing parameters and then reprocessed later if necessary. Statistical tests are generally used to screen for peaks that show significant differences between clinically relevant groups, using either univariate or multivariate techniques, and care should be taken to avoid false discoveries and overfitting of multivariate models.
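When hundreds of peaks are tested, some will look significant by chance. One widely used guard against false discovery is the Benjamini-Hochberg procedure, which controls the expected fraction of false positives among the peaks called significant (the false discovery rate, FDR). The sketch below assumes per-peak p-values have already been computed by whatever univariate test the study uses; the p-values shown are invented.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Indices of peaks declared significant at the given false discovery rate
    using the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k with p_(k) <= k * fdr / m; all peaks ranked <= k are rejected.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            cutoff = rank
    return sorted(order[:cutoff])


print(benjamini_hochberg([0.01, 0.035, 0.025, 0.20, 0.005]))  # [0, 1, 2, 4]
print(benjamini_hochberg([0.02, 0.021, 0.022, 0.5]))          # [0, 1, 2]
```

The second call illustrates the step-up rule: the smallest p-value (0.02) exceeds its own threshold (0.0125), yet it is still declared significant because a higher-ranked p-value passes its threshold. For multivariate models, cross-validation or an independent test set plays the corresponding role in guarding against overfitting.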
The application of proteomics to discover clinically meaningful biomarkers has proven to be challenging and has so far met with only limited success. However, the combination of a coherent, rigorous, and comprehensive process from study design to clinical assay implementation with the ProteinChip SELDI technology will help meet the challenge of discovering biomarker panels that could be used to accurately detect and predict human disease states, customize disease treatment, and assist in all phases of drug development.
Enrique A. Dalmasso, Ph.D., is senior staff scientist at Bio-Rad Laboratories. Web: www.bio-rad.com.
A more detailed discussion of the guidelines for effective biomarker study design can be found in Biomarker Discovery Using SELDI Technology: A guide to successful study and experimental design (Bio-Rad bulletin 5642).