Mining for Multivariate Markers
One of the biggest changes in recent years is the migration to multivariate biomarkers. It is becoming clear that the serum proteome does not offer many clear, individual biomarkers of disease, and that further advances will require looking for panels or profiles that can track changes in a number of molecules simultaneously.
Darius Dziuda, Ph.D., a professor in the department of mathematical sciences at Central Connecticut State University, is working to educate scientists about using data-mining methods to identify multivariate biomarkers. Using a multivariate approach is critically important, according to Dr. Dziuda.
“Too many studies are still limited to the univariate approach. If some of them result in efficient classifiers, it’s OK. However, the univariate approach not only neglects correlations between genes but also removes from consideration genes that are not significant univariately but are very important in combination with other genes.”
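The point about genes that matter only in combination can be illustrated with a small toy example. The sketch below is not taken from Dr. Dziuda's work: the two "genes", their values, and the helper `best_threshold_accuracy` are hypothetical and chosen only to make the effect visible. Each gene is tested alone with the best possible single cutoff, and then the two are combined as a difference.

```python
def best_threshold_accuracy(values, labels):
    """Best accuracy any single-cutoff rule can reach on one variable.

    Tries every midpoint between neighboring distinct values and allows
    the rule to point in either direction (hence the 1 - acc term).
    """
    cuts = sorted(set(values))
    thresholds = [(a + b) / 2 for a, b in zip(cuts, cuts[1:])]
    best = 0.0
    for t in thresholds:
        hits = sum((v > t) == bool(y) for v, y in zip(values, labels))
        acc = hits / len(values)
        best = max(best, acc, 1 - acc)
    return best


# Hypothetical expression values: six samples per class.
labels = [1] * 6 + [0] * 6
gene1 = [1, 2, 3, 4, 5, 6] * 2                      # same range in both classes
gene2 = [g + 0.5 if y else g - 0.5                  # tracks gene1, shifted by class
         for g, y in zip(gene1, labels)]
combined = [b - a for a, b in zip(gene1, gene2)]    # a simple two-gene pattern

print("gene1 alone:", best_threshold_accuracy(gene1, labels))
print("gene2 alone:", best_threshold_accuracy(gene2, labels))
print("gene2 - gene1:", best_threshold_accuracy(combined, labels))
```

In this constructed case neither gene alone does much better than chance, while the difference between them separates the two classes completely. A univariate filter would discard both genes before the combination was ever examined.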
Using a multivariate approach means looking for a set of genes or variables that can differentiate between classes or disease states. The focus of Dr. Dziuda’s paper at the Barcelona meeting will be his methods for identification of stable multivariate biomarkers. “First, using heuristic multivariate methods, we identify the informative set of genes that includes all significant discriminatory information. There are typically a few hundred genes in such a set. Some of them are univariately significant; others could not be identified by univariate methods.
“Then, we build a large number of bootstrap-based classifiers, which are used to vote for variables and to identify the most important expression patterns. Finally, feature selection performed on these patterns leads to small multivariate biomarkers that are stable and biologically interpretable.”
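The voting step described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Dr. Dziuda's published procedure: the joint separation score, the greedy forward selection standing in for a "classifier", the synthetic data, and every function name are hypothetical. The idea shown is only the general one: resample the data many times, let each resample choose a small set of variables jointly, and count how often each variable is chosen.

```python
import random
from collections import Counter
from statistics import mean, pvariance


def separation(rows, labels, subset):
    """Joint two-class separation on a subset of variables (assumed score):
    squared centroid distance divided by 1 + summed within-class variance."""
    between, within = 0.0, 0.0
    for j in subset:
        a = [r[j] for r, y in zip(rows, labels) if y == 0]
        b = [r[j] for r, y in zip(rows, labels) if y == 1]
        between += (mean(a) - mean(b)) ** 2
        within += pvariance(a) + pvariance(b)
    return between / (1.0 + within)


def greedy_select(rows, labels, k):
    """Forward selection: keep adding the variable that gives the best joint score."""
    chosen = []
    while len(chosen) < k:
        remaining = [j for j in range(len(rows[0])) if j not in chosen]
        best = max(remaining, key=lambda j: separation(rows, labels, chosen + [j]))
        chosen.append(best)
    return chosen


def bootstrap_votes(rows, labels, k=2, rounds=100, seed=0):
    """Run the selection on many bootstrap resamples and tally votes per variable."""
    rng = random.Random(seed)
    votes = Counter()
    n = len(rows)
    for _ in range(rounds):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        boot_labels = [labels[i] for i in idx]
        if len(set(boot_labels)) < 2:                # need both classes present
            continue
        boot_rows = [rows[i] for i in idx]
        votes.update(greedy_select(boot_rows, boot_labels, k))
    return votes


# Synthetic data: 20 samples, 6 variables; only variables 2 and 5 carry signal.
data_rng = random.Random(1)
labels = [0] * 10 + [1] * 10
rows = []
for y in labels:
    row = [data_rng.uniform(-1, 1) for _ in range(6)]
    row[2] = data_rng.uniform(0, 1) + 9 * y
    row[5] = data_rng.uniform(0, 1) + 9 * y
    rows.append(row)

votes = bootstrap_votes(rows, labels)
biomarker = sorted(j for j, _ in votes.most_common(2))
print("votes:", dict(votes))
print("most frequently selected variables:", biomarker)
```

The toy data are deliberately easy, so the two informative variables collect all of the votes. On real expression data the vote counts would be spread across many genes, and the stability of the top-ranked variables across resamples is what the voting is meant to reveal.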
The next step is validating the resulting multivariate biomarker using external data. Validation of biomarkers is a somewhat contentious subject. There is an argument to be made that a biomarker panel does not need to be validated or mechanistically characterized in order to be useful—that the pattern alone is sufficient for clinical or research purposes.
It is becoming more and more apparent, however, that in order to make the best use of a set of genes, their function and relationship should be discovered. (The function of the individual molecule does not necessarily translate to the biological interpretation of the set.) So, while multivariate biomarkers could be useful without a biological context, this will inevitably be a temporary situation.
One of the projects Dr. Dziuda has finished uses publicly available data from acute lymphoblastic leukemia. “After filtering noise, we had about 7,000 genes. The informative set of genes included about 200 genes. Using ensembles of classifiers, we identified the most frequently used genes and the most important expression patterns. Then, heuristic feature selection identified a multivariate biomarker of five genes.
“This biomarker worked well on independent test data. This and other case studies indicate that this approach works very well and results in robust multivariate biomarkers.” The method is applicable not just to early diagnosis of disease but also to prognosis, therapeutic response, and many other situations.
“Whenever you have a case that has a number of classes that are not that easy to differentiate, it is possible that there’s a multivariate gene- or protein-expression pattern that can be used for efficient classification.”
Epigenetics Ease Cell Identification
One limitation of genetic markers is that in many cases they can only indicate a predisposition for a disease; they do not quantitatively distinguish a diagnosable disease state from a nondisease state. Cancerous tumors are a notable exception, because genetic changes can be observed in the tumor cells. For many other conditions, however, a different type of marker is needed.
Epiontis is developing epigenetic markers for cell identification. The company’s initial focus was quality control tests for cellular therapeutics, working in collaboration with companies such as Genzyme. Some of its first assays identified chondrocytes in cartilage transplants for knee injuries. The marker the company used was DNA methylation of specific gene regions, and the test was eventually licensed to Genzyme.
“Epiontis is now pursuing the same technology for immune monitoring,” says CBO Ulrich Hoffmueller, who will speak at the Barcelona conference. “We had several candidates and identified that demethylation of the FOXP3 gene was the best marker that could be found for regulatory T cells.”
The FOXP3 demethylation marker has two benefits. First, it has much higher specificity than the protein that was used before. Second, the qPCR assay is technically much simpler than the fluorescence-activated cell sorting (FACS) analysis that would otherwise be required to identify the T cells. That simplicity helps in two ways: the assay is quicker and easier to run in the laboratory, and results from different laboratories can be compared.
For large, multicenter clinical trials, it is often necessary to collect and analyze samples at separate laboratories. With FACS, all samples must be analyzed at the same time on one machine to get the best results. Epiontis currently uses its regulatory T-cell qPCR assay in the clinical trial immune monitoring services it provides to clients.
Biomarkers encompass many seemingly unrelated branches of science and many disparate applications. However, each time a new biomarker, or set of biomarkers, is discovered, it brings into focus a portion of the biological system from which it came. Biomarkers contribute to the understanding of biological systems, which contributes to the understanding of disease mechanisms, which in turn contributes to the discovery of new and better biomarkers.
Significant hurdles remain, especially when it comes to global biomarker discovery. It may be quite a long time before scientists can sift through the entire proteome, genome, or metabolome for the perfect biomarker. Improved technology platforms boost discovery and development efforts for biomarkers. Bioinformatics and systems biology help researchers study molecular networks, instead of single molecules, increasing the odds of developing successful biomarker assays.