Over the course of the last 10 years, the development of microarray technology has had significant influence on gene-expression research. However, it is experiencing the same growing pains that any new and upcoming technology faces—lack of reproducibility, repeatability, and comparability across platforms and laboratories.
As with every emerging technology, standards needed to be established. With that in mind, Leming Shi, Ph.D., principal investigator at the National Center for Toxicological Research, developed the MicroArray Quality Control (MAQC) consortium, a community-wide effort encompassing 137 participants from 51 government, commercial, and academic organizations that include the FDA, NIH, EPA, all the major whole-genome gene-expression platform providers, and several alternative gene-expression platform providers.
As a vital part of that effort, the MAQC team evaluated the performance characteristics of three quantitative gene-expression technologies and correlated their expression measurements to those of five microarray platforms, according to Federico Goodsaid, senior researcher at the Center for Drug Evaluation at the FDA. The end result is a rich data set that, with proper analysis, reveals promising results regarding the consistency of microarray data between laboratories and across platforms.
“This project was triggered by multiple reports of lack of reproducibility in differential gene-expression data from microarrays,” says Goodsaid. “The FDA felt this was a problem. MAQC began as a study to establish biological standards and investigate the impact of analysis protocols on the reproducibility of differential gene expression. The first step was to show how to improve the reproducibility of results in the same lab from the same sample.”
One of the major findings was that scientists could reduce the variability in differential gene expression by ranking genes by fold change followed by a P-value cut-off. “Globally, if you rank genes first by fold-change with a P-value cut-off, the difference is dramatic,” Goodsaid points out. “For some of the datasets, a 20% overlap between replicates with initial P-value ranking rose to 80% overlap with an initial fold-change ranking.”
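The ranking approach Goodsaid describes can be sketched in a few lines of Python. This is a minimal illustration, not the consortium's actual analysis code: the gene names, fold changes, P-values, the 0.05 cut-off, the list size, and the function names select_genes and percent_overlap are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# gene -> (log2 fold change, P-value); every value here is made up for illustration
GeneStats = Dict[str, Tuple[float, float]]


def select_genes(stats: GeneStats, p_cutoff: float = 0.05, top_n: int = 3) -> List[str]:
    """Drop genes that fail the P-value cut-off, then rank the rest by |log2 fold change|."""
    passing = [(gene, fc) for gene, (fc, p) in stats.items() if p < p_cutoff]
    passing.sort(key=lambda item: abs(item[1]), reverse=True)
    return [gene for gene, _ in passing[:top_n]]


def percent_overlap(list_a: List[str], list_b: List[str]) -> float:
    """Percentage of genes shared by two lists, relative to the shorter list."""
    if not list_a or not list_b:
        return 0.0
    shared = set(list_a) & set(list_b)
    return 100.0 * len(shared) / min(len(list_a), len(list_b))


# Hypothetical results for the same sample pair measured at two test sites
site_1: GeneStats = {
    "GENE_A": (3.0, 0.001),
    "GENE_B": (-2.5, 0.01),
    "GENE_C": (2.0, 0.04),
    "GENE_D": (0.3, 0.0001),  # tiny P-value but a negligible fold change
    "GENE_E": (4.0, 0.20),    # large fold change but fails the P-value cut-off
}
site_2: GeneStats = {
    "GENE_A": (2.8, 0.02),
    "GENE_B": (-2.7, 0.003),
    "GENE_C": (1.9, 0.03),
    "GENE_D": (0.2, 0.04),
    "GENE_E": (0.5, 0.001),
}

top_1 = select_genes(site_1)
top_2 = select_genes(site_2)
print(top_1, top_2, percent_overlap(top_1, top_2))
```

In this toy data, a ranking by P-value alone would put GENE_D at the top of the first site's list and GENE_E at the top of the second's, which is the kind of instability between replicates that fold-change ranking is meant to reduce.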
“Everyone’s involvement in this project is pretty similar,” notes Shawn Baker, scientific product manager for gene expression at Illumina (www.illumina.com). “Leming wanted to see what cross-platform compatibility existed and was unhappy with the experimental designs in the 2004–2005 time-frame and wanted to improve it. So he brought in all the microarray platform providers, and what he has shown is that all the platforms actually correlate quite well with one another.”
“The point to this exercise wasn’t to excoriate or extol platform providers,” Goodsaid asserts. “This project was performed with enthusiasm and good faith by all participants to address issues about the performance and perception of differential gene-expression microarray data. Every one of the participants paid their own way for this project, motivated by an urgent need to address these issues. Results from this study will certainly impact perceptions about limitations of microarrays and highlight the need for reproducibility in microarray data.”
“There are many valuable aspects of this study to the scientific community as a whole, the chief of which is a thoroughly characterized data set,” says Richard Shippy, R&D scientist and biostatistician with GE Healthcare (www.gehealthcare.com). “And what makes this data set so wonderfully rich is the fact that we now have a baseline reference for seven microarray platforms at three different test sites, as well as three alternative quantitative platforms. This data set will be extremely valuable to the microarray community for better gauging the levels of repeatability, reproducibility, accuracy, and cross-platform correlations.”
“When government agencies function the way they were intended, it’s a beautiful thing,” states Paul Wolber, integrating manager microarray QC development at Agilent Technologies (www.agilent.com). “And this consortium comes at a good time for microarray technology.”
Wolber acknowledged that his team breathed a sigh of relief at the results. “At the earliest stages, it was already clear that we were all analyzing the same set of samples,” says Wolber. “Agilent provided both two-color and one-color array data and had come into the one-color branch of this study with a newly launched product. It was no trivial matter to find that the concordance between our data and everyone else’s was quite good. There were lots of shared data, really marvelous data sets that resulted. While there were some nonconcordant data points, generally concordance of data was very good. We all seemed to be measuring the same things.”
The consortium observed that the relative accuracy of the microarray platforms can be assessed by using either the titrated mixtures of the RNA samples or gene-abundance measurements collected with alternative technologies. Also, the Affymetrix (www.affymetrix.com), Agilent, and Illumina platforms displayed high correlation values of 0.90 or higher with TaqMan assays, based on comparisons of approximately 450 to 550 genes, whereas the GE Healthcare and NCI platforms had a reduced average correlation of 0.84 but included almost 30% more genes in the data comparisons. Similar correlation values for the microarray platforms were also observed relative to each of the other alternative platforms, StaRT-PCR and QuantiGene.
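A platform-to-platform comparison of this kind can be illustrated with a short Python sketch. The article does not say which correlation statistic or data transformation was used, so the Pearson coefficient on log2 ratios is an assumption here, as are the gene names, the values, and the function names pearson and correlate_platforms.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


def correlate_platforms(array: Dict[str, float], reference: Dict[str, float]) -> float:
    """Correlate two platforms using only the genes measured on both."""
    common = sorted(set(array) & set(reference))
    return pearson([array[g] for g in common], [reference[g] for g in common])


# Hypothetical log2 ratios (sample A vs. sample B); not MAQC data
microarray = {"GENE_A": 2.1, "GENE_B": -1.4, "GENE_C": 0.3, "GENE_D": 3.0}
taqman = {"GENE_A": 2.6, "GENE_B": -1.9, "GENE_C": 0.1, "GENE_E": 1.2}

# Only GENE_A, GENE_B, and GENE_C are shared, so only those three are compared
print(round(correlate_platforms(microarray, taqman), 3))
```

Because the comparison is limited to genes measured on both platforms, the number of genes behind each coefficient can differ from one platform pair to the next, which is why the article reports the gene counts alongside the correlation values.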
“The consortium has established that how the data is analyzed matters,” Wolber says. “You now have a reference method—and you can analyze the data so that it has the best correspondence and sweep away superstitions about what you think the best analysis method ought to be.
“This is a good day for science. The FDA has arranged a study that needed to be done, got a bunch of people together in one room in a way nobody else could have done, and got us all to work together,” Wolber adds. “Another point to make is that the two samples we used, the brain sample from Ambion (www.ambion.com) and the universal reference sample from Stratagene (www.stratagene.com), are generally available, the former for as long as that brain lasts, and the latter for a very long time.”
And the general availability of the samples is no small matter, GE’s Shippy pointed out. “Essentially, the MAQC project generated two reference RNAs, with Ambion and Stratagene spending a great deal of time and effort preparing and storing these samples, which will last for many years to come. You will be able to purchase this RNA reference material and conduct microarray experiments for validation and proficiency testing of your individual laboratory. There is a lot of value for the scientific community in that use alone.”
Quantitative Gene Expression
Another group of researchers evaluated the performance characteristics of three quantitative gene-expression technologies and correlated their expression measurements to those of five commercial microarray platforms, based on the MAQC data set.
“We’ve been collaborating with the FDA for a while prior to this,” says Raymond Samaha, senior manager of gene expression at Applied Biosystems (www.appliedbiosystems.com). “Leming Shi wanted to revisit the issue of reproducibility and he engaged the manufacturers not only to get better correlation but also to get better mapping of oligos on each array, as well as improving data analysis. The other issue is limitation: doing gene expression on this platform is not quite as sensitive as some qPCR techniques.
“The single, most obvious impact that this is going to have on the industry is that there is going to be a push for regulating and standardizing these platforms,” Samaha says. “This has been a pretty intensive study in a field that people are coming to trust. The impact in the field of gene expression relating to drug discovery and treatment remains to be seen. But this was an important first step, bringing manufacturers of arrays together to create that standard.
“One of the key findings in this study is that you also need to use TaqMan to validate the microarray results, particularly if you are looking at low expressors,” Samaha adds.
Making the Choice
The good news for scientists is that there really weren’t such huge differences across the various platforms in the study. This can also be a bit of bad news. After all, how is a scientist to choose?
“Seeing as there isn’t a huge technological difference among the platforms, the primary driver is going to be value,” Baker adds. “Scientifically valid results will happen no matter which platform you use, but it’s going to be the platform that gives you the most value that will emerge in the forefront. Choosing a gene-expression platform should be driven by factors such as technical performance, cost, usability, input requirements, and content quality. And when reagents cost less, that allows experimental designs to be expanded, yielding more far-reaching results with the same research budget.”
“I think it’s fair to say that we are all pleased with the results,” Wolber remarks. “The FDA took on this study and took it upon itself to clear the air with regard to this technology and putting standards in place. Microarray technology has improved tremendously over the last four years, and this study was a crucial step that demonstrates that this is a method that is really growing up.”
Shippy agrees. “This study also brings value to the array companies, in that now they have access to a thoroughly characterized baseline data set on all major expression profiling platforms. Array companies will be better able to further advance their products, through assay and platform modifications, which should mature the technology even further.”
“The study also shows that data-analysis methods make a difference,” Wolber also cautions. “This study explored only a subset of many possible analysis methods, so the first choice you might make, based on the initial publications, might not be the best. However, the data is now freely available to the public and should drive future improvements in data analysis.”
Affymetrix (www.affymetrix.com) also participated in this joint effort. “The study was designed to demonstrate that by practicing good scientific methods in the laboratory, one can obtain tight, reproducible data,” says Janet Warrington, vp, emerging markets and molecular diagnostics R&D.
Warrington notes that the samples used in the study are probably now the best-characterized RNA libraries commercially available, by both microarray and RT-PCR assay.
“No standard controls or guidelines per se resulted from this study,” Warrington says. “But hopefully, it will contribute to a better understanding of the important decisions that one makes in designing and executing an experiment. It is important to choose the appropriate algorithm or algorithms that correspond with the experimental question that one is trying to address.
“We hope that the results will build confidence that when scientists design adequately powered experiments, use good laboratory practices, and select and implement algorithms carefully that they will obtain highly reproducible, robust results,” Warrington concludes. “Strong scientific methods are required to generate good results.”
The biggest winner in the MAQC Project, Shippy notes, was the scientific community. “With this project, we have raised the profile of microarray technology, we have received the FDA stamp of approval, and now we are fully positioned to advance this field. In the short term, we’ve created the data set, the RNA material, and, of course, the papers. Personally, the most enjoyable aspect for me was working with the best and the brightest minds in the scientific field to provide novel ideas on how to handle and interpret microarray data.”
Another short-term gain is the ability to now assess performance of the platforms. “Normalization plays a critical role for successful microarray experiments,” Shippy adds. “And now we know that all of the microarray platforms are viable.”
“Scientists will try out the recommendations of the consortium in their microarray studies,” Goodsaid notes. “We will then know what the long-term impact of these recommendations will be.”