February 1, 2006 (Vol. 26, No. 3)
Methods Are Progressing to Improve Microarray Quality Control
The development of microarrays took gene expression profiling to a new level. However, the biggest challenge in microarray research today is data quality. Concerns voiced by those in the industry have prompted the development of new technologies and methods to address this.
Although microarrays can measure RNA expression levels for thousands of genes simultaneously, the majority of users agree that obtaining quality data remains a challenge. Expression Analysis (www.expressionanalysis.com) says it developed the first proficiency testing program for microarray assays. "We discovered early on that microarray labs want to monitor the consistency of data being generated, and that the FDA also has to judge data quality," says Laura Reid, Ph.D., director of R&D.
The company began a collaboration with the FDA and Schering-Plough in 2003 to provide a mock submission, including the first electronic files of microarray data. "We started to realize there were no defined thresholds of what's good and what's bad data." There were few studies to compare data between laboratories or data from the same laboratory over time that would validate the expression results.
The initial phase involved practicing data comparability studies to look at variability between two labs. The second phase involved using rat RNA samples with known expression differences to repeatedly generate similar data sets at multiple time points in 18 labs. The third phase of the program is now under way, utilizing human RNA samples developed as part of the Microarray Quality Control (MAQC) project initiated by the FDA. Its purpose is to provide quality control tools and develop guidelines for microarray data analysis by providing the public with large reference datasets along with available reference RNA samples.
"Proficiency testing is not currently required for microarrays," explains Dr. Reid. "We've shown you can get good agreement across labs and even across platforms. So, it's encouraging, and I'm hopeful people will have more confidence in microarray results."
Platform Comparisons
Most in the microarray field agree that there is a need for methods to compare data across different platforms. Applied Biosystems (www.appliedbio.com) presented a study in which it performed a large-scale validation of two microarray platforms using TaqMan assays. "Everyone has been doing platform comparisons," says Raymond Samaha, Ph.D., senior manager, gene expression. "The common standard is to compare microarrays to QRT-PCR, but these are usually done on a small scale. We wanted to know if you took a random gene set and a large gene set, say 1,400 genes, and made sure the genes were distributed over the expression profile, what would the results be when we validated the microarrays?"
The group compared the company's human gene survey microarray and expression array system to a two-color platform using TaqMan as the gold standard of gene expression measurement. The two platforms' performances were evaluated based on detection sensitivity and accuracy, fold-change correlation with TaqMan, differential expression sensitivity and accuracy, and expression profile correlations. Human brain, liver, lung, and universal human reference samples were used.
"For each gene we looked at the expression profile across the different tissues and then compared the profile gene-by-gene across the platforms," explains Dr. Samaha. Results indicated that the simple correlation method to compare platforms is not the best way. "The signal is really dependent on whatever algorithm each supplier uses to extract it from the platform. This can introduce a lot of variability and skew the comparison. But when you compare things at the fold-change level, that normalizes out the idiosyncrasies that come from different algorithms," Dr. Samaha states.
Additionally, the group concluded that single-color arrays correlate better with TaqMan for both fold-change and expression profile, that more genes with significant fold-change are detected with single-color platforms, and that TaqMan is more sensitive than hybridization-based array platforms.
"There are a lot of discrepancies between the microarrays and TaqMan due to differences in sensitivity. When you start looking at low expressing genes, the correlation between the two gets worse. We've always suspected that, and now we have the numbers to back it up," summarizes Dr. Samaha.
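The fold-change argument can be illustrated with a small Python sketch. It is not the analysis Applied Biosystems ran: it assumes, purely for illustration, that a second platform reports each gene's signal multiplied by a gene-specific factor, and every number below is made up. Under that assumption the raw signals of the two platforms disagree, while the log2 fold-changes between two tissues match, because the factor cancels in the ratio.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def log2_fold_changes(tissue_1, tissue_2):
    """Per-gene log2 ratio of tissue_1 signal over tissue_2 signal."""
    return [math.log2(a / b) for a, b in zip(tissue_1, tissue_2)]

# Hypothetical signals for five genes in two tissues on platform A.
brain_a = [120.0, 800.0, 45.0, 3000.0, 260.0]
liver_a = [480.0, 200.0, 90.0, 1500.0, 65.0]

# Assumption: platform B sees the same biology, but each gene's signal is
# scaled by its own factor (probe chemistry, signal-extraction algorithm).
scale_b = [10.0, 0.2, 30.0, 0.05, 4.0]
brain_b = [s * v for s, v in zip(scale_b, brain_a)]
liver_b = [s * v for s, v in zip(scale_b, liver_a)]

# Correlation of raw signals across platforms (same tissue).
raw_r = pearson(brain_a, brain_b)

# Correlation of brain-vs-liver fold-changes across platforms.
fc_r = pearson(log2_fold_changes(brain_a, liver_a),
               log2_fold_changes(brain_b, liver_b))
```

Real platforms also differ in noise and sensitivity, so fold-change agreement in practice is imperfect; the sketch only shows why multiplicative platform effects drop out of a ratio.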
Enhancing Sample Preparation
Several companies are addressing the loss of nucleic acids during sample preparation. Stratagene (www.stratagene.com) realized the need for a method to reduce or, if possible, eliminate sample loss for RT-PCR. "There are a lot of discussions in the field about how accurately you can detect RNA in the first place. There's a lot of mystery around that and the proper ways to detect RNA, so we decided to combine them in one solution," says Anne St. Louis, director, product marketing.
The SideStep Lysis and Stabilization Buffer minimizes loss and degradation of nucleic acids and stabilizes them without having to process them further. Using a single-tube format, it lyses cells at room temperature in 10 minutes, releasing both RNA and DNA. "One of the most important features of SideStep is that it gives you a safe stopping point to collect samples," explains L. Scott Basehore, Ph.D., senior research associate. "Once they are in the lysate, they can be safely stored at -80 degrees for long periods, up to six months."
In addition, since both RNA and DNA are present in the samples, it allows for QRT-PCR or RT-PCR. This method ensures accurate gene quantification in downstream QRT-PCR by preventing sample loss and degradation. It is also ideal for archiving samples, especially rare cells from tissue biopsy. "Also, the buffer allows for more assay flexibility in that you can go back to a sample that's been archived and retrieve important information," adds Dr. Basehore.
Pierce Milwaukee (www.piercebc.com) also developed a new reagent. "This is a cytoplasmic lysis reagent that breaks open the cells and releases the cytoplasmic contents, keeping the nucleus intact," says T.S. Rama Subramanian, Ph.D., R&D manager.
The resulting lysate can be used directly in downstream applications, such as RT-PCR and QRT-PCR. "The advantage is that the lysate contains no DNA, and if the research is focused on gene expression, what you are monitoring is actually full-length, fully processed mRNA, and the gene expression data so obtained is a better representation of the functional mRNA," adds Dr. Subramanian.
Since the lysate also contains proteins, it provides the opportunity for more meaningful correlation between gene and protein expression. "If you can just lyse the cells and assay directly, you are perhaps closer to the truth by looking at the RNA level, without introducing any purification-induced bias. The lysate works well in both RT-PCR and QRT-PCR," Dr. Subramanian explains. "We wanted to develop a method geared toward high-throughput screening and something that would quickly indicate any change in gene expression," he summarizes.
The reagent is still in development with an anticipated launch date at the end of the first quarter.
Optimizing Microarrays
Agilent Technologies (www.agilent.com) combined two technologies for further optimization of microarrays. These are QuantiGene, a reagent kit that accurately quantifies mRNA levels without RNA purification, and external spikes, which are added to the sample before amplification and are used to measure gene expression.
"There are still some issues around the numbers of quantitating mRNA levels in that they are not absolute measurements; they are still referenced to some other transcript in the sample. We realized there were things we could do to make the system operate better," says Paul Wolber, Ph.D., director, microarray QC development.
The QuantiGene assay quantifies mRNA directly from cell lysates. Target mRNA from lysed cells is captured by hybridization and transferred to the capture plate. Signal amplification is performed by hybridization of a branched DNA amplifier and label probe. An added chemiluminescence substrate yields the QuantiGene signal, which is proportional to the amount of mRNA in the sample. This measures RNA levels down to 10,000 copies.
"There has been some argument in the field about the degree of bias that the reverse transcriptase step does or does not introduce into things. The nice thing about this assay is that it's direct and works right on the mRNA and produces a highly amplified signal for its presence," states Dr. Wolber.
The spikes are based on 3' sequence tagging of the E1A gene (adenovirus). Random sequence tags are placed at the 3' ends, characterized, and then a set of about 10 is chosen.
"You can put together mixtures of these, where you use well-accepted methods of quantitating mRNA, and spike them into your sample before amplification. There are probes on the microarray specific to the spikes that allow you to measure what you get back," explains Dr. Wolber. The company is offering the spikes for both one- and two-color systems.
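One way to read "measure what you get back" is as a calibration check: known spike amounts go in, probe signals come out, and a straight line with a slope near 1 on a log-log scale suggests a proportional response. The Python sketch below fits that line by ordinary least squares. The amounts, signals, and function name are hypothetical, and this is not Agilent's software or documented procedure.

```python
import math

def fit_log_log(amounts, signals):
    """Least-squares fit of log10(signal) = slope * log10(amount) + intercept."""
    xs = [math.log10(a) for a in amounts]
    ys = [math.log10(s) for s in signals]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical spike mixture: relative input amounts and the signals read
# from the matching spike-specific probes on the array.
spike_amounts = [1.0, 10.0, 100.0, 1000.0, 10000.0]
spike_signals = [52.0, 480.0, 5100.0, 49000.0, 510000.0]

slope, intercept = fit_log_log(spike_amounts, spike_signals)
# A slope close to 1 means signal scales proportionally with input amount;
# a flatter slope would point to compression, for example at saturation.
```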
More Focused Arrays
As an addition to its CodeLink microarray platform, GE Healthcare (www.gehealthcare.com) developed a multi-assay chamber that enables up to 16 samples to be hybridized, washed, and detected in parallel on a single slide. "People are looking for more focused arrays," explains Peter Herzer, Ph.D., senior scientist, microarray applications. "They're still using whole-genome arrays for overall screening, but then they find groups of genes they are interested in. That's why we're moving toward the 16-sample assays."
He adds that this uses a high-density matrix that accommodates 60,000 spots on one slide. This is scanned using commercially available scanners, and images generated can be easily analyzed using the CodeLink Expression Analysis software. The company recently launched an ADME rat-16 assay that uses this format.
The 3-D polyacrylamide matrix that coats the slide surface of the CodeLink bioarrays has active functional groups that bind to primary amines (oligonucleotides that are amine-modified).
"You get covalent binding to the polyacrylamide matrix through the active binding sites. It has extremely high-binding capacity for oligos, so you can get more surface area. The 3-D structure is like a low-density sponge; you can get lots of binding throughout," Dr. Herzer explains.
This has several advantages, such as more signal in spots, no steric hindrance (unlike a 2-D slide surface), and high specificity, down to 1:2 million dilution at the cRNA level, enabling detection of RNA expressed at low levels in the cell.
Multiplexed PCR-based Platform
Primera Biosystems (www.primerabio.com) developed a new technology for the simultaneous quantitative detection of multiple targets in a single sample. STAR, or Scalable Transcriptional Analysis Routine, integrates RT-PCR and capillary electrophoresis, allowing the detection of gene transcripts in a multiplex format. Amplicon size is used as an identifier for each target.
"We found that microarrays were not very quantitative, and to find out which results were real and which weren't, we had to use TaqMan. We thought there had to be a better way, where we can look at more genes that will be more quantitative and give us better data," says Elizabeth Garcia, Ph.D., director of assay development.
Specificity of PCR amplification is due to appropriate primer choice and reaction conditions. "The hardest part of this assay is choosing the set of 20 primers that work well together; there are primer-primer interactions," explains Dr. Garcia. "We are developing tools to make that much easier and redesigning the primers ourselves."
Since capillary electrophoresis allows accurate size determination of nucleic acids from 50 to 1,000 bases with the precision of one base, assays can be done simultaneously for many targets. Quantification capabilities are maintained equal to those seen with RT-PCR.
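The idea of using amplicon size as the identifier can be sketched in a few lines of Python. This is an illustration only, not Primera's software: the gene names, expected sizes, and the `assign_peaks` helper are hypothetical, and the one-base tolerance simply mirrors the sizing precision quoted above.

```python
def assign_peaks(peak_sizes, target_sizes, tolerance=1.0):
    """Map each observed peak size (in bases) to the target whose expected
    amplicon size is closest and within `tolerance`; otherwise None."""
    assignments = {}
    for peak in peak_sizes:
        best = None  # (target name, size difference) of the closest match
        for target, expected in target_sizes.items():
            diff = abs(peak - expected)
            if diff <= tolerance and (best is None or diff < best[1]):
                best = (target, diff)
        assignments[peak] = best[0] if best else None
    return assignments

# Hypothetical multiplex panel: each target is designed to give an amplicon
# of a distinct length, spaced further apart than the sizing tolerance.
target_sizes = {"GENE_A": 120, "GENE_B": 135, "GENE_C": 150}

# Hypothetical peak sizes read from one capillary electrophoresis run.
observed = [119.6, 135.2, 150.4, 171.0]

calls = assign_peaks(observed, target_sizes)
```

The design constraint this makes visible is that expected sizes must differ by more than twice the tolerance, or a peak could match two targets; an unmatched peak such as the one at 171.0 would be flagged for review rather than assigned.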
Invitrogen (www.invitrogen.com) developed several new products for gene expression analysis, ranging from sample preparation to RNA amplification and labeling. "The primary product is the SuperScript III reverse transcriptase," says Patrick N. Gilles, Ph.D., R&D manager. This is more thermo-stable than the SuperScript II, resulting in longer, full-length cDNA synthesis with all of the splice variants. It also provides more reproducible array data with higher correlation coefficients.
The company has several SuperScript III labeling kits that incorporate Alexa Fluors that are spectrally equivalent to Cy dyes, but more stable and resistant to photobleaching and ozone degradation. One kit, SuperScript Plus, is for direct labeling, with direct incorporation of the dye into cDNA for DNA microarrays. The other is an indirect labeling kit that incorporates two nucleotides that can be conjugated to the dyes.
"The unique part about this is that it allows you to use three-color experiments because we have three Alexa Fluors with minimal overlap. This allows for a drastic reduction in the amount of required sample to obtain differential expression analysis. Since the Alexa Fluors are more stable, it gives a better correlation coefficient that results in fewer false positives," states Dr. Gilles.
SuperScript III is also applied in the company's new RNA amplification kit. Starting with as little as 100 ng of RNA, it allows more full-length initial synthesis, so subsequent amplification is full-length and contains all the genes, even low-copy genes.
There are two new sample prep kits for obtaining small RNAs. The RiboMinus Transcriptome Isolation kit removes large ribosomal RNA but keeps the smaller RNA that can be used for labeling regulatory RNA. The ChargeSwitch system eliminates many amplification inhibitors found in traditional purification (i.e., silica gel) methods. "Binding is based on pH differences: at an acidic pH it will bind, at slightly alkaline pH it will dissociate. This allows for the binding of the total RNA preparation, whereas small RNAs would be lost through standard silica filters," explains Dr. Gilles.
Microarray Quality Control
The MAQC Project is working to provide quality control tools to avoid procedural failures and develop guidelines for microarray data analysis by providing large reference datasets and accessible reference RNA samples.
Illumina (www.illumina.com) is one of the companies involved in this project and will be participating in a roundtable discussion about the MAQC effort this March in San Diego. "We set up a series of controls for ourselves from the beginning that are built into the microarray system," explains Shawn C. Baker, Ph.D., scientific product manager, gene expression. These include controls for the manufacture of the arrays.
"Every probe that we build has around 30 copies randomly distributed across the microarray surface. This provides a more precise measurement, and if one of them is out of whack, it automatically gets removed. It's much more robust."
This process is automated and occurs during the scanning process. Called BeadArray, the array contains thousands of tiny wells, each filled with a three-micron bead covered with hundreds of thousands of copies of a single-stranded nucleotide sequence.
The scanner finds each bead, extracts its value, and builds a table of all the values. Probe intensities are averaged by software that comes with the system. The software can also run normalization methods and differential expression algorithms, produce clustering or scatter plots, and export the data to other analysis packages.
Dr. Baker explains that even though a company may create the highest quality microarray, variation still occurs. "We're interested in having an understanding of that variation and trying to reduce it, so all you have left is biological variation."
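The bead-level redundancy described here lends itself to a short Python sketch: drop replicate beads that look like outliers, then average the rest. The article does not say which rule the scanner software applies, so the cutoff below (three median absolute deviations from the median) is an assumption chosen for illustration, as are the function name and the intensities.

```python
from statistics import fmean, median

def summarize_probe(bead_intensities, n_mads=3.0):
    """Average replicate bead intensities for one probe after removing
    beads more than `n_mads` median absolute deviations from the median.

    Returns (mean of kept beads, number of beads kept)."""
    values = list(bead_intensities)
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # At least half the beads are identical; no spread to judge by,
        # so keep every bead rather than discard on a zero threshold.
        kept = values
    else:
        kept = [v for v in values if abs(v - center) <= n_mads * mad]
    return fmean(kept), len(kept)

# Hypothetical intensities for six beads carrying the same probe; the last
# bead is "out of whack" and should be excluded before averaging.
beads = [100, 102, 98, 101, 99, 500]
probe_mean, beads_used = summarize_probe(beads)
```

A median-based cutoff is used here because a single extreme bead barely moves the median, whereas it would inflate a mean-and-standard-deviation threshold enough to hide itself.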