November 15, 2010 (Vol. 30, No. 20)
Richard A. A. Stein M.D., Ph.D.
Which Tool Is Best Depends on the Question Asked and the System Being Surveyed
Recent decades have been exciting times for the biological sciences. Since the milestone 1977 publication of the first complete DNA genome, that of the single-stranded bacteriophage ΦX174, new methodologies, culminating in next- and next-next-generation sequencing platforms and the advent of the -omics sciences, have enabled a level of scrutiny that previously seemed beyond imagination.
While earlier techniques allowed only a limited number of genes to be examined at one time, genome-based microarray platforms made it possible to analyze entire cellular transcriptomes under specific sets of conditions. This global approach enables the visualization of discrete pathways or groups of genes, which can be surveyed under various conditions, such as a specific disease or treatment with a certain therapeutic agent.
“Microarrays have become a very useful and important tool for this purpose,” explains Stephen Walker, Ph.D., assistant professor at Wake Forest University School of Medicine. Since 2005, Dr. Walker has been a member of the international MicroArray Quality Control (MAQC) consortium, an initiative established by the FDA with the participation of all of the leading microarray manufacturers, end users, and bioinformaticians.
MAQC emerged in the wake of several publications that had reported cross-platform comparisons of microarray performance. These revealed that microarray analyses of the same biological samples, when carried out by different laboratories using different platforms, often led to very different results.
“This was initially shocking for the microarray community, since arriving at different sets of answers would imply that many microarray experiments are not reproducible,” recalls Dr. Walker. To address this issue, the consortium compared microarray platforms from all of the major microarray providers and, in a series of papers published in Nature Biotechnology in 2006, concluded that, overall, the various platforms performed well and also comparably to each other.
“The biggest take-home message was that the genome is a moving target. When different array manufacturers used to refer to a specific target for a specific gene, they each may well have been using different sequences within the same gene, and this helps explain why different results were sometimes obtained,” explains Dr. Walker.
Despite their advantages, microarrays have several limitations. “Microarrays are a great tool to start with, but to understand the biology of the system one needs to subsequently conduct more focused experiments,” advises Dr. Walker. The recent push toward RNA sequencing is being driven not only by the need to learn about changes in the expression of a particular gene, but also by the opportunity to visualize the differential regulation of multiple splice variants for one specific gene.
“By sequencing the transcriptome, rather than labeling RNA and hybridizing on microarrays, there is a higher likelihood to pick up individual splice variants and get a much more comprehensive gene-expression profile,” explains Dr. Walker. “This provides a more comprehensive analysis but adds, at the same time, another layer of complexity, because the size of the files is quite large, and the approach, at the present time, can be cost prohibitive,” he adds.
Ultimately, whether data is obtained by microarrays or by sequencing, one of the major questions is how to examine the huge amount of information that is generated. “Once investigators obtain the dataset, it is very exciting, but the challenge is what to do with it next, and how to find the right tools for analysis. There is currently a bottleneck with the availability of appropriate software tools needed to keep up with the ever larger and more detailed datasets,” notes Dr. Walker.
One of the challenges in analyzing data obtained by microarray and sequencing technologies is integrating the vast amounts of information generated by various experiments and different groups. In 2004, Philip Zimmermann, Ph.D., lecturer and group leader at ETH Zurich and co-founder of Nebion, a spin-off company from ETH Zurich, built an online tool together with several colleagues that became known as Genevestigator.
This application integrates Affymetrix GeneChip microarray data from experiments conducted in many laboratories and deposited in the public domain. It was built by curating large amounts of public data so that they are highly standardized and comparable, and by developing algorithms for their analysis.
A more advanced version of Genevestigator became available in 2006, and currently has over 20,000 registered users. “I believe that, for microarrays and next-generation sequencing, the added value is going to be much bigger if we look at data in context of hundreds or thousands of other experiments,” explains Dr. Zimmermann. Genevestigator was validated for several experiments that sought to identify key genes involved in specific biological processes or conditions, or to identify conditions that affect a certain gene or set of genes.
Existing microarray platforms cannot interrogate repetitive regions. As a result, the genome-wide profiles of these regions have remained relatively unstudied, and they continue to represent one of the most challenging and least understood parts of the genome.
“With next-generation sequencing, repetitive regions can now be surveyed to a large extent,” explains Peter J. Park, Ph.D., assistant professor at Harvard Medical School. Dr. Park and collaborators are using next-generation sequencing to compare the genome from diseased individuals with the genome from healthy individuals, and this approach promises to identify repetitive region changes that are associated with malignancy.
This endeavor is catalyzed by the fact that whole-genome sequencing has become much more affordable in recent years. “Sequencing a genome used to cost millions of dollars three to four years ago, and it can be done for $10,000 right now,” remarks Dr. Park. “There will be a lot of interest in this field in the next few years, because we know very little about these repetitive sequences, but there are diseases where changes in these regions cause undesirable phenotypes,” he explains.
“The most important question is what you are looking for,” explains Francis Galibert, Ph.D., emeritus professor at the University of Rennes and senior scientist at the Centre National de la Recherche Scientifique. Dr. Galibert and collaborators are using both microarrays and deep sequencing, and have noticed that sequencing might be better for surveying genes that are poorly expressed, particularly when the differences in gene expression between the two biological conditions being compared are small.
“But for this to happen you need very, very deep RNA sequencing, so that the genes you are interested in are counted enough times. And this is better achieved if you performed 3´ Tag sequencing instead of RNA full-length sequencing,” explains Dr. Galibert. One of the projects in the Galibert lab examines RNA expression changes in the rat olfactory bulb and olfactory epithelia over time, and also responses to specific environmental exposures.
In rats exposed to different odorants, the investigators noticed that specific odorants modify the number of transcripts corresponding to the receptor to which they bind. “But the changes are very subtle and quite difficult to detect, and one needs a large number of experiments that have to be repeated a few times. And for this particular application, I thought that sequencing might be more accurate,” explains Dr. Galibert.
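The link between sequencing depth and the ability to count a poorly expressed gene "enough times" can be illustrated with a little arithmetic. The sketch below is not the Galibert lab's method. It assumes, purely for illustration, that read counts follow a Poisson distribution, so that the relative counting noise for a gene is one over the square root of its expected count. The function names and the reads-per-million unit are hypothetical, and biological variability between animals, which is the reason replicates are needed, is ignored.

```python
import math


def expected_reads(total_reads: int, reads_per_million: int) -> float:
    """Expected read count for a gene that contributes `reads_per_million`
    reads out of every million sequenced (an illustrative unit)."""
    return total_reads * reads_per_million / 1_000_000


def relative_counting_noise(expected_count: float) -> float:
    """Coefficient of variation under the Poisson assumption:
    standard deviation sqrt(mean) divided by the mean."""
    return 1.0 / math.sqrt(expected_count)


def reads_needed(reads_per_million: int, target_cv: float) -> int:
    """Total sequencing depth needed so that counting noise alone
    drops to `target_cv` for the gene of interest."""
    needed_count = 1.0 / (target_cv ** 2)
    return math.ceil(needed_count * 1_000_000 / reads_per_million)


if __name__ == "__main__":
    # A gene contributing one read per million, sequenced at two depths.
    for depth in (10_000_000, 100_000_000):
        count = expected_reads(depth, 1)
        print(depth, count, round(relative_counting_noise(count), 3))
```

Under these assumptions, halving the counting noise requires four times as many reads, which is one way to see why subtle changes in rare transcripts call for very deep sequencing.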
“Next-generation sequencing has several advantages if properly implemented and, among other things, it is more accurate and more comprehensive than oligonucleotide hybridization-based sequencing,” says Heidi L. Rehm, Ph.D., assistant professor of pathology at Harvard Medical School and chief laboratory director at the Laboratory for Molecular Medicine at Partners Healthcare Center for Personalized Genetic Medicine.
Approximately three years ago, the Laboratory for Molecular Medicine launched one of the first hybridization-based sequencing services intended for clinical use. Among the shortcomings of hybridization approaches are their difficulty in detecting insertions and deletions and their inability to reveal copy-number variations.
Next-generation sequencing can reveal these types of changes, and Dr. Rehm and collaborators are currently developing next-generation sequencing-based applications for use in the clinical lab—but these, too, are anticipated to bring their own set of specific challenges.
“There are so many ways to conduct next-generation sequencing, in terms of protocols, instruments, libraries, barcodes, and all the different capture methods, and these all represent essential aspects in terms of what to get out of the test,” explains Dr. Rehm. “It is a very complex system to develop, and it requires a huge amount of hardware for data storage and bioinformatics support,” she adds.
“We use microarrays for detecting microorganisms, and also for measuring their metabolic activities,” explains Michael Wagner, Ph.D., head of the department of microbial ecology at the University of Vienna, Austria. Dr. Wagner and colleagues are using microarrays to examine ribosomal RNA genes and simultaneously detect multiple microorganisms from complex clinical or environmental samples. This approach has important implications for diagnostics and environmental monitoring.
“This kind of application can also be done by next-generation sequencing, which is less targeted and much more expensive, but microarrays are faster and cheaper when the goal is to detect previously recognized microorganisms. It all depends on the type of questions that one wants to address,” emphasizes Dr. Wagner.
In addition to detecting microorganisms, this approach also makes it possible to obtain information about the in situ metabolic activity of the microorganisms that are being surveyed.
The tool, which is called isotope array and was developed by Dr. Wagner and colleagues several years ago, involves adding an isotope-labeled substrate to the sample, and examining the isotope incorporation into the ribosomal RNA by active microorganisms. Microarray analysis can subsequently be informative not only with respect to the identity of the microorganisms that are present, but also about their metabolic activity. This approach is even more valuable, considering that most microorganisms cannot be cultured in the lab.
“If the goal is to conduct highly parallel functional studies, and learn not only which microorganisms are there but also what they are doing, then microarrays cannot be replaced. Both microarray platforms and sequencing methodologies have their specific niches, where they complement each other,” emphasizes Dr. Wagner.
Microarrays and sequencing have emerged as powerful tools that are revolutionizing the life sciences and continue to promise new perspectives in research and medicine. Each presents certain advantages and poses a number of challenges. The most memorable take-home lesson is the need to implement these tools in the manner that is most effective, most informative, and carefully tailored to the scientific question and the biological system being surveyed.