Recent decades have marked exciting times for the biological sciences. Since the milestone 1977 publication of the first genome sequence, that of the single-stranded DNA bacteriophage ΦX174, new methodologies, culminating in the implementation of next- and next-next-generation sequencing platforms and the advent of the -omics sciences, have facilitated a level of scrutiny that previously seemed beyond imagination.
While techniques available in the past allowed only a limited number of genes to be examined at one time, genome-based microarray platforms opened the possibility to analyze entire cellular transcriptomes under specific sets of conditions. This global approach enables the visualization of discrete pathways or groups of genes, which can be surveyed under various conditions, such as a specific disease or treatment with a certain therapeutic agent.
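The idea of surveying a whole pathway across conditions can be illustrated with a minimal sketch. The gene names, conditions, and expression values below are invented for the example, and log2 values are assumed; the point is only that a genome-wide expression matrix lets one ask which members of a gene group respond to a given condition.

```python
# Hypothetical illustration: surveying a group of genes (e.g., a pathway)
# across conditions in a small expression matrix. All names and values
# are invented; expression is assumed to be on a log2 scale.

expression = {
    # gene: {condition: log2 expression value}
    "GENE_A": {"control": 5.1, "disease": 7.8, "treated": 5.3},
    "GENE_B": {"control": 4.2, "disease": 6.9, "treated": 4.5},
    "GENE_C": {"control": 8.0, "disease": 8.1, "treated": 7.9},
}

def pathway_fold_changes(genes, condition, baseline="control"):
    """Log2 fold change of each gene in `genes` versus a baseline condition."""
    return {
        g: round(expression[g][condition] - expression[g][baseline], 2)
        for g in genes
    }

# Which pathway members respond in the disease condition?
changes = pathway_fold_changes(["GENE_A", "GENE_B", "GENE_C"], "disease")
upregulated = [g for g, fc in changes.items() if fc > 1.0]
print(upregulated)  # GENE_A and GENE_B respond; GENE_C stays flat
```

In a real analysis the matrix would hold thousands of genes and the fold changes would be paired with statistical tests, but the global view is the same: every gene is measured at once, and any subset can be pulled out and compared.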
“Microarrays have become a very useful and important tool for this purpose,” explains Stephen Walker, Ph.D., assistant professor at Wake Forest University School of Medicine. Since 2005, Dr. Walker has been a member of the international MicroArray Quality Control (MAQC) consortium, an initiative established by the FDA with the participation of all of the leading microarray manufacturers, end users, and bioinformaticians.
MAQC emerged in the wake of several publications that had reported on cross-platform comparisons of microarray performance and revealed that microarray analyses of the same biological samples, examined by different laboratories using different platforms, often led to very different results.
“This was initially shocking for the microarray community, since arriving at different sets of answers would imply that many microarray experiments are not reproducible,” recalls Dr. Walker. To address this issue, the consortium compared microarray platforms from all of the major microarray providers and, in a series of papers published in Nature Biotechnology in 2006, concluded that, overall, the various platforms performed well and comparably to one another.
“The biggest take-home message was that the genome is a moving target. When different array manufacturers referred to a specific target for a specific gene, they each may well have been using different sequences within the same gene, and this helps explain why different results were sometimes obtained,” explains Dr. Walker.
Despite their advantages, microarrays have several limitations. “Microarrays are a great tool to start with, but to understand the biology of the system one needs to subsequently conduct more focused experiments,” advises Dr. Walker. The recent push toward RNA sequencing is being driven not only by the need to learn about changes in the expression of a particular gene, but also by the opportunity to visualize the differential regulation of multiple splice variants for one specific gene.
“By sequencing the transcriptome, rather than labeling RNA and hybridizing on microarrays, there is a higher likelihood to pick up individual splice variants and get a much more comprehensive gene-expression profile,” explains Dr. Walker. “This provides a more comprehensive analysis but adds, at the same time, another layer of complexity, because the size of the files is quite large, and the approach, at the present time, can be cost prohibitive,” he adds.
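Why read-level data can separate splice variants, while a probe fixed to a single exon cannot, can be sketched in a few lines. The isoform names, exon labels, and junctions below are invented for illustration: a read spanning an exon-exon junction is only compatible with the isoforms that contain that junction.

```python
# Hypothetical sketch: assigning junction-spanning sequencing reads to
# splice variants of one gene. All isoform names, exon labels, and
# junction coordinates are invented for the example.
from collections import Counter

ISOFORM_JUNCTIONS = {
    "isoform_1": {("exon1", "exon2"), ("exon2", "exon3")},  # full-length
    "isoform_2": {("exon1", "exon3")},                      # skips exon 2
}

def assign_read(junction):
    """Return the isoforms consistent with a junction-spanning read."""
    return sorted(iso for iso, jset in ISOFORM_JUNCTIONS.items()
                  if junction in jset)

# Three simulated junction-spanning reads from a sequencing run
reads = [("exon1", "exon2"), ("exon1", "exon3"), ("exon1", "exon3")]

counts = Counter()
for r in reads:
    for iso in assign_read(r):
        counts[iso] += 1

print(dict(counts))  # the exon-skipping variant dominates in this toy run
```

An exon-body microarray probe would report both isoforms as a single signal; the junction reads, by contrast, attribute expression to individual variants, which is the extra resolution Dr. Walker describes, at the cost of much larger data files.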
Ultimately, whether data is obtained by microarrays or by sequencing, one of the major questions is how to examine the huge amount of information that is generated. “Once investigators obtain the dataset, it is very exciting, but the challenge is what to do with it next, and how to find the right tools for analysis. There is currently a bottleneck with the availability of appropriate software tools needed to keep up with the ever larger and more detailed datasets,” notes Dr. Walker.
One of the challenges in analyzing data obtained by microarray and sequencing technologies is to integrate the vast amounts of information generated by various experiments and different groups. In 2004, Philip Zimmermann, Ph.D., lecturer and group leader at ETH Zurich and co-founder of Nebion, an ETH Zurich spin-off company, together with several colleagues built an online tool that became known as Genevestigator.
This application integrates Affymetrix GeneChip microarray data from experiments conducted in many laboratories and deposited in the public domain. It was built by curating large amounts of public data, so that the data are highly standardized and comparable, and by developing algorithms for their analysis.
A more advanced version of Genevestigator became available in 2006, and the tool currently has over 20,000 registered users. “I believe that, for microarrays and next-generation sequencing, the added value is going to be much bigger if we look at data in the context of hundreds or thousands of other experiments,” explains Dr. Zimmermann. Genevestigator was validated in several experiments that sought to identify key genes involved in specific biological processes or conditions, or to identify conditions that affect a certain gene or set of genes.
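The meta-analysis idea behind placing one gene in the context of many curated experiments can be sketched simply. The studies, conditions, and expression values below are invented: given one gene measured across many public experiments, the question is which conditions it responds to most strongly.

```python
# Hypothetical sketch of cross-experiment meta-analysis: rank conditions
# by a gene's average expression across many curated public studies.
# All study entries, condition names, and values are invented.

studies = [
    {"condition": "drought stress", "GENE_X": 9.2},
    {"condition": "heat stress",    "GENE_X": 6.1},
    {"condition": "drought stress", "GENE_X": 8.8},
    {"condition": "control",        "GENE_X": 4.0},
]

def rank_conditions(gene):
    """Average a gene's expression per condition across all studies,
    sorted from highest to lowest mean expression."""
    totals, counts = {}, {}
    for s in studies:
        c = s["condition"]
        totals[c] = totals.get(c, 0.0) + s[gene]
        counts[c] = counts.get(c, 0) + 1
    means = {c: totals[c] / counts[c] for c in totals}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)

top_condition = rank_conditions("GENE_X")[0][0]
print(top_condition)  # the condition with the highest mean expression
```

A single experiment can only say what happens under its own conditions; pooling hundreds or thousands of standardized experiments, as the quote above suggests, lets the same query reveal where a gene is most active across the whole body of public data.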
Repetitive regions cannot be interrogated by existing microarray platforms, and as a result their genome-wide profiles have remained relatively unstudied; these regions continue to represent one of the most challenging and least understood parts of the genome.