October 1, 2006 (Vol. 26, No. 17)
Modeling & Analysis of Biological Pathways and Integration of Data Sources and Types
Rapid advances in life sciences technology have provided researchers with vast amounts of data. Now, the challenge is to analyze and visualize the data to increase its accessibility. Many companies are meeting this challenge with new software solutions.
Systems biology is a rapidly growing field that requires both computational and graphical analysis. The MathWorks (www.mathworks.com) recently launched SimBiology™, which enables researchers to simulate, model, and analyze biochemical pathways in one platform. Based on the company’s Matlab® programming language, SimBiology reportedly eliminates the need for specific tools at each phase of systems biology.
Pathways Analysis
“We started noticing, about five years ago, that a lot of people in pharmaceutical companies were buying Matlab and using it for dynamic simulations of pathways,” says Rick Paxson, manager, systems biology group. “We attempted to educate them on Simulink®, which wasn’t really geared toward biologists. That’s why we built SimBiology.”
The software addresses a major hurdle: the lack of a graphical language to represent pathways. It enables researchers to simulate the modeled reactions and then analyze the resulting data or perform custom analysis with Matlab.
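To give a sense of what simulating a modeled reaction involves, here is a minimal Python sketch. It is not SimBiology or Matlab code. It assumes a hypothetical two-step pathway, A -> B -> C, with mass-action kinetics, and integrates it with a simple forward-Euler loop. The function name, rate constants, and step size are illustrative assumptions.

```python
def simulate_pathway(k1, k2, a0=1.0, dt=0.001, t_end=5.0):
    """Forward-Euler simulation of a hypothetical pathway A -> B -> C.

    k1 and k2 are assumed first-order rate constants (illustrative only).
    Returns a list of (time, A, B, C) tuples.
    """
    a, b, c = a0, 0.0, 0.0
    steps = int(round(t_end / dt))
    trajectory = [(0.0, a, b, c)]
    for i in range(1, steps + 1):
        r1 = k1 * a  # mass-action rate of A -> B
        r2 = k2 * b  # mass-action rate of B -> C
        a += -r1 * dt
        b += (r1 - r2) * dt
        c += r2 * dt
        trajectory.append((i * dt, a, b, c))
    return trajectory


if __name__ == "__main__":
    t, a, b, c = simulate_pathway(k1=1.0, k2=0.5)[-1]
    print(f"t={t:.1f}  A={a:.4f}  B={b:.4f}  C={c:.4f}")
```

A production tool would typically use an adaptive ODE solver rather than a fixed Euler step; the fixed step is used here only to keep the sketch short.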
“When you go on the net and search for how some companies might be using pathways to represent biological systems, they use different symbols. We’ve developed a minimal language that does that job, and we will continue to improve it in future versions,” adds Paxson.
SimBiology further helps focus on potential drug targets within pathways with its automation of sensitivity analysis. Its parameter estimation functionality also lets users generate estimates for unknown parameters within an existing model.
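The idea behind sensitivity analysis can be illustrated with a short Python sketch. It is not SimBiology's algorithm. As a stand-in model it assumes a Michaelis-Menten rate law with hypothetical parameter values, and it estimates how strongly the reaction rate responds to each parameter using central finite differences. A normalized sensitivity near 1 or -1 marks a parameter the output depends on heavily.

```python
def mm_rate(params, substrate):
    """Michaelis-Menten rate, used here only as an illustrative model."""
    return params["vmax"] * substrate / (params["km"] + substrate)


def normalized_sensitivities(model, params, substrate, rel_step=1e-4):
    """Approximate d(ln output)/d(ln parameter) for each parameter.

    Uses central finite differences; rel_step is an assumed relative
    perturbation size.
    """
    base = model(params, substrate)
    result = {}
    for name, value in params.items():
        h = value * rel_step
        up = dict(params, **{name: value + h})
        down = dict(params, **{name: value - h})
        derivative = (model(up, substrate) - model(down, substrate)) / (2 * h)
        result[name] = derivative * value / base
    return result


# Hypothetical parameter values, chosen only for illustration.
sens = normalized_sensitivities(mm_rate, {"vmax": 10.0, "km": 2.0}, substrate=2.0)
print(sens)
```

For this rate law the exact normalized sensitivities are 1 for vmax and -km/(km + substrate) for km, which gives a quick check on the numerical estimate.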
In addition, SimBiology allows loading of data file formats from different platforms (e.g., Affymetrix; www.affymetrix.com) directly into Matlab and allows the user to annotate models with notes from literature or other sources.
Ariadne Genomics (www.ariadnegenomics.com) licensed its Pathway Studio® software to Murex Pharmaceuticals to develop computer-based models for the identification and validation of cancer targets.
“When we were working on a gene expression project, we realized that the next step to microarray analysis is via the analysis of pathways,” states Ilya Mazo, Ph.D., company president. “People need to understand the biology behind why certain genes are differentially regulated between normal and cancer states. That’s where Pathway Studio comes in.”
This software helps interpret experimental results in terms of pathways, gene regulation networks, and protein-interaction maps and automatically extracts information from scientific literature. It also reconstructs pathways from the user’s microarray and other data.
After the user inputs a set of genes and/or proteins or a microarray experiment to initiate database mining, the software retrieves the most relevant networks that are differentially altered in a disease or provides information about common regulatory mechanisms of the gene set. The networks found are displayed and can be validated by referring back to the original article in which the facts originated. New networks can be further analyzed by comparing them to known pathways.
The MedScan™ technology, which extracts information from PubMed and 43 full-text journals, is a feature unique to this software, according to Dr. Mazo. The extraction is done using the company’s natural language processing (NLP) tool. “We’re not trying to tell people which data is good or bad. We give them a tool so they can find the data that they need,” states Dr. Mazo.
Custom Maps
GeneGo (www.genego.com) uses its software and databases to generate maps of pathways relevant to drug discovery and development. MetaCore™ contains information on human biological pathways, based on experimental literature, manually curated by company scientists.
“Initially we developed these maps and now we’ve created a tool called map editor where people can develop their own maps from scratch or take a network and convert them,” explains Julie Bryant, vp, business development and sales.
The company licensed 225 of these original maps of human cell signaling and metabolic pathways to Invitrogen (www.invitrogen.com), which now uses them as a storefront window. Invitrogen’s iPath™ is free of charge and allows customers to map their products, so they can see all of Invitrogen’s products for a certain pathway rather than going through a catalog. “It’s a unique way of selling products while giving something back with this high-level information,” says Bryant.
Customers can get information on genes and proteins, but if they want to use an interactive tool, they have to license MetaCore, which currently includes 480 maps. The software also allows the user to import various types of data (e.g., gene expression and proteomics) and overlay it on top of the maps.
Metabolomics Solutions
GeneGo’s other software application, MetaDrug™, predicts metabolites and interactions for a particular drug structure involved in toxicity. It also incorporates experimental measurements of metabolites to visualize preclinical and clinical data in the context of a complete biological system.
Bio-Rad Laboratories (www.bio-rad.com) responded to the need for an end-to-end metabolomics platform with the KnowItAll® Informatics System, Metabolomics Edition. The system integrates Infometrix’ (www.infometrix.com) Pirouette® chemometrics technology, a tool for multivariate data analysis. It also includes the company’s new Overlap Density Heatmap technology for comparative spectral visualization and a database of NMR spectra of common metabolites with Internet links to the KEGG database (Kyoto Encyclopedia of Genes and Genomes).
“You can start with the raw NMR data, do the processing in batch mode, bring it into a database, and transfer within KnowItAll to the application we created using Pirouette, called AnalyzeIt™ MVP,” says Gregory Banik, Ph.D., general manager of informatics. This allows the user to do principal component analysis, look at the data in a more concise fashion to determine if any peaks may be diagnostic, and simultaneously search a database of known metabolites.
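For readers unfamiliar with principal component analysis, the following Python sketch shows the core calculation. It is not Pirouette or AnalyzeIt MVP code. It assumes each row is one sample (for example, one binned spectrum) and each column one variable, and it finds only the first principal component by power iteration on the covariance matrix. The function name and the toy data are assumptions made for illustration.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the first principal component.

    rows: list of equal-length numeric sequences, one per sample.
    Power iteration starts from an all-ones vector, which would fail if
    that vector happened to be orthogonal to the leading direction.
    """
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two samples")
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    centered = [[r[j] - means[j] for j in range(dims)] for r in rows]
    # Sample covariance matrix of the mean-centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1)
            for j in range(dims)] for i in range(dims)]
    vec = [1.0] * dims
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break  # data have no variance along the current vector
        vec = [v / norm for v in nxt]
    # Project each centered sample onto the component to get its score.
    scores = [sum(c[j] * vec[j] for j in range(dims)) for c in centered]
    return vec, scores


# Toy data lying on the line y = 2x, so the component points along (1, 2).
direction, scores = first_principal_component([(1, 2), (2, 4), (3, 6), (4, 8)])
print(direction, scores)
```

Variables with large entries in the direction vector contribute most to the separation between samples, which is one way a peak might be flagged as potentially diagnostic.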
The IntelliBucket™ is a new feature that allows the user to do binning and bucketing in one environment. Binning means chopping up NMR data into 225 equal segments. “Some work has suggested that some variable bin width is slightly better,” adds Dr. Banik. IntelliBucket allows the bin widths to be determined based on the Overlap Density Heatmap and looks at what’s similar or dissimilar in the data set (areas of highest spectral similarity are indicated in red and areas of lowest similarity in violet).
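Equal-width binning, the simpler of the two approaches, can be sketched in a few lines of Python. This is an illustration only, not IntelliBucket's method. It assumes the spectrum is given as parallel lists of chemical-shift positions and intensities, and it sums the intensities that fall into each equal-width segment. The function name and the toy values are hypothetical.

```python
def bin_spectrum(ppm, intensity, n_bins):
    """Sum intensities into n_bins equal-width segments of the ppm range."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(ppm), max(ppm)
    if hi == lo:
        raise ValueError("ppm values must span a nonzero range")
    width = (hi - lo) / n_bins
    buckets = [0.0] * n_bins
    for x, y in zip(ppm, intensity):
        # The point at the upper edge is assigned to the last bin.
        idx = min(int((x - lo) / width), n_bins - 1)
        buckets[idx] += y
    return buckets


# Toy spectrum: nine points binned into four segments.
ppm = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
intensity = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(bin_spectrum(ppm, intensity, 4))
```

A variable-width scheme would replace the fixed `width` with a list of segment boundaries chosen from the data.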
Using Mass Spec Data
Agilent Technologies’ (www.agilent.com) latest addition to its GeneSpring Analysis Platform is the GeneSpring MS software. It helps determine biomarker function, identify biological pathways involved, and infer specific molecular interactions from mass spectrometry data.
“If you run a dozen Agilent whole-genome arrays, you’ve collected what amounts to almost a half-million data points,” notes Jordan Stockton, Ph.D., marketing manager, bioinformatics at Agilent Technologies. “Somewhere in that ocean of data points lies a pattern that is telling you something biologically relevant.”
Using the GeneSpring MS software, data can be imported from GC/MS and LC/MS sources and analyzed using several statistical tools. It also enables subsequent MS/MS-based analyses, such as protein and metabolite identification, peptide/protein confirmation, and structural characterization.
Another software application similar to GeneSpring is the company’s CGH Analytics Software. “This takes a similar approach to solve problems where changes in copy numbers are potential markers,” says Dr. Stockton. The CGH software includes filtering mechanisms, a graphical report generator for summary views of aberrations, and access to websites to search for more information. It also saves a list of the genes present in a specific aberration for further analysis.
With CGH Analytics Software, which performs joint analysis of CGH and gene-expression data, the user can explore the interaction between gene copy number and expression levels measured by Agilent microarrays.
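One simple way to look at the relationship between copy number and expression is a per-gene correlation. The Python sketch below is not Agilent's method. It assumes hypothetical log2 ratios for the same genes from a CGH experiment and an expression experiment, and computes the Pearson correlation between them.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical log2 ratios for five genes (illustrative values only).
copy_number = [-1.0, -0.5, 0.0, 0.6, 1.1]
expression = [-0.8, -0.6, 0.1, 0.4, 1.3]
print(round(pearson(copy_number, expression), 3))
```

A value near 1 would suggest that expression tracks copy number for these genes; a value near 0 would suggest the two are largely independent.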
“We’re in the process of coupling all our tools from copy number analysis to the GeneSpring suite,” adds Dr. Stockton.
Working with Various Data Sources
Along with large amounts of data in life sciences research comes the problem of a variety of formats. OmniViz (www.omniviz.com) has therefore developed software that allows researchers to assimilate large volumes of various data into the same framework. “Data can be in various forms: numeric (gene-expression data) or categorical data or a mixture of numeric and categorical, such as clinical trial data,” explains Jeffrey Saffer, Ph.D., president and CEO. “We are unique in this field, being the only ones who can work these different sources of information simultaneously.”
OmniViz Pathway Enterprise™ provides access to biological pathway information and enables the storage of any pathway content. This single interface can be used for searching and adding annotations and other reference information to be cross-referenced in a biological manner. It is based on a Biologic Object Model that allows standardized querying and mapping of clones to genes and then proteins.
The software also enables the user to draw and share new pathways, provides standardized gene and protein names to avoid confusion, and integrates with either the OmniViz analysis package or other imported files, allowing the coloring and play-through animation of values associated with pathway members. In addition, it prints pathways for publication and expands from individual proteins, compounds, or genes based on known connections within the database.
“One of the biggest issues when attempting to put many types of different information together is dealing with different formats,” says Dr. Saffer. “We decided that we had to remain data agnostic so we built several subsystems of our software to deal with essentially any data format. We haven’t seen other companies make the attempt to integrate all the different sources of data, which is the discovery process we need to go through.”