October 15, 2005 (Vol. 25, No. 18)

Kathy Liszewski

Enhancing the Ability to Collect, Share, and Analyze Crucial Information

Information overload challenges a growing number of scientists trying to glean relevant findings from the data they collect. Results generated by combinatorial chemistry and high-throughput screening are escalating the number of candidate compounds scientists must evaluate.

To meet the need for improved visualization of biological data, companies are offering a variety of updated approaches that could enhance the ability to rapidly collect, share, and analyze crucial information.

Rapid Visualization and Sharing

Calculating and making sense of the huge volumes of data generated by library screening for drug candidate discovery can be daunting. “Previously, if one parameter was changed, you would have to recalculate everything to see all the relevant information in one big picture,” notes Jean Patterson, Ph.D., chemistry investigator at ArQule (www.arqule.com). The company has developed an improved data visualization strategy whose results can be rapidly shared among its scientists.

According to Dr. Patterson, “We integrated Spotfire’s (www.spotfire.com) DecisionSite software into our proprietary MapMaker software. MapMaker is a web-based tool for diversity-oriented design of parallel synthesis libraries. Spotfire’s DecisionSite accelerates analysis and reporting through its plots and query devices that allow scientists to interactively ask questions of their data.”

“Integrating both into one program allowed us to merge a number of complex tools, hide that complexity, and, therefore, simplify the process. This allows synthetic chemists to directly do library design and processing, rather than having to rely on computational or informatics experts,” Dr. Patterson continues.

“MapMaker automatically loads the relevant reagent and product data into the Spotfire software to rapidly visualize properties of the designed library. The result is a standardized file containing relevant information about structures, properties, and histograms.

“Chemists can interactively assess the impact of modifying reagents on the library design and ensure that they produce high-quality, drug-like products. We can share this file conveniently and easily with collaborators or among our staff. So overall, we can greatly assist medicinal chemists in the evaluation and selection of library designs and data exchange.”
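MapMaker and DecisionSite are proprietary, so the profiling step Dr. Patterson describes can only be sketched here. A minimal illustration with the open-source RDKit toolkit might look like the following; the compound names and SMILES are placeholders, not ArQule structures:

```python
# Sketch of the library-profiling step described above, using the
# open-source RDKit toolkit rather than ArQule's MapMaker/DecisionSite.
# Compound names and SMILES are illustrative placeholders.
from rdkit import Chem
from rdkit.Chem import Descriptors

library = {
    "cmpd-001": "CC(=O)Nc1ccc(O)cc1",          # placeholder structure
    "cmpd-002": "CC(C)Cc1ccc(cc1)C(C)C(=O)O",  # placeholder structure
}

for name, smiles in library.items():
    mol = Chem.MolFromSmiles(smiles)
    props = {
        "MW":   Descriptors.MolWt(mol),
        "logP": Descriptors.MolLogP(mol),
        "HBD":  Descriptors.NumHDonors(mol),
        "HBA":  Descriptors.NumHAcceptors(mol),
    }
    # Lipinski's rule of five as a simple drug-likeness screen
    drug_like = (props["MW"] <= 500 and props["logP"] <= 5
                 and props["HBD"] <= 5 and props["HBA"] <= 10)
    print(name, {k: round(v, 2) for k, v in props.items()},
          "drug-like" if drug_like else "flagged")
```

In a MapMaker-style workflow, the same per-compound properties would feed the histograms and scatter plots chemists use to compare candidate library designs.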

William Boni, vp investor relations and corporate communications at ArQule, points out that applications for its software include both internal and external programs. “The mission of ArQule is to develop cancer therapeutics. We are leveraging many of the same chemistry capabilities that are used to meet the needs of our clients to significantly enhance our internal oncology drug discovery and development programs.”

Integrating Data for Efficiency

Bayer Healthcare (www.bayer.com) developed a strategy for integrating data gathering and analysis to make lead identification and optimization more efficient, according to Michael Haerter, Ph.D., principal research scientist.

“A tremendous amount of information is continuously generated by research and discovery efforts within pharmaceutical organizations. There are many disciplines involved in pharma R&D, and, in the past, each often had its own stand-alone database.

“This prevented efficient exchange of information among scientists since the databases were not integrated for everyone’s use. Typically, a scientist needed to navigate through four to five different company databases, each of which had a different look and feel. Doing this risked losing or missing important data.”

Dr. Haerter says that Bayer’s breakthrough came by merging all corporate data from its former stand-alone databases into one data warehouse. “We decided to develop an integrated approach that would support our company’s wide-ranging information technology needs. We collaborated with LION Bioscience (www.lionbioscience.com) and Tripos (www.tripos.com) to produce an innovative approach called the PIx system.

“This client-server application allows our scientists to better integrate data retrieval, visualization, analysis, and workflow support tools. Also, this same application provides a ‘chemically intelligent’ spreadsheet as the focal point for data visualization and analysis.”

Another improvement is the availability of computing engines that can use retrieved data to find critical correlations among pharmacologic, pharmacokinetic, physicochemical, and other properties and chemical structures.

“This is a powerful tool for the de novo design of compounds,” Dr. Haerter believes. “An added advantage is our ability to link workflow and decision-support tools to facilitate information sharing and storage within our entire organization.”
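PIx itself is proprietary, but the property-correlation mining Dr. Haerter describes reduces, at its simplest, to computing correlations across assay columns. A minimal sketch with pandas, using hypothetical column names and values:

```python
# Minimal illustration of property-correlation mining with pandas.
# Column names and values are hypothetical; the PIx system is proprietary.
import pandas as pd

compounds = pd.DataFrame({
    "logP":          [1.2, 3.4, 2.8, 0.9, 4.1],
    "solubility_uM": [210,  15,  40, 480,   5],
    "ic50_nM":       [850, 120, 310, 990,  60],
    "half_life_h":   [1.5, 6.2, 3.8, 0.9, 7.4],
})

# Pearson correlation across pharmacologic/physicochemical columns;
# strong off-diagonal values flag properties that track one another.
print(compounds.corr().round(2))
```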

Currently, the company is utilizing the PIx system at its research and development sites in Germany and the U.S. LION Bioscience owns commercial rights to the product.

Cell-by-Cell Analysis

High-content screening (HCS) generates volumes of multiparameter cellular data, enabling a broad spectrum of cell-based assays for drug discovery. “The problem is that the nature and the size of high-content data demand new paradigms for data management and data mining,” comments Michael Sjaastad, Ph.D., director at Molecular Devices (www.moleculardevices.com).

“Nearly all HCS instruments collect adequate images and can generate high-quality data with the provided algorithms. What’s needed is an HCS imaging platform that can read from and write to a database seamlessly. More importantly, one needs to be able to integrate data irrespective of the instruments used to study a target. A nice surprise has been the applicability of microarray data informatics strategies to analyzing high-content data.”

“We’ve retooled microarray informatics tools to create AcuityXpress to look at cell-based data,” Dr. Sjaastad notes. “We’ve also packaged it to easily evaluate multiparameter HCS data sets. The strategy is similar to analyzing microarrays in the sense of clustering and profiling algorithms and seeing important trends. We’ve used it to pull out data sets from multiple instruments, to group data, and to reference compounds.”
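AcuityXpress is commercial software, but the microarray-style clustering Dr. Sjaastad describes can be illustrated with SciPy. The per-well feature matrix below is simulated, and the feature names are hypothetical:

```python
# Sketch of microarray-style hierarchical clustering applied to
# per-well HCS features, in the spirit of the approach described above.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# rows = wells/compounds; columns = cellular measurements,
# e.g. intensity, granularity, area, cell count (all simulated)
features = rng.normal(size=(96, 4))

# z-score each feature so clustering is not dominated by scale
z = (features - features.mean(axis=0)) / features.std(axis=0)

tree = linkage(z, method="average", metric="euclidean")
labels = fcluster(tree, t=5, criterion="maxclust")  # cut into 5 profiles
print(np.bincount(labels)[1:])  # wells per phenotypic cluster
```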

The company recently verified its approach by assessing the utility of AcuityXpress to analyze screening data for agonists or antagonists of the beta-2-adrenergic receptor with the LOPAC library. “We used our Transfluor high-content assay for G-protein coupled receptor (GPCR) activation. We collected and analyzed images, and showed that the hits we identified were consistent with the expected biology of the receptor. In addition, applying data clustering approaches for visualization revealed new findings in old datasets. This shows the value of thinking differently as we forge ahead in HCS,” Dr. Sjaastad says.
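Hit identification in a plate-based screen like this is often done with a robust Z-score over per-well readouts. The following is a hedged sketch on simulated data, not Molecular Devices’ actual pipeline:

```python
# Robust Z-score hit calling on simulated per-well readouts; the
# spiked-in well and the |z| > 3 cutoff are illustrative conventions.
import numpy as np

rng = np.random.default_rng(1)
signal = rng.normal(100, 10, size=384)   # per-well assay readout
signal[7] = 180                          # spiked-in "active" well

# median/MAD scaling so outlier hits don't distort the score itself
med = np.median(signal)
mad = np.median(np.abs(signal - med))
z = 0.6745 * (signal - med) / mad        # 0.6745 = normal consistency factor

hits = np.flatnonzero(np.abs(z) > 3)     # |z| > 3 as a common hit cutoff
print("hit wells:", hits)
```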

One-Click Analysis

Mark Collins, Ph.D., product manager at Cellomics (www.cellomics.com), suggests that client input can be a valuable source for determining how to improve products. “Feedback from our clients doing high-content screening indicates that they need improvements in ease of use, in the speed and robustness of data management, and in multi-user interfaces.”

Seeking to do just that, the company recently announced a new software update for its HCS instruments called HCi 2005. “A large part of this release addresses data visualization and was driven by comments from our large customer base,” Dr. Collins notes.

“Scientists want to move more easily from visualization to analysis and back again. With our new update, you can essentially click on cells and see all the measurements made on one cell or on a large population of cells. So you can more easily see, for example, whether a drug had a particular effect on a cell population or whether a protein was expressed.”

Dr. Collins indicates that more easily reconstructing such cellular population measures helps in assessing the targeting effects of compounds or predicting their toxicity. “Being able to perform live cell kinetic measurements is especially important. It provides real-time visualization that allows the investigator to assess multiple events in cells, such as phenotype and morphology.”
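Statistically, the population-level question Dr. Collins raises (did a drug shift a per-cell measurement?) reduces to comparing distributions between treated and control cells. A minimal sketch with simulated data:

```python
# Minimal sketch of the population comparison described above:
# did a compound shift a per-cell measurement? Data are simulated.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(2)
control = rng.normal(50, 8, size=2000)   # per-cell intensity, untreated
treated = rng.normal(56, 8, size=2000)   # same measurement after compound

# nonparametric test; per-cell intensities are rarely normal in practice
stat, p = mannwhitneyu(control, treated, alternative="two-sided")
print(f"U={stat:.0f}, p={p:.2e}")
```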

Capturing Data

Many scientists are seeking ways to improve data access and security prior to performing visualization studies. CambridgeSoft (www.cambridgesoft.com) offers its BioAssay Enterprise 9.0, which is designed for scientists performing complex lead optimization studies, who want to capture data from multiple sources, upload it to a central secure location, and perform multiple types of analyses.

According to Louis Culot, vp informatics, “BioAssay is designed to fill the niche Excel cannot. There are significant limitations in Excel such as an inability to consolidate data and limited calculating capabilities. Because BioAssay is scalable, scientists can load large amounts of assay data quickly and easily to extract meaning from millions of data points.

“Once the data has been captured, scientists can perform a number of operations such as creating custom plots and curve-fitting processes to select the best data. An added advantage is that anytime after collection, scientists can go back to their generated sets, and further refine them with different parameters.”
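The curve fitting Culot mentions is typically a four-parameter logistic (Hill) fit to dose-response data. A sketch with SciPy on simulated values follows; BioAssay’s own fitting engine is proprietary:

```python
# Four-parameter logistic (Hill) fit to a simulated dose-response
# series, sketching the curve-fitting step described above.
import numpy as np
from scipy.optimize import curve_fit

def four_pl(x, bottom, top, ic50, hill):
    return bottom + (top - bottom) / (1.0 + (x / ic50) ** hill)

doses = np.array([1e-9, 1e-8, 1e-7, 1e-6, 1e-5, 1e-4])  # molar
resp  = np.array([0.98, 0.95, 0.80, 0.45, 0.12, 0.05])  # fraction activity

params, _ = curve_fit(four_pl, doses, resp,
                      p0=[0.0, 1.0, 1e-6, 1.0], maxfev=10000)
print(f"fitted IC50 = {params[2]:.2e} M")
```

Refitting the same stored data set with different parameters, as Culot describes, amounts to rerunning such a fit with new starting values or constraints.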

Culot says that users also can import data into BioAssay from a wide variety of instruments, Excel sheets, and Oracle. “BioAssay Enterprise has great flexibility as well as a strong focus on visualizing biological data. It can be used for anything from in vivo to high throughput work.”

After scientists are satisfied with data validity, they can incorporate it into BioSAR Enterprise for further validation and publication. BioSAR Enterprise is a web-based interface designed to access data quickly and easily in order to create customized structure-activity reports.

Predicting Toxicity

Chemically induced toxicity continues to be one of the major concerns of pharmaceutical, agricultural, and other chemical industries. New analytical and visualization techniques that uncover toxicity information are aiming to minimize the late-stage attrition of drug candidates.

Chihae Yang, vp toxicology and predictive modeling at Leadscope (www.leadscope.com), says, “The inability to accurately predict safety liabilities for compounds costs the industry billions of dollars each year. So scientists need to screen for potential toxicity issues from the very beginning.”

According to Dr. Yang, this is no simple issue. “Large amounts of information are available in public databases, but these are not optimized for building structure-toxicity relationships. Often scientists are quite frustrated when attempting to predict a compound’s safety from these highly fragmented data. We are working with the FDA to aggregate publicly available databases on predictive toxicology.”

Dr. Yang notes that some of these data are quite scattered and distributed as paper files, microfiche, and nonstandardized toxicology memoranda. “We are organizing this nonproprietary data into a logical, searchable format that will reduce paperwork, expedite predictive toxicology studies, and provide valuable information that is currently only available to the FDA.

“Additionally, we will be incorporating chemical structures with genetic toxicity information. This will help identify biologically active moieties in order to develop QSAR modeling and testing guidelines. Leadscope and the FDA are excited to get this data out to the public. We will distribute the database and share revenues with the FDA. The databases will cost users $250/year.”

Leadscope also is developing the Prediction Model Builder, which will tap into the company’s chemoinformatics software and break compounds down into substructure fragments. “This software offers open prediction capability, visualizing the specific areas of molecules that drive a prediction. This helps scientists determine domains of applicability and see the areas of chemical space where more data is needed to refine their models.”
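The Prediction Model Builder is proprietary, but the fragment-based idea can be illustrated by screening compounds against published structural alerts with RDKit. The patterns and compounds below are standard examples from the mutagenicity literature, not Leadscope features:

```python
# Hedged sketch of fragment-based toxicity flagging: check compounds
# against a few well-known structural alerts (SMARTS patterns).
from rdkit import Chem

# toxicophore alerts; these are textbook examples, not proprietary features
alerts = {
    "nitroaromatic":  Chem.MolFromSmarts("c[N+](=O)[O-]"),
    "aromatic amine": Chem.MolFromSmarts("c[NX3;H2]"),
    "epoxide":        Chem.MolFromSmarts("C1OC1"),
}

candidates = {
    "cmpd-A": "O=[N+]([O-])c1ccccc1",  # nitrobenzene: should flag
    "cmpd-B": "CCO",                   # ethanol: should pass
}

for name, smiles in candidates.items():
    mol = Chem.MolFromSmiles(smiles)
    found = [a for a, patt in alerts.items() if mol.HasSubstructMatch(patt)]
    print(name, "->", found or "no alerts")
```

Highlighting the matched atoms in such substructures is one simple way to visualize which areas of a molecule drive a prediction.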
