January 15, 2005 (Vol. 25, No. 2)

Integrating Data into Accurate Models

Rapid acquisition and analysis of high volumes of data in biological samples had its advent in the early days of the Human Genome Sequencing Project. For higher layers of genetic information (RNA and protein), microarray technology has facilitated the interrogation of large numbers of samples for biologically relevant patterns in a variety of physiological, drug-induced, or clinically relevant cellular states.

The challenge which now presents itself is how to integrate these large volumes of information into an accurate model of cellular behavior and processes. Information regarding the effect of a drug on the extent and duration of apoptosis in cancer cells, for example, would be invaluable information in a screen for cancer drugs.

Similarly, information on cytoskeletal changes leading to motility and invasiveness would greatly streamline the development of an efficient anti-angiogenic pharmaceutical strategy.

Meeting these and other demands in both the academic and pharmaceutical research communities is the emerging discipline of cytomics, and the attendant technology known as high content screening.

“High content screening is loosely defined as a simultaneous, or very close in timeframe, multiparametric analysis of various aspects of a cell,” says Todd Neville, technical solutions senior scientist at IBM Life Sciences in Phoenix.

The importance of this field for the future of drug screening can be appreciated by the fact that the Cambridge Healthtech Institute (CHI) is devoting an entire conference to this area, the “High Content Analysis Conference,” to be held in San Francisco, January 29-30.

High content screening (HCS) conceptualizes the cell as the ultimate functional end point, or unit, for any biological stimulus. “The term was probably originally derived to differentiate assays that used live cells or resulted in the measurement of multiple variables from the more traditional single data point readouts of HTS assays that were often based in biochemistry or ligand binding,” says Michael Sjaastad, director of business development at Molecular Devices (Sunnyvale, CA).

HCS couples cell-based assays with robotics, automated image capture, advanced image analysis, and informatics to provide richly detailed information on cell morphology and other responses in large quantities. Time also becomes a factor in an HCS platform. “Conducting experiments over time increases the data content and context,” Sjaastad observes.

Many protocols for generating data are well developed in their respective disciplines, from quantitative PCR, to flow cytometry, to antibody staining. Moreover, the methods for acquisition of this data, such as the various flavors of microscopy, have already undergone extensive development.

Coming to the fore now are strategies for storage and software-based analysis of the data. Companies are approaching this problem from both ends of the spectrum: some from the pharmaceutical standpoint, others with well-established IT credentials who see this area as a major opportunity.


Perhaps the most important image acquisition methods in HCS relate to cellular imaging, including drug effect assays, cytotoxicity, apoptosis, cell proliferation, and nucleocytoplasmic transport.

“With imaging one can simultaneously capture multiple measurements from individual cells including molecular colocalization, metabolic state, motility, cell cycle, texture, and cell morphology,” says Judy Masucci, Ph.D., director of marketing and sales support at Cellomics (Pittsburgh, PA).

According to Dr. Masucci, no other single sensor modality can provide a comparable depth of information. Next-generation image acquisition instruments feature multispectral imaging, permanent, accurate alignment, and integration with downstream image processing and analysis software packages.

“HCS brings the mature field of cell biology to our assays, and imaging technology into the drug discovery environment,” says Sjaastad. “Image-based screening is maturing quickly enough that primary screens can now run with imagers.”

Advances in image acquisition have made certain types of screens more routine, including phenotypic and morphological screens. Phenotypic assays measure the ability of a virus, such as HIV, to replicate in the presence of a specific drug, providing a direct measurement of drug susceptibility.

A talk at the CHI meeting by Berta Strulovici, executive director, automated biotechnology at Merck (Whitehouse Station, NJ), will focus on two case studies of phenotypic assays that highlight the utility of these assays in a high content screening context.

According to Neville, another speaker at the forthcoming conference, two major components make up the IT infrastructure in high content screening: computation and storage.

“The most common technical logistical challenge that IT architects face is keeping track of the millions of files that high content instruments create, keeping them safe, backed up, and stored on the most cost efficient media,” he says.

The numbers involved are daunting. “A typical screen can generate over 10,000 high-resolution images a day: hundreds of gigabytes of data,” Sjaastad points out.
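The arithmetic behind that figure is easy to check. A minimal back-of-envelope sketch, assuming hypothetical image dimensions and channel counts (not figures from any specific instrument):

```python
# Back-of-envelope estimate of daily HCS image data volume.
# Image dimensions, bit depth, and channel count are assumptions
# for illustration, not specifications of any real instrument.
images_per_day = 10_000
width, height = 2048, 2048       # pixels per image (assumed)
bytes_per_pixel = 2              # 16-bit camera (assumed)
channels = 3                     # fluorescence channels (assumed)

bytes_per_image = width * height * bytes_per_pixel * channels
total_gb = images_per_day * bytes_per_image / 1e9
print(f"{total_gb:.0f} GB per day")  # ~252 GB per day
```

At roughly 25 MB per multi-channel image, 10,000 images lands squarely in the “hundreds of gigabytes per day” range Sjaastad cites.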

On the computational side, many techniques require complex algorithms to be run over these large amounts of data emerging from the instrument, and selection of a computational platform requires consideration of variables such as CPU selection and memory.

Neville points out that federal regulations often stipulate the length of time that records must be kept in storage. Another issue is access to the data: how much should be kept in long-term vs. short-term storage. Some data types, such as mass spectrometry, typically move into long-term storage on tape in 2-3 years.

Conversely, says Neville, results files created from the analysis of a raw file or an image may be kept on high-speed disk for a longer period of time, since this is what the investigator typically works with.

A central consideration in developing any HCS strategy is therefore defining an information life cycle: defining what data goes where and for how long. Once this has been established, scalable strategies for archiving and backup can be defined.
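Such a life-cycle policy can be sketched as a simple lookup from data type to a disk-retention window; the data types and windows below are illustrative assumptions, not any vendor's actual policy:

```python
from datetime import date, timedelta

# A minimal sketch of an information life-cycle policy: each data type
# maps to a retention window on fast disk before migration to tape.
# All retention periods here are assumed for illustration.
DISK_RETENTION = {
    "raw_image": timedelta(days=90),          # bulky raw files age out quickly
    "mass_spec": timedelta(days=730),         # ~2 years before tape (assumed)
    "analysis_result": timedelta(days=1825),  # derived results stay on disk longer
}

def storage_tier(data_type: str, created: date, today: date) -> str:
    """Return 'disk' or 'tape' for a record, per the policy table."""
    limit = DISK_RETENTION.get(data_type, timedelta(days=365))
    return "disk" if today - created <= limit else "tape"

print(storage_tier("raw_image", date(2004, 9, 1), date(2005, 1, 15)))  # tape
```

Once the policy is a table rather than ad-hoc decisions, archiving and backup jobs can simply iterate over records and move whatever the function marks for tape.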

In addition to the physical problems associated with data storage, there is the question of the appropriate software for managing and querying the data. Perhaps the best known model in this context is the Laboratory Information Management System, or LIMS. These custom software frameworks allow metadata to be associated with specific datasets using consistent annotation.

“This allows you to group your data logically and perform queries on it, i.e., ‘show me all the files in our systems associated with trials of a specific drug,’” says Neville. He points out a third issue, data exchange, which is often a problem for larger companies requiring solutions for moving data between different data storage centers.
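The metadata-driven query Neville describes can be sketched in a few lines; the field names and records below are hypothetical, not a real LIMS schema:

```python
# A minimal sketch of a LIMS-style query: files carry consistent metadata
# annotations, which makes grouping and querying trivial.
# File names, drug IDs, and assay labels are invented examples.
records = [
    {"file": "plate01_w03.tif", "drug": "compound-A", "assay": "apoptosis"},
    {"file": "plate02_w11.tif", "drug": "compound-B", "assay": "motility"},
    {"file": "plate05_w07.tif", "drug": "compound-A", "assay": "cell_cycle"},
]

def files_for_drug(drug: str) -> list[str]:
    """'Show me all the files associated with trials of a specific drug.'"""
    return [r["file"] for r in records if r["drug"] == drug]

print(files_for_drug("compound-A"))  # ['plate01_w03.tif', 'plate05_w07.tif']
```

The same annotation discipline is what makes cross-site data exchange tractable: if every site tags files with the same fields, a shared portal can query them all through one interface.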

“This usually entails the integration of various data formats and a web portal on the front end, allowing researchers in different organizational units to share data through a common interface,” he points out.

Analysis and Interpretation

Perhaps the most critical component of a HCS platform lies in cellular analysis. “The collective imaging expertise of Universal Imaging and Axon Instruments is being leveraged at Molecular Devices to provide a total solution for HCS imaging. Starting in 2005, new software will control all of our available imaging systems and will integrate with our database capabilities,” says Sjaastad.

The pace of software development in this area has been rapid, to the extent that, as Sjaastad points out, “You no longer need a dedicated cellular image analysis expert or IT group to manage screening with imaging systems.”

Cellomics developed a software platform which integrates advanced image analysis with informatic solutions. “Our high content informatics (HCi) products provide tools for visualization and analysis as well as data management, which provides instant access to data for decision making,” says Dr. Masucci.

HCi’s architecture is based around three layers: a presentation tier, a middleware tier, and a data tier. The data tier contains Cellomics’ HCS database, Cellomics Store, which can be deployed in a variety of commercial relational database management systems.

A middleware layer acts as a communication interface between the database and the presentation tier, providing flexibility for the incorporation of additional third-party modules if required. The presentation tier contains tools for controlling image acquisition and analysis.
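The three-tier separation can be sketched as follows; the class and method names are illustrative assumptions, not Cellomics APIs:

```python
# A minimal sketch of the three-tier layout described above: the
# presentation tier talks only to a middleware interface, which hides
# the data tier, so a third-party backend can be swapped in behind the
# same interface. All names here are hypothetical.
class DataTier:
    """Stands in for the HCS database (e.g., a relational store)."""
    def __init__(self):
        self._wells = {}
    def save(self, well_id, measurements):
        self._wells[well_id] = measurements
    def load(self, well_id):
        return self._wells[well_id]

class Middleware:
    """Communication layer: the presentation tier never touches DataTier."""
    def __init__(self, backend):
        self._backend = backend
    def record_result(self, well_id, measurements):
        self._backend.save(well_id, measurements)
    def fetch_result(self, well_id):
        return self._backend.load(well_id)

# Presentation tier: acquisition/analysis tools call only the middleware.
mw = Middleware(DataTier())
mw.record_result("A01", {"nuclear_intensity": 0.82})
print(mw.fetch_result("A01"))  # {'nuclear_intensity': 0.82}
```

Because the presentation tier depends only on the middleware's interface, replacing the backend with a different database, or a third-party module, requires no change to the acquisition and analysis tools above it.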

An image analysis module that has been recently added to the Cellomics platform is Spot Detector BioApplication. “BioApplications feature quantitative image analysis algorithms that are biologically validated to provide answers to a broad range of biological questions, thereby reducing assay development time,” says Masucci.

BioApplications can analyze mitotic index, cell proliferation, and viability, as well as receptor internalization. The suite comprises a number of different modules, such as Morphology Explorer, which quantifies the size, shape, and distribution of cells, and Kinetic Molecular Translocation, an application which performs kinetic analysis of nuclear-cytoplasmic transport.

Once data from a high content screen is analyzed, the challenge is to interpret the patterns that the analysis software generates, and to place them in context with other analyses. This is facilitated if other data have been acquired in a consistent manner on the same platform, but scientists wishing to compare data from the legacy literature and public databases with the trends they observe in a high content screen face considerable challenges.

Data Mines

Companies such as Cellomics and Ingenuity (Mountain View, CA) have devoted considerable resources to developing data mines that annotate and integrate biological observations from a wide variety of backgrounds.

Cellomics’ CellSpace Knowledge Miner is built around a graphic user interface that enables the user to search over 57,000 biological terms, 200,000 synonyms, and 2.2 million references, visually testing and ranking relationships and linking to external databases.
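The synonym-aware lookup at the heart of such a tool can be sketched as two small tables; the terms and reference IDs below are invented examples, not CellSpace content:

```python
# A minimal sketch of synonym-aware term lookup of the kind a knowledge
# miner needs: every synonym resolves to a canonical term, which then
# keys into the reference index. All entries are hypothetical.
synonyms = {
    "programmed cell death": "apoptosis",
    "pcd": "apoptosis",
}
references = {
    "apoptosis": ["ref-001", "ref-002"],
}

def lookup(query: str) -> list[str]:
    """Resolve a query through the synonym table, then fetch references."""
    term = synonyms.get(query.lower(), query.lower())
    return references.get(term, [])

print(lookup("Programmed cell death"))  # ['ref-001', 'ref-002']
```

Scaling the same idea to 200,000 synonyms and 2.2 million references is a storage and indexing problem rather than a conceptual one, which is why these tools pair the lookup with a relational backend.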

By showing a user all the molecules associated with apoptosis in the database, for example, CellSpace could complement a high content screen that identifies apoptosis as one of the end point properties of a group of candidate drug leads.

Nick Thomas, Ph.D., principal scientist for GE Healthcare BioSciences (www.gehealthcare.com), points out that the primary advantage of high throughput subcellular imaging is an increased analytical resolution that translates into an increased biological resolution.

“It is the only technique that provides both intensity and spatial information at the sub-cellular level,” he says. “The ability to follow the location, timing, and interdependence of biological events within cells in a culture is a unique feature of imaging.”

Combining imaging with cellular sensors based on fluorescent proteins and dyes provides researchers with the ability to screen drugs at a much higher biological resolution, allowing them to ask and answer more complex biological questions, he continues.

The development of siRNA has added a versatile component to the cellular imaging and analysis toolbox for drug development. Use of siRNAs in cellular imaging assays during target identification and validation is a powerful approach to investigating gene and protein function, says Dr. Thomas.

“In drug screening, particularly in secondary screening and lead profiling, siRNAs directed against signaling pathway proteins allow the modes of action of drug candidates to be analyzed with increased specificity,” he explains.

Interplay between the cell cycle and other biological processes means that virtually any area of drug discovery must take into account the possibility that drug target expression or activity is cell cycle dependent.

“To provide more powerful tools for cell cycle analysis we have developed a range of dynamic cell cycle sensors based on fluorescent protein expression and localization controlled by well characterized cell cycle components,” says Dr. Thomas.

“Unlike standard cell cycle analysis methods, such as flow cytometry and immunofluorescence, these sensors allow non-destructive real time analysis of the cell cycle in live cells. They have the potential to span the whole drug discovery process from target identification through screening to preclinical imaging.”

Dynamic cell cycle sensors provide a powerful tool across a broad spectrum of applications to monitor drug activities or side-effects against the cell cycle, and to measure the dependence of biological processes on the cell cycle, notes Dr. Thomas.

“siRNAs or other forms of RNA interference may have significant potential in disease therapy, both directly as alternatives to classical inhibitory drugs, particularly where problems of target specificity or resistance occur, and perhaps in the longer term, indirectly via RNAi-mediated stem cell differentiation for therapeutic implants,” says Dr. Thomas.
