Reasoning Across Datasets
The BioAssay Ontology (BAO), developed by the Center for Computational Science at the University of Miami Miller School of Medicine, is designed to provide an ontology-driven, semantic description of data generated from high-throughput biological screens that can be mined in an integrated, inferential fashion.
Combined with software tools that allow users to browse, query, and explore diverse datasets, the BAO provides a standardized approach that facilitates data retrieval and analysis and allows data from multiple high-content and/or high-throughput screens to be integrated.
The ability to search large amounts of diverse data and establish relationships on a conceptual level will allow computers to reason, draw conclusions, and seek answers to biological questions. The BAO project also encompasses a data-curation component in which the results of many annotated assays drawn from PubChem are integrated with the ontology.
The University of Miami group has released a beta version of its BAO software, which will be more broadly available by early 2011. A main goal of the BAO project was to create a technology that could apply inference/reasoning across large datasets, according to Stephen Schürer, Ph.D., assistant professor at the university and a member of the BAO project team.
This would be analogous to typing even a simple question-based query into a computer search engine, explains Dr. Schürer. While search engines can comb through datasets for specific words or phrases, when presented with a question they cannot readily mine a database for relevant information and process it to develop a response.
In an ontology-based system, a concept is defined not just by a word or phrase but in a form, using a standardized vocabulary, that a computational system can understand. A particular word, for example, would not only have a specific meaning associated with it but would also have relationships that link it to other words and concepts.
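As a rough illustration of what linking terms to broader concepts makes possible, the sketch below stores a few "is a" relationships and uses them to answer a query by concept rather than by keyword. It is a toy example only: the term names, the assay identifiers, and the helper functions are all hypothetical and are not taken from the BAO or its software.

```python
# Toy ontology: every term and assay identifier here is hypothetical.
# Each entry maps a term to the broader concept it "is a" kind of.
IS_A = {
    "luciferase reporter assay": "reporter gene assay",
    "reporter gene assay": "cell-based assay",
    "cell-based assay": "bioassay",
    "kinase binding assay": "biochemical assay",
    "biochemical assay": "bioassay",
}

# Hypothetical annotations: which term describes each assay.
ANNOTATIONS = {
    "assay_001": "luciferase reporter assay",
    "assay_002": "kinase binding assay",
    "assay_003": "reporter gene assay",
}


def ancestors(term):
    """Return every broader concept reachable by following is_a links."""
    found = []
    while term in IS_A:
        term = IS_A[term]
        found.append(term)
    return found


def is_kind_of(term, concept):
    """True if term is the concept itself or one of its narrower terms."""
    return term == concept or concept in ancestors(term)


def assays_of_kind(concept):
    """Find assays whose annotation falls under the given concept."""
    return sorted(a for a, t in ANNOTATIONS.items() if is_kind_of(t, concept))


# Neither matching annotation contains the words "cell-based assay";
# the match comes from the relationships, not from the text.
print(assays_of_kind("cell-based assay"))  # ['assay_001', 'assay_003']
```

A plain keyword search for "cell-based assay" would return nothing from these annotations, which is the gap between word matching and concept-level retrieval that the paragraph above describes.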
“The first version of our software cannot do this yet; it cannot make inferences of scientific relevance,” says Dr. Schürer. But the group is working toward a system that will ultimately integrate, for example, the results of screening studies, gene-expression data, findings from knock-out experiments, and knowledge of biochemical pathways, cell-signaling networks, and other cell and systems biology information, and present them in a way that allows a computer to identify relationships and make inferences. A computer could then answer questions such as “In which types of biologies/assays are these compounds active?” and determine, for instance, whether a “hit” on a screen is an artifact of a certain screening method or whether the compound is active across multiple assays.
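The artifact question can be made concrete with a small sketch. It assumes a hypothetical table of screening results in which each record names a compound, an assay, a detection method, and whether the compound was active, then applies one simple rule: a compound active under only one detection method, despite being tested under others, is flagged as a possible method artifact. The data, labels, and rule are illustrative assumptions, not the BAO team's approach.

```python
from collections import defaultdict

# Hypothetical results: (compound, assay, detection method, active?)
RESULTS = [
    ("cmpd_A", "assay_1", "fluorescence", True),
    ("cmpd_A", "assay_2", "luminescence", True),
    ("cmpd_A", "assay_3", "absorbance", True),
    ("cmpd_B", "assay_1", "fluorescence", True),
    ("cmpd_B", "assay_4", "fluorescence", True),
    ("cmpd_B", "assay_2", "luminescence", False),
    ("cmpd_B", "assay_3", "absorbance", False),
    ("cmpd_C", "assay_2", "luminescence", False),
]


def classify(results):
    """Label each compound by how its activity spreads across methods."""
    tested = defaultdict(set)  # methods under which a compound was tested
    active = defaultdict(set)  # methods under which it scored as a hit
    for compound, _assay, method, is_active in results:
        tested[compound].add(method)
        if is_active:
            active[compound].add(method)

    labels = {}
    for compound, methods in tested.items():
        hits = active[compound]
        if not hits:
            labels[compound] = "inactive"
        elif len(hits) >= 2:
            labels[compound] = "active across methods"
        elif len(methods) > 1:
            # Hits under one method only, although others were tried.
            labels[compound] = "possible method artifact"
        else:
            labels[compound] = "active, single method tested"
    return labels


for compound, label in sorted(classify(RESULTS).items()):
    print(compound, "->", label)
```

Here cmpd_B is active in two assays, but both use fluorescence detection, so the rule flags it rather than counting two independent confirmations. A real system would need the assay annotations themselves to know which assays share a method, which is where a standardized description of each assay becomes necessary.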
The challenge is to enable “reasoning across huge datasets,” says Dr. Schürer, describing this as an area of active research in computer science. One solution may involve cloud computing. “Because all the data is directly or indirectly related, inferences across large datasets can likely provide novel, meaningful insights.”
David Andrews, Ph.D., professor of biochemistry and biomedical sciences at McMaster University, will describe his group’s work using automated microscopy to explore cell physiology in a presentation sponsored by PerkinElmer.
Dr. Andrews identifies three main trends in high-content screening for drug discovery. First, there is demand for more realistic assay conditions using live cells, and increasingly a move toward the use of primary cell cultures. To achieve this, Dr. Andrews suggests that temperature control of the cell cultures during screening is essential, the presence of carbon dioxide is helpful but not required, and humidity level control is important but can be maintained simply by covering the plate with a lid.
Temperature control requires the use of an incubator, which can either be part of the imaging system as a built-in or modular component, or a stand-alone unit that is able to communicate with the imaging and robotics platforms for rapid and efficient transfer of plates to/from the incubator and viewing stage.
The second trend is the emergence of numerical image-analysis technology that is enabling increasingly quantitative output from imaging studies. High-content assays are moving beyond descriptive results, such as the translocation of a molecule from the cytoplasm to the nucleus, toward more sophisticated imaging screens capable of generating intensity-based data, such as nuclear/cytoplasmic area or nuclear density measurements, and of yielding standard deviations. Using numerical image analysis, “we can make more than 500 measurements per cell,” says Dr. Andrews.
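To give a sense of what a numerical per-cell readout looks like, the sketch below takes pixel intensities that are assumed to be already segmented into nucleus and cytoplasm for each cell, computes a few measurements per cell, and then summarizes the population with a mean and standard deviation. The intensity values, measurement names, and data layout are invented for illustration and do not reflect any particular instrument or analysis package.

```python
from statistics import mean, stdev

# Hypothetical segmented cells: pixel intensities per compartment.
CELLS = {
    "cell_1": {"nucleus": [10, 10], "cytoplasm": [40, 40, 40, 40]},
    "cell_2": {"nucleus": [30, 30, 30], "cytoplasm": [30, 30, 30]},
    "cell_3": {"nucleus": [80, 80, 80, 80], "cytoplasm": [20, 20]},
}


def measure(cell):
    """Compute a few per-cell measurements from segmented pixels."""
    nuc, cyt = cell["nucleus"], cell["cytoplasm"]
    return {
        # Area is approximated by pixel count in each compartment.
        "nuc_cyto_area_ratio": len(nuc) / len(cyt),
        "nuclear_mean_intensity": mean(nuc),
        # A ratio above 1 means the signal is concentrated in the nucleus.
        "nuc_cyto_intensity_ratio": mean(nuc) / mean(cyt),
    }


per_cell = {name: measure(cell) for name, cell in CELLS.items()}
ratios = [m["nuc_cyto_intensity_ratio"] for m in per_cell.values()]

# Population summary: the number and its spread replace a yes/no
# description of whether translocation occurred.
print("mean ratio:", mean(ratios))
print("standard deviation:", stdev(ratios))
```

The same pattern extends to as many measurements per cell as the segmentation supports; each one becomes a column that can be averaged, compared across wells, or fed to a classifier.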
Furthermore, computers are able to distinguish many more subtle changes than are visible to the human eye, he notes. These small differences can be quantified and can, for example, allow the software to differentiate a stress response from the early stages of cell death after a cytotoxic agent is introduced, and to distinguish necrosis from apoptosis. In this way, high-content screening is helping researchers uncover drug mechanisms, Dr. Andrews adds.
The third trend is the growing use of fluorescence lifetime in high-content screening, which yields information about the chemistry underlying biological processes and can be used, for example, to measure protein-membrane interactions or pH changes. Dr. Andrews’ group is using a modified beta version of PerkinElmer’s Opera™ confocal microplate imaging reader to explore novel applications of fluorescence lifetime assays.