In the rush toward high-throughput screening, high-content data mining was pushed aside. Now, it’s racing to catch up, as researchers in disciplines throughout the world find meaningful ways to use data that’s already at their fingertips. What they’re realizing is that data mining for one or two parameters is no longer sufficient. Researchers may have teased out 20, 50, or more than 100 parameters for each compound in a screen but need to narrow their focus to the few, perhaps five, that are most relevant.
A number of companies as well as university researchers are developing high-content data-mining applications to resolve that challenge. In the process, many are going the extra mile to ensure that mining can be performed easily and accurately by scientists in their own labs, without handing the project off to the bioinformatics department. The result is faster answers.
In any project, the challenge is to determine the right readout parameters to better evaluate the data. “That’s where the field is investing much of its energies,” notes Daniela Gabriel, Ph.D., associate director, center for proteomic chemistry, lead finding platform at Novartis Institutes for BioMedical Research (www.nibr.novartis.com). There’s still significant variation among data-mining tools, and “that influences the outcomes significantly.”
Dr. Gabriel found that the software normally used by her lab couldn’t evaluate very dense wells. That realization launched a project that ultimately compared four software analysis modules from different vendors to find the best method for analyzing neurite outgrowth as a measure of small-molecule neurotoxicity. “We had nearly the whole well pretty much covered with primary neuronal cells,” she says. After optimum algorithms were chosen for each of the four applications tested, one proved superior at analyzing those images.
Beyond evaluating the dense wells, the data mining tools had to deal with 20 to 50 parameters for analysis. “Nuclear area, nuclear intensity, neurite length, neurite area, luminosity, and many other features that may not be directly linked were part of the analysis, which provided the opportunity to make correlations that otherwise may not have been possible,” Dr. Gabriel explains. From those parameters, “we reduce it to about five useful features.”
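One simple way to winnow dozens of partly redundant readouts down to a handful, in the spirit of the reduction Dr. Gabriel describes, is to drop any feature that is nearly duplicated by one already kept. The sketch below is illustrative only; the feature names, toy data, and correlation threshold are assumptions, not Novartis’ actual pipeline.

```python
import numpy as np

def prune_correlated_features(X, names, threshold=0.9):
    """Greedily keep features in order, dropping any whose absolute
    Pearson correlation with an already-kept feature exceeds `threshold`.
    X has shape (wells, features)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

# Toy screen: 200 wells, four raw parameters. neurite_area is built to be
# nearly redundant with neurite_length, so it should be pruned.
rng = np.random.default_rng(0)
length = rng.normal(100, 20, 200)
data = np.column_stack([
    rng.normal(50, 5, 200),                # nuclear_area
    rng.normal(1000, 100, 200),            # nuclear_intensity
    length,                                # neurite_length
    2.5 * length + rng.normal(0, 1, 200),  # neurite_area (redundant)
])
names = ["nuclear_area", "nuclear_intensity", "neurite_length", "neurite_area"]
print(prune_correlated_features(data, names))
```

In practice a redundancy filter like this would be paired with relevance ranking against the assay endpoint, so the surviving five or so features are both non-redundant and informative.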
In analyzing the high-content data-mining software, Dr. Gabriel’s main concerns were to ensure that the bitmap covered and detected all the cells. “Other evaluation parameters were ease-of-use of the software application and the speed of the algorithms in order to apply the application in secondary assays in drug discovery projects,” she says.
Combining Image Acquisition and Analysis with Data Mining
Multiparametric cellular data-mining is being addressed by Molecular Devices (MDC; www.moldev.com) through its AcuityXpress® platform. It works with other MDC components, including the MDCStore™ Database, to integrate image acquisition and image-analysis data as well as the mining of high-content screening data. It provides results in days versus the weeks traditionally required for comparable analyses, according to Pierre Turpin, Ph.D., product manager, cellular imaging-analysis software.
One benefit is that scientists get information quickly without needing to wait for the bioinformatics group to become involved, notes Dr. Turpin. Usually, he explains, people rely upon a mix of third party applications and tools that typically weren’t designed for high-content analysis. This approach thus constrains the number of parameters that can be dealt with at a given time. By combining image acquisition, analysis, organization, and data mining into one package, “we provide not just the data but the right data,” Dr. Turpin states.
Because cell-by-cell multiparametric results are linked automatically with the original image, it is easy for researchers to drill down to that image and validate the results, he emphasizes. Whatever the analysis reports, Dr. Turpin notes, you must ask whether it makes sense against the original image. “You may be looking at the wrong compound or an artifact unless you can look at the original image,” he says. “Or you could miss a compound.”
AcuityXpress uses an open application programming interface to simplify the process of reading and writing to the database and to facilitate exporting and importing data. “The database is fully scalable and can be installed on a PC or on a server for wider access,” he adds.
Project-Specific Technology and Development
There are nearly as many applications for high-content analysis as there are projects. MAIA Scientific (www.maia-scientific.com), for one, is developing what it calls an intuitive data-mining application for use with its high-throughput screening based on high-content fluorescence and bright-field imaging.
Researchers at the National Changhua University of Education in Taiwan, in another example, are combining multiparametric data mining with case-based reasoning to develop a system to diagnose and develop a prognosis for chronic diseases.
Off-the-shelf solutions aren’t necessarily optimal or available for all disciplines. Consequently, some researchers are building their own. Pfizer Research Technology Center (www.pfizerrtc.com) is using high-content data mining to predict drug-induced hepatotoxicity. Scientist Arthur Smith, Ph.D., and colleagues developed a database of drugs that were marketed and safe and therapies that failed because of toxicity.
Then, using text mining, high-content biology, and primary cells, they developed a database of toxicological and pharmacokinetic content. Multivariate analysis was used to develop a decision-tree algorithm to identify toxic drugs. The result provides a highly accurate, early toxicological screen, according to Dr. Smith. Savings have been substantial enough for the program to be expanded to other areas.
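A decision-tree classifier of this kind amounts to a cascade of threshold tests on high-content readouts. The hand-rolled tree below is a minimal sketch; the readout names and cutoffs are invented for illustration and are not Pfizer’s actual model.

```python
def classify_hepatotoxic(readouts):
    """Tiny hand-written decision tree over two hypothetical high-content
    readouts: fractional loss of mitochondrial membrane potential and of
    cell count, each relative to vehicle control."""
    if readouts["mito_potential_loss"] > 0.5:
        return "toxic"
    if readouts["cell_count_loss"] > 0.4:
        # Moderate mitochondrial signal plus cell loss also flags toxicity;
        # cell loss alone is only queued for review.
        return "toxic" if readouts["mito_potential_loss"] > 0.2 else "flag_for_review"
    return "safe"

print(classify_hepatotoxic({"mito_potential_loss": 0.6, "cell_count_loss": 0.1}))  # toxic
print(classify_hepatotoxic({"mito_potential_loss": 0.1, "cell_count_loss": 0.1}))  # safe
```

In a real pipeline the split variables and thresholds would be learned from the marketed-versus-failed drug database rather than set by hand, which is the role the multivariate analysis plays.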
Seth Harris, Ph.D., research scientist II at Roche (www.roche.com), is another case in point. He is developing a multistructure data-mining application for x-ray crystallography. Traditionally, he says, structural biology would provide one or two structures in an area. Now, it’s feasible to determine 100 or more structures of a target complexed with various small molecules.
His application is “somewhere between back of the envelope and preliminary implementation,” he reports. The focus right now is to understand what’s important in the structure. Currently, computational chemists and crystallographers get together and analyze the structure, identifying the properties that are important in a given development project. “I want the computer to further facilitate that.”
Dr. Harris’ intention is to push the conceptual framework beyond simple distance-based analysis toward increasingly sophisticated metrics that can, for example, tabulate electrostatic interactions between the protein and the ligand. Because the significance of similar interactions varies according to the protein environment in which they occur, determining the most important parameters is difficult, he explains. Data like that “is hard to tabulate into numbers.”
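The simple distance-based analysis that serves as his starting point can be sketched in a few lines: compute all protein-ligand atom distances and count the contacts under a cutoff. The coordinates and cutoff below are toy values for illustration, not drawn from any real structure.

```python
import numpy as np

def contact_count(protein_xyz, ligand_xyz, cutoff=4.0):
    """Count protein-ligand atom pairs closer than `cutoff` angstroms,
    the basic distance-based metric a structure-mining tool starts from."""
    # Broadcast to all pairwise difference vectors, then take norms.
    diff = protein_xyz[:, None, :] - ligand_xyz[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return int((dist < cutoff).sum())

# Toy coordinates (angstroms): three protein atoms, two ligand atoms.
protein = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
ligand = np.array([[1.5, 0.0, 0.0], [9.0, 0.0, 0.0]])
print(contact_count(protein, ligand))  # 3
```

The electrostatic metrics Dr. Harris describes would extend this by weighting each pair with partial charges and a distance-dependent term; it is that environment-dependent weighting that makes the numbers hard to tabulate.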
The application is conceived as a guide for chemists engaged in drug design, but it could also have merit as a data organizer. It would be particularly advantageous for those who are new to a program or who work on multiple projects, helping them discover and track the most pertinent or novel structures.