At the Image Informatics and Computational Biology Unit in the Laboratory of Genetics at the NIH, Ilya Goldberg, Ph.D., head of the unit, and colleagues are developing computational strategies based on pattern recognition to derive and interpret quantitative measurements from morphological assays.
By training computers to recognize patterns in biological image data acquired via automated, high-throughput microscopy, they are, in essence, teaching computers to think like humans and use intuitive thought processes to sift through and analyze the massive amounts of data generated by image-based high-content screens.
Once the computers know what to look for in the images, they can apply image classifiers developed by Dr. Goldberg’s team to extract the desired information and transform it into quantitative data for analysis. The ability to extract quantitative data from high-content screens will allow researchers to perform a broader range of assays, including, for example, dose-response assays designed to generate standard curves.
Dr. Goldberg will be among the presenters at CHI’s upcoming “High Content Analysis” meeting in San Francisco. Pattern recognition was initially developed for remote-sensing applications, and he describes a key challenge in adapting it to biological image analysis: telling the software how to identify the objects you want to measure, which, in the case of cell-based assays, are subcellular organelles that have been stained with fluorescently labeled antibodies.
Using positive and negative control sets, the computer is trained to identify differences in morphological characteristics of interest, such as shape or intensity, without first having to identify cellular or subcellular structures in the image. This “pure pattern-recognition” strategy offers advantages over more conventional algorithm-based image processing that relies on locating individual cells or specific organelles, according to Dr. Goldberg. Teaching a computer to differentiate objects and recognize patterns in images, rather than giving it algorithms designed to mimic human thought processes, allows for the development of a more generic, multi-purpose analytical approach.
Algorithms and parameter-based methods are designed for specific image-processing tasks. Pattern recognition can be applied more broadly to a variety of image-based quantitative assays.
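To make the contrast concrete, the following is a minimal sketch of descriptors computed over a whole image, with no step that locates cells or organelles. The function name `whole_image_descriptors`, the four descriptors, and the synthetic images are all assumptions chosen for illustration; the article does not list the descriptors the NIH group uses.

```python
import numpy as np

def whole_image_descriptors(image):
    """Return a small vector of descriptors computed over the entire image.

    No segmentation is performed: every pixel contributes to every value.
    These four descriptors are illustrative stand-ins, not the group's set.
    """
    img = np.asarray(image, dtype=float)
    gy, gx = np.gradient(img)             # intensity change along rows and columns
    edge_strength = np.hypot(gx, gy)      # gradient magnitude at each pixel
    counts, _ = np.histogram(img, bins=16)
    p = counts[counts > 0] / counts.sum() # non-empty histogram bins as probabilities
    entropy = -np.sum(p * np.log2(p))     # spread of the intensity histogram
    return np.array([img.mean(), img.std(), edge_strength.mean(), entropy])

# Hypothetical usage with two synthetic 64x64 "images"
rng = np.random.default_rng(0)
smooth = np.full((64, 64), 0.5) + rng.normal(0, 0.01, (64, 64))
textured = rng.random((64, 64))
d_smooth = whole_image_descriptors(smooth)
d_textured = whole_image_descriptors(textured)
print(d_smooth)
print(d_textured)
```

Because nothing in the function is tied to a particular cell type or stain, the same descriptor vector could in principle feed classifiers for quite different assays, which is the generality the pattern-recognition approach aims for.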
Compared to humans, computers can distinguish much more subtle differences in cell and organelle morphology using the data obtained from assays performed on automated microscopy systems. With adequate training, computers are better able to pick out the characteristics and specific changes of interest in a cell from the sea of information produced, which includes anecdotal evidence and metadata about how the image was acquired.
Dr. Goldberg’s group developed an image-classification system that incorporates more than 2,000 numerical image descriptors. The computer is taught how to apply the image classifier to translate the qualitative information obtained during image analysis into quantitative data. By comparing a variety of control datasets, the computer determines how to separate natural experimental variation in an image (and filter it out of the analysis) from the aspects of the image that help distinguish it from other images.
The computer is also taught techniques for filtering through the descriptors and ranking them according to which are most effective in helping it distinguish between the different control sets. The computer then uses this image classifier to analyze new datasets.
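The ranking and classification steps described above can be sketched as follows. This is an illustration under stated assumptions, not the group's implementation: the article does not say how descriptors are ranked, so the sketch uses a Fisher-style score (between-class variance divided by within-class variance) and a weighted nearest-centroid rule. The function names, the `keep` parameter, and the synthetic data are hypothetical.

```python
import numpy as np

def fisher_scores(X, y):
    """Score each descriptor: between-class variance over within-class variance."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return between / (within + 1e-12)   # small constant avoids division by zero

def train(X, y, keep=10):
    """Rank descriptors on the control sets and keep the `keep` best."""
    scores = fisher_scores(X, y)
    top = np.argsort(scores)[::-1][:keep]          # indices of highest-scoring descriptors
    classes = np.unique(y)
    centroids = np.array([X[y == c][:, top].mean(axis=0) for c in classes])
    return {"top": top, "weights": scores[top],
            "classes": classes, "centroids": centroids}

def classify(model, X_new):
    """Assign each new image to the class with the nearest weighted centroid."""
    Z = X_new[:, model["top"]]
    diff = Z[:, None, :] - model["centroids"][None, :, :]
    dist = (model["weights"] * diff ** 2).sum(axis=2)
    return model["classes"][dist.argmin(axis=1)]

# Synthetic stand-in for two control sets: 2,000 descriptors per image,
# of which only the first five actually differ between the sets.
rng = np.random.default_rng(0)
n_per_class, n_desc = 20, 2000
y = np.repeat([0, 1], n_per_class)
X = rng.normal(size=(2 * n_per_class, n_desc))
X[y == 1, :5] += 4.0

model = train(X, y, keep=5)

# New, previously unseen images drawn the same way
y_test = np.repeat([0, 1], 5)
X_test = rng.normal(size=(10, n_desc))
X_test[y_test == 1, :5] += 4.0
pred = classify(model, X_test)
print(sorted(model["top"].tolist()), pred)
```

Weighting the distance by each descriptor's score lets the most discriminating descriptors dominate the decision, while the uninformative ones, which mostly reflect ordinary experimental variation, are dropped by the ranking.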
One downside of this pattern-recognition strategy is the difficulty researchers often have interpreting what the computer is “seeing” and basing its results on, says Dr. Goldberg.
One of the ways the NIH group has tested and applied pattern recognition in high-content screening studies is to characterize the morphological transitions that cells undergo as an organism ages. In a screen designed to study muscle degeneration in worms as a determinant of physiological age, the group trained the computer on a classifier derived from images of the neuromuscular cells that make up the pharynx, collected from age-grouped Caenorhabditis elegans.
The computer learned how to determine physiological age based on quantitative analysis of the tissue architecture and pharynx function. The results led Dr. Goldberg’s group to conclude that rather than being a gradual process, aging in worms occurs in three distinct stages. The group is expanding on this experimental approach to explore how it can apply pattern recognition and quantitative image analysis to identify other types of nonlinear transitions that are governed by biological processes.