Rapid advances in instrumentation and robotics have made high-content screening (HCS) faster than ever. Data volumes have grown beyond all past superlatives, to the point that they can best be described as ridiculous. Historically, vendors have been behind the curve, offering relatively underpowered systems with inflexible data-analysis packages. As computational technology catches up with assay technology, scientists are embracing the open-source movement and polishing up their programming skills to supercharge their already fast screens.
At CHI’s “High Content Analysis” conference to be held later this month in San Francisco, leaders in the field will gather to share their data-finessing successes. One of the most intense areas of interest in this field is image analysis. When a thousand images may be taken of a single plate, and plates are processed in batches of hundreds, how to handle the data is a nontrivial problem.
Simply moving that volume of graphics files from one place to another can be a chore, never mind analyzing them. But this is exactly what John McLaughlin, Ph.D., a research fellow at Rigel Pharmaceuticals, does on a weekly basis.
In its pursuit of aurora kinase inhibitors, Rigel has developed a phenotypic screen using pattern recognition. This is the same technology being developed by law-enforcement agencies for screening video images for criminal suspects.
Pattern recognition extracts features from example images and uses them to train classifiers, which can then be used to mine large image datasets for patterns of interest. According to Dr. McLaughlin, “It’s a highly dimensional kind of data. In this case, there are 140 measurements for every cell. Not only does this technology help us quantify huge datasets more efficiently, it can also suggest potential mechanisms of action of our compounds.” Pattern recognition also has utility in determining the mode of action of a drug without spending a great deal of time and resources on secondary screens.
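The train-then-mine workflow described above can be illustrated with a very small sketch. It is not Rigel's pipeline: the nearest-centroid classifier is a simple stand-in for whatever classification system the lab uses, the feature vectors have three made-up values instead of 140 measurements, and the labels "normal" and "abnormal" are hypothetical.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def train_centroids(samples):
    """Average the feature vectors of each labeled class.

    samples: list of (feature_vector, label) pairs.
    Returns a dict mapping each label to its mean feature vector.
    """
    sums, counts = {}, Counter()
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [total / counts[label] for total in vector]
        for label, vector in sums.items()
    }


def classify(features, centroids):
    """Assign a cell to the class whose centroid is closest."""
    return min(centroids, key=lambda label: dist(features, centroids[label]))


# Hypothetical training cells: three invented features per cell.
training = [
    ([1.0, 0.2, 0.1], "normal"),
    ([0.9, 0.3, 0.2], "normal"),
    ([2.5, 1.8, 0.9], "abnormal"),
    ([2.7, 1.6, 1.1], "abnormal"),
]
centroids = train_centroids(training)

# "Mining" step: apply the trained classifier to unlabeled cells
# and tally how many fall into each phenotype.
unlabeled = [[1.1, 0.25, 0.15], [2.6, 1.7, 1.0], [0.8, 0.1, 0.2]]
phenotype_counts = Counter(classify(cell, centroids) for cell in unlabeled)
```

In a real screen the same two steps would run over per-cell measurements exported by the image-analysis software, and the per-well phenotype counts would be the quantity compared across compounds.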
The assay looks for proliferation of cells after treatment with a small-molecule inhibitor. To analyze the data, Dr. McLaughlin uses a cluster built from 20 or 30 PCs. “Nowadays you can buy quad cores or dual-quad cores for not really all that much. If you’ve got a couple of those then you’ve got a small cluster.”
A typical screen takes four days to process on this cluster, a testament to the mind-boggling size of the dataset. “A year ago, we had a backlog of things we needed to do, so I pulled in a number of computers from other groups at our company and used them at night when people went home.”
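The idea of spreading image analysis over many cores can be sketched on a single multi-core machine. This is an illustration only, not the setup used at Rigel: the function names, the file names, and the placeholder "analysis" are all assumptions, and a multi-PC cluster would need a job scheduler or shared queue rather than one process pool.

```python
from concurrent.futures import ProcessPoolExecutor


def chunk(items, n_chunks):
    """Split items into n_chunks contiguous batches of nearly equal size."""
    size, extra = divmod(len(items), n_chunks)
    batches, start = [], 0
    for i in range(n_chunks):
        # The first `extra` batches each take one additional item.
        end = start + size + (1 if i < extra else 0)
        batches.append(items[start:end])
        start = end
    return batches


def analyze_batch(image_names):
    """Placeholder for real per-image analysis (hypothetical).

    Here it only returns each name with its length so the sketch runs
    without any image files.
    """
    return [(name, len(name)) for name in image_names]


def analyze_all(image_names, workers=4):
    """Run analyze_batch on each batch in a separate worker process."""
    results = []
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() preserves batch order, so results stay in input order.
        for batch_result in pool.map(analyze_batch, chunk(image_names, workers)):
            results.extend(batch_result)
    return results


if __name__ == "__main__":
    # Invented file names standing in for images from one plate.
    names = [f"plate01_well{w:03d}.tif" for w in range(10)]
    print(len(analyze_all(names, workers=4)))
```

Because each image can be analyzed independently of the others, the work divides cleanly, which is why adding borrowed machines overnight shortens the backlog roughly in proportion to the number of cores added.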
One challenge faced by Dr. McLaughlin and other scientists working with large sets of imaging data is the closed nature of software packages. “Most vendors have some level of customization built in, but I’ve found many times with industry-standard systems that they don’t provide nearly enough flexibility,” Dr. McLaughlin explains.
“I understand the reasoning for keeping code proprietary, but before I commit to an analysis system I often don’t know what my requirements are, I just know it’s inevitable that something they don’t allow for will arise later. I wouldn’t write my own code if I didn’t have to, so if they would open their code up fully or partially like Matlab, then I could spend my time doing science instead of building tools.”
Dr. McLaughlin’s lab has adopted an open-source program called CellProfiler, developed at the Broad Institute. It is written in the Matlab programming language and linked to a MySQL database.
For scientists who are not programming experts, which is most of them, tools and applications are becoming available that bridge the gap. The Open Microscopy Environment (OME), an international open-source consortium, has developed OME Remote Objects (OMERO), an open-source Java-based software suite that can import and analyze image data from just about any type of microscope or high-content analysis (HCA) system.
Unlike many other graphics export/import actions, OMERO captures not just the pixels in the image file, but the metadata associated with them. OMERO also interfaces with many popular analysis programs like Matlab and CellProfiler.
Jason Swedlow, Ph.D., professor at the University of Dundee and president of Glencoe Software, is one of the founders of OME and leads the team that developed OMERO. “We wanted to develop tools that would provide infrastructure for data management and enable interoperability between different types of image data and analysis tools. Moreover, our whole approach and philosophy is open—we are passionate about developing a community.”
OME’s viewpoint is that most labs are enterprise-data producers, and that the scale of data is comparable to that managed by banks and hedge funds. Therefore, biologists should have the same powerful tools that these industries use to manage their data. The concept seems to be catching on. Dr. Swedlow estimates that OMERO has been installed on roughly 1,100 servers around the world. “We have tens of terabytes of data under management on our own servers in our lab and know of many larger installations.”