Leading the Way in Life Science Technologies

GEN Exclusives

More »


More »
March 01, 2010 (Vol. 30, No. 5)

Quantitative HT Gene-Expression Analysis

Automating and Parallelizing the Process Can Help to Avoid an Analysis Bottleneck

  • Automation

    The prototype algorithm described above automates the processing of a single image. By taking advantage of the underlying programming language of Matlab, it is a simple matter to apply this algorithm in a batch process to analyze thousands of images automatically, or in a continuous process to analyze images as they are generated by a high-throughput process.

    Depending on particular IT infrastructure, generated image data might be accessed by MATLAB from files on disk, from a database, or even directly streamed in from the microscopes themselves. Processing and analysis results can be output to a report file, to a database, or perhaps to a web page.

  • Parallel Computing

    Click Image To Enlarge +
    Figure 2. The parfor (a parallel for-loop) keyword from Parallel Computing Toolbox enables simple conversion of a batch-processing application to run in parallel, taking advantage of a compute cluster.

    Depending on the nature of the particular steps taken in the image-processing algorithm, an individual image could take from a few seconds to a minute to be processed on a typical desktop computer. Processing the few thousand images in the FlyEx database might then take from half a day to a few days, a time frame that perhaps is acceptable in the context of a research program in developmental biology.

    In a pharmaceutical context, however, a typical high-throughput screening library  contains one or two million compounds that are tested by an HTS robot at a rate of up to 100,000 compounds per day. A time frame of several months to process these generated images would be an unacceptable delay.

    Parallel Computing Toolbox enables scientists to solve computationally intensive problems using MATLAB on multicore and multiprocessor computers, or scaled to a cluster, using Distributed Computing Server. By distributing the processing of high-throughput screening data across multiple computers, researchers can decrease analysis time by orders of magnitude.

    Simple parallel-programming constructs, such as parallel for-loops, allow scientists to convert algorithms to run in parallel with minimal code changes and at a high level without programming for specific hardware and network architectures (Figure 2).

    It is crucial that scientists are able to easily convert their algorithms to work in parallel while remaining at this high level, without needing to become experts in the traditionally complex techniques of programming for high-performance computing.

  • Future Trends and Requirements

    It is already a clichéd concept that the new technologies currently expanding the boundaries of pharmaceutical and biomedical research are generating ever-increasing amounts of data, and that automated analysis algorithms and efficient use of parallelization using computer clusters are vital if we want to avoid an analysis bottleneck and allow science to proceed at its own rapid pace.

    Tools are, therefore, needed to enable the prototyping, automation, and easy parallelization of these algorithms. The use of technical computing environments such as those described in this article allow scientists to move rapidly from interactively visualizing data, to prototyping a processing algorithm, to automation, to a high-throughput solution in a single environment.

    Adoption of these techniques will allow science to avoid analysis bottlenecks and to continue to proceed at its own pace, even in a high-throughput context. 

Related content