In 2007, the company switched its focus to the analysis of next-generation sequencing data. The CLC Genomics Workbench, released in 2008, analyzes data from second-generation HTS instruments. Whereas first-generation sequencing machines typically generate 0.1 megabases of data per run, second-generation instruments can spew out up to 40,000 megabases in a single run, explains Knudsen. In fact, the amount of genomic sequencing data increases 10-fold every 18 months. “So there’s a critical demand for solutions that are really adept at handling and analyzing these huge amounts of data,” he says.
The CLC Genomics Workbench is a comprehensive package that analyzes and visualizes data from all major next-generation HTS platforms, such as SOLiD by Applied Biosystems, 454 GS FLX by Roche, and Solexa by Illumina. When it first launched, “no other companies were doing this,” says Knudsen. “We had a head start in the market, making us a premier solution provider.”
Users of CLC Bio’s software are not locked into a single platform, but can use any or all HTS machines. Because different sequencing instruments offer different advantages, it makes sense to mix datasets into hybrid assemblies, Knudsen notes. This overarching strategy extends to software in development to handle the hundreds of thousands of reads generated by upcoming third-generation sequencers. “A key to our success is that our customers can mix data, and that will continue with new platforms,” he says.
Early in 2010, the company released version 2.0 of the CLC Genomics Server, an enterprise platform for next-generation sequencing data analysis. CLC Bio describes the Genomics Server as a bioinformatics solution built on a three-tier server architecture. The company says that the server provides flexible options for executing centralized services, easy integration with other applications and services, powerful database communication and data integration, and a secure access control framework with central action logging.
Version 2.0 of the CLC Genomics Server includes a wider range of features for handling HTS data, says Knudsen. He notes some key improvements in this version, including parallel job execution across multiple computers through multiple job nodes, integration of third-party command-line tools and algorithms, support for file sharing and data management, and additional HTS analyses such as digital gene expression for RNA sequencing, SNP and DIP detection, and ChIP-seq analysis.