The bioinformatics tools offered by Denmark’s CLC Bio are like “a Swiss Army knife for genomic data analysis,” says Thomas Knudsen, CEO. The company develops and markets software for the analysis of high-throughput sequencing (HTS) data and says its bioinformatics algorithms are wrapped in a user-friendly graphical interface.
Brothers Bjarne Knudsen, Ph.D., and Thomas Knudsen started CLC Bio in 2005 by offering a free software program known as CLC Sequence Viewer. Dr. Knudsen, a bioinformatics expert, created the technical aspects of the software, while Thomas Knudsen serves as CEO.
The Sequence Viewer software remains free and can be downloaded from the company’s website. It was downloaded 100,000 times in its first year, and by 2008 the total had passed one million. When first launched, the software was “a powerful and intuitive way to show people how to do bioinformatics,” says Thomas Knudsen. Although it was designed for first-generation sequence data, he believes that the software makes a good teaching tool for students in molecular biology.
In 2007, the company switched its focus to the analysis of next-generation sequencing data. The CLC Genomics Workbench, released in 2008, analyzes data from second-generation HTS instruments. Whereas first-generation sequencing machines typically generate 0.1 megabases of data per run, second-generation instruments can spew out up to 40,000 megabases in a single run, explains Knudsen. In fact, the amount of genomic sequencing data increases 10-fold every 18 months. “So there’s a critical demand for solutions that are really adept at handling and analyzing these huge amounts of data,” he says.
The CLC Genomics Workbench is a comprehensive package that analyzes and visualizes data from all major next-generation HTS platforms, such as SOLiD by Applied Biosystems, 454 GS FLX by Roche, and Solexa by Illumina. When the Workbench first launched, “no other companies were doing this,” says Knudsen. “We had a head start in the market, making us a premier solution provider.”
Users of CLC Bio’s software are not locked into a single platform; they can work with data from any or all HTS machines. Because different sequencing instruments offer different advantages, it makes sense to mix datasets into hybrid assemblies, Knudsen notes. This overarching strategy extends to software in development to handle the hundreds of thousands of reads generated by upcoming third-generation sequencers. “A key to our success is that our customers can mix data, and that will continue with new platforms,” he says.
Early in 2010, the company released version 2.0 of the CLC Genomics Server, an enterprise platform for next-generation sequencing data analysis. CLC Bio describes the Genomics Server as a bioinformatics solution built on a three-tier server architecture. The company says that the server provides flexible options for executing centralized services, easy integration with other applications and services, powerful database communication and data integration, and a secure access control framework with central action logging.
Version 2.0 of the CLC Genomics Server includes a wider range of features for handling HTS data, says Knudsen. He notes some key improvements in this version, including parallel job execution on multiple computers through multiple job nodes, integration of third-party command-line tools and algorithms, support for file sharing and data management, and additional HTS analyses such as digital gene expression for RNA sequencing, SNP and DIP detection, and ChIP-seq analysis.