Feature Articles : Apr 15, 2008
Managing Data from Next-Gen Sequencing
Tremendous Volume Can Be Handled by Tools Available Now; Complexity Is Another Matter
Using novel sequencing chemistries, microfluidic systems, and reaction-detection methods, next-generation sequencing vendors offer 100- to 1,000-fold increased throughput and 100- to 1,000-fold decreased cost compared with conventional Sanger DNA sequencing.
Where high-throughput sequencing was previously limited to top sequencing centers, these new instruments are bringing large-scale sequencing as a research tool into institutional core facilities, small research groups, and the labs of individual principal investigators.
This research tool is not limited to the de novo sequencing of whole genomes. Rather, the nature of next-generation sequencing data, with many more and generally shorter reads at lower cost, makes it applicable to many forms of resequencing experiments (e.g., genotyping, comparative genomics, and phylogenetic studies).
As more and more groups perform next-generation sequencing operations, two data-management problems have been revealed: data volume and data complexity. The data-volume conundrum is new to the recent converts; issues with data complexity are new to all.
Each next-generation sequencer is unique in the volume and nature of the data it generates over time, and that output is greatly affected by how the instrument is used. Generally, purchasers of $0.5–2 million instruments intend to operate them at near full capacity, generating anywhere from 600 GB (gigabytes) to 6 TB (terabytes) of data per run, with each run lasting one to three days.
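To put those per-run figures in perspective, the short Python sketch below projects them over a year of operation. It is only a rough illustration: the function name `yearly_volume_tb`, the 90% utilization figure standing in for "near full capacity," the decimal conversion of 1 TB = 1,000 GB, and the pairing of run sizes with run lengths are all assumptions made here, not vendor specifications.

```python
def yearly_volume_tb(gb_per_run, days_per_run, utilization=0.9, days_per_year=365):
    """Estimate (runs per year, terabytes per year) for one instrument.

    utilization=0.9 is an assumed stand-in for "near full capacity";
    1 TB is taken as 1,000 GB.
    """
    # Only whole runs count, so truncate the fractional run.
    runs = int(days_per_year * utilization / days_per_run)
    return runs, runs * gb_per_run / 1000.0


# Two illustrative corners of the quoted ranges (the pairing is an assumption):
# small runs that take three days, and large runs that take one day.
scenarios = {
    "600 GB per 3-day run": (600, 3),
    "6 TB per 1-day run": (6000, 1),
}

for label, (gb, days) in scenarios.items():
    runs, tb = yearly_volume_tb(gb, days)
    print(f"{label}: about {runs} runs and {tb:,.1f} TB per year")
```

Even the low-end scenario lands in the tens of terabytes per instrument per year, which is why storage planning becomes a concern for groups that are new to sequencing at this scale.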
In the old days of conventional Sanger DNA sequencing, the data from many instrument runs generally contributed to a single common experiment, where dozens of resulting files were uniquely named with a user-defined convention that identified the experiment to which each run belonged.
© 2012 Genetic Engineering & Biotechnology News, All Rights Reserved