How do you deal with data overload? How important is informatics technology to your workflow?
Dr. Gao: We are lucky to have great support from our institution. We have more than 600 terabytes of online storage and a tape-based data backup system in place. We also have a good bioinformatics core with several commercial and open-source software systems. But data analysis is still very challenging.
Our group works with the university’s bioinformatics core on data analysis, but the customer decides how the data will be analyzed. Software can do quite a bit, such as aligning the data to reference sequences, mutation/SNP calling, insertion/deletion calling, gene expression analysis, DNA-binding-site peak finding, and more. We also rely on our own scripts for special applications. But no single software package can do everything for next-generation sequencing.
Dr. Green: This is definitely a major concern for us. Although Ion Torrent files are small relative to those generated by Illumina and Roche 454, the data output is still impressively large. We are currently seeking solutions with our developing bioinformatics department.
Dr. Jafari: In the beginning we had many issues with the amount of data we generated. Currently, we do not retain much of those data and images. At the same time, we now have access to Northwestern’s computing and storage capabilities, which has made our data management much easier. Our new data-retention policy will further help our data management and prevent data overload.
Kingham: Data overload need not be a problem, but it must be handled properly or bad things will happen. As a shared resource facility, we deal with it by implementing a data-retention policy and making sure investigators understand what this means for their data. Informatics technology has never been more important. Even with data overload, it is our responsibility to ensure that all the data retain their integrity and are properly backed up.
Dr. Kiss: Illumina’s BaseSpace solution is very attractive to us and may be the tipping point in helping us decide whether to purchase an Ion Torrent or a MiSeq instrument. We are required by NIH and NSF to store all data for five years after the project is completed. We currently use CD backup, server backup, and external HDD backup.
Dr. Lyons: Information overload is indeed a huge, huge problem. We are constantly struggling to manage, store, and deliver the data. We dedicate a significant amount of effort to developing sample-tracking and project-tracking software specific to our unique needs, and to expanding our basic storage infrastructure. We don’t even try to do the bioinformatics; that is the province of a separate core, but they are badly overloaded and it’s going to get worse!
Dr. Thomas: We deal with data poorly and ad hoc, but we’re working to provide centralized servers and storage. Informatics technology, both storage and software, is critical for our users who are not computer scientists and who may be conducting their first-ever DNA sequencing experiment.