How do you deal with data overload? How important is informatics technology to your workflow?
Dr. Gao: We are lucky to have great support from our institution. We have more than 600 terabytes of online storage and a tape-based data backup system in place. We also have a good bioinformatics core with several commercial and open-source software systems. But data analysis is still very challenging.
Our group works with the university’s bioinformatics core on data analysis, but the customer decides how the data will be analyzed. Software can do quite a bit, like aligning the data to references, mutation/SNP calling, deletions/insertions calling, gene expression, DNA binding site peak-finding, and others. We also rely on our own scripts for special applications. But no one software package is capable of doing everything for next-generation sequencing.
Dr. Green: This is definitely a major concern for us. Although the Ion Torrent files are small relative to those generated by Illumina and Roche 454, the data output is still impressively large. We are currently seeking solutions with our developing bioinformatics department.
Dr. Jafari: In the beginning we had many issues with the amount of data we generated. Currently, we are not retaining a large amount of those data and images. At the same time, we now have access to the Northwestern computing and storage capabilities, which has made our data management much easier. Our new data-retention policy will further help our data management and will prevent data overload.
Kingham: Data overload need not necessarily be a problem, but it must be handled properly or bad things will happen. As a shared resource facility we deal with this by implementing a data retention policy and seeing that investigators understand what this means for their data. Informatics technology has never been more important. Even with data overload, it is our responsibility to see that all the data has integrity, and is properly backed up.
Dr. Kiss: Illumina’s BaseSpace solution is very attractive to us and may be the tipping point in helping us decide whether to purchase an Ion Torrent or a MiSeq instrument. We are required by NIH and NSF to store all data for five years after the project is completed. We currently use CD backup, server backup, and external HDD backup.
Dr. Lyons: Information overload is indeed a huge, huge problem. We are constantly struggling to manage, store, and deliver the data. We dedicate a significant amount of effort toward developing sample-tracking and project-tracking software specific to our unique needs. We dedicate a significant amount of effort toward increasing our basic storage infrastructure. We don’t even try to do the bioinformatics; that is the province of a separate core, but they are badly overloaded and it’s going to get worse!
Dr. Thomas: We deal with data poorly and ad hoc, but we’re working to provide centralized servers and storage. Informatics technology, both storage and software, is critical for our users who are not computer scientists, and who may be conducting their first-ever DNA sequencing experiment.
How can instrument makers improve instruments and streamline workflows?
Dr. Gao: We need faster turnaround. Eight or eleven days is too long for a sequencing run. Higher throughput and lower instrument costs are also important.
Dr. Green: The length of the workflow, particularly for the Ion Torrent, is a significant concern, and we have been looking for measures to address this. The total time itself is less significant than the fact that much of it is hands-on laboratory time, and some aspects of chip loading are highly sensitive to the experience of the user.
Dr. Jafari: Easier workflow and better on-instrument data-analysis software would be instrumental in streamlining NGS projects. By “on-instrument” I mean having a separate computer, not part of the instrument, holding the software. I think this should conduct basic sequencing analysis, just like they have for microarray analysis. Affymetrix and Illumina have basic tools that, if the user provides some basic information, will spit out fold change, P-values, and basic stats. I think this can now be done, especially for RNA-seq and ChIP-seq.
Kingham: At the rate this field is advancing, this is a difficult question to answer. Maybe some of these instruments or workflows should be improved before they are commercialized. The level of variability seen on many next-gen platforms needs improvement.
Dr. Kiss: Making the bioinformatics pipeline fully automated and consistently improving this aspect of the post-run analysis would be a big improvement. The CLC Genomics Workbench software package appears to perform most of the automation a facility like ours is looking for, as well as satisfying much of our user base.
Dr. Lyons: I can’t really comment much on this topic. Manufacturers are stressed just as we are, trying to keep on top of a dramatically evolving field. While there are many things they could do to help us (improved software, flexibility of applications, better bioinformatics support), in reality, the manufacturers probably have their hands full just keeping their instruments competitive in terms of the most basic productivity measures.
Dr. Thomas: Instruments could be made cheaper and faster. One of my major issues is the lack of clarity with companies like 454 and Illumina with regard to their products’ details. Developing new protocols is almost impossible as it is extremely difficult to obtain detailed information for key things like linker sequences. This kills the distributed application development process.
Is your facility considering upgrading hardware, software, or some other aspect of your sequencing activities?
Dr. Gao: We are considering upgrading to HiSeq 2500 to have a faster turnaround option. Oxford Nanopore technology is very promising for lowering costs, faster turnaround, and longer sequence reads.
Dr. Green: We are investigating automated robotic library preparation to allow us to focus more on quality control of library preparation and increase sample throughput. In addition, we are planning on purchasing the next-generation Ion Torrent instrument—the Proton. This will eventually allow for whole human genome sequencing on a single chip.
Kingham: We are always considering upgrading. It’s important to stay on top of where the technology is going. Institutional investigators are really what drives the acquisition of new technology, so it’s important to effectively communicate what the future genomics landscape will look like.
Dr. Kiss: Yes, we are going to purchase either an Ion Torrent or a MiSeq instrument. Full genome sequencing is available to Miami University principal investigators via the Ohio State University 454 Roche sequencers. We cannot afford to duplicate this, and there is no reason to. But we could definitely afford $80,000–$120,000 for the benchtop sequencer with low acquisition and operating costs. We are also considering buying CLC Bio’s Genomics Workbench as site-licensed software for next-generation sequencing analysis. What is most attractive to us about this software is its cross-platform nature.
Dr. Jafari: We are planning to upgrade our 5500xl to 5500W, which is supposed to eliminate the use of ePCR and double output. These upgrades will lower our prices significantly. We are looking forward to getting the more affordable instruments that can handle whole-genome sequencing faster and at much lower cost. It is best to avoid having a few large centers running all the whole-genome sequencing projects.
Dr. Lyons: We are almost always planning expansions. Two more HiSeqs should arrive in the next couple of months. We may acquire an eighth HiSeq soon thereafter. Our newest sequencer, the PGM, needs to be provided with IT support and a technical team. Other instruments recently added include a Qiagen PyroMark and another Sanger sequencer.
A MiSeq is almost certainly in our future. The FLX should soon get the Plus upgrade. At least some of the HiSeqs will get upgraded to the 2500 model. Our LIMS system is undergoing a vast upgrade to accommodate recent changes. This field is truly in a constant state of flux.
The reason we upgrade is simple. Newer instruments almost always provide significantly improved cost efficiency, improved production, and better data. Our clients benefit by staying in the forefront of their research field.
Dr. Thomas: Yes, due to overwhelming need we are now in the process of purchasing software for university-wide support. We’re looking for software that is user friendly, but the problem with such software is that it is not readily compatible with normal laboratory computers.