As GEN reporter Greg Crowther, Ph.D., pointed out in his article on next-generation sequencing (NGS) in our March 1 issue, the NGS field is rapidly expanding. In addition to keeping track of ongoing advances in instrumentation, scientists are increasingly interested in NGS practices related to sample prep and analysis. In other words, NGS is becoming a critical component of many investigators’ armamentarium of research tools.
Recognizing the growing importance of NGS, we created this special “Tech Tips” section on the topic for this issue of GEN. We interviewed leading NGS scientists and practitioners to learn how they best use this technique and how they get the most bang for their buck across a wide range of applications. The people interviewed for this special feature included Harry Gao, M.D., Ph.D., director, DNA Sequencing Laboratory, City of Hope Medical Center; Stefan J. Green, Ph.D., director, DNA Services Facility, University of Illinois at Chicago; Nadereh Jafari, Ph.D., research associate professor and director, Genomics Core Facility, Northwestern University; Brewster Kingham, director, DNA Sequencing & Genotyping Center, Delaware Biotechnology Institute, University of Delaware; Andor Kiss, Ph.D., supervisor, Center for Bioinformatics and Functional Genomics, Miami University of Ohio; Robert Lyons, Ph.D., director, DNA Sequencing Core, University of Michigan; and W. Kelley Thomas, Ph.D., director, Hubbard Center for Genome Studies, and professor, Department of Molecular, Cellular and Biomedical Science, University of New Hampshire.
We specifically asked our interviewees to describe the types of instruments they currently use and the kinds of experiments they are carrying out. We also asked them about what is arguably the greatest roadblock in biotech research today: data overload. How do they deal with it and, more importantly, what are some solutions for overcoming this problem?
These and other questions were designed with you, our readers, in mind. We believe that by tapping into the expertise of top NGS scientists and learning how they exploit this sophisticated methodology, we can shed light on NGS issues you may be working through, while also offering advice on how to conduct your NGS experiments to get better results.
What types of next-generation sequencing does your facility use, and for what types of experiments or samples?
Dr. Gao: We use the Illumina HiSeq 2000, GAIIx, and Roche 454 FLX. We sequence more than 90% of our samples on the HiSeq 2000. More than 50% of the samples we receive are for small RNA (smRNA) sequencing, 10% for ChIP-seq, 20% for targeted, exome, or whole-genome sequencing, and 15% for RNA-seq. The remaining 5% are for methylation and other projects.
Dr. Green: The DNA Services Facility at the University of Illinois at Chicago houses an Ion Torrent Personal Genome Machine and, in collaboration with our sister facility at the University of Illinois at Urbana-Champaign (UIUC), has access to Roche 454 Titanium and Illumina HiSeq 2000 platforms. Our facility has focused on instrumentation for library preparation, including a Covaris S2 acoustic shearing device for DNA shearing, a Pippin Prep automated size-selection device, and an Ion Torrent OneTouch instrument for automated emulsion PCR. In addition, we have a Sequenom MassARRAY 4 for moderate-scale multiplexed genotyping (particularly single nucleotide polymorphism analysis).
Our facility accepts a wide variety of sequencing projects, including genome sequencing and resequencing (viral, bacterial, small eukaryote), metagenome sequencing (natural and contaminated soil and aquatic samples), genotyping and targeted resequencing, microbial ribosomal RNA gene sequencing, cancer panel amplicon sequencing, and RNA-seq. We prepare libraries either directly from nucleic acids provided by customers or from nucleic acids extracted in our facility.
Dr. Jafari: Our core facility has two Applied Biosystems SOLiD 5500xl instruments and will soon install an Ion Torrent system. We provide ChIP-seq, RNA-seq, and targeted resequencing, including exome-seq.
Kingham: My facility provides Illumina SBS sequencing on the HiSeq 2000 platform, as well as single-molecule SMRT sequencing on the Pacific Biosciences RS platform. Many of the Illumina experiments have been small RNA and transcriptome sequencing, with some genomic and targeted sequencing experiments. We are still in the technology-assessment phase with our PacBio RS, so we have not officially launched it as a service. However, we are starting to see the capabilities of the RS influence our Illumina queue, increasing the number of long-read paired-end runs. Hybrid assemblies that use Illumina data to “polish” the lower-accuracy PacBio reads are becoming standard for those fortunate enough to have access to both platforms.
Dr. Kiss: We currently use a turnkey service at The Ohio State University for Roche 454 sequencing, as well as the Illumina HiSeq 2000 at the University of Cincinnati. Our projects mostly involve cDNA (EST libraries) and RNA-seq.
Dr. Lyons: The University of Michigan DNA Sequencing Core has most of the major commercial sequencing platforms (five HiSeqs, one Genome Analyzer, two SOLiD 4s, a 454 FLX, a PacBio RS, and an Ion Torrent PGM). The majority of our next-gen activity involves shallow-draft human whole-genome sequencing on the HiSeq. However, we try to meet all the needs of our very diverse clients, so we offer just about any service, with varying levels of support (ChIP-seq, RNA-seq, metagenomics, microbiomics, exomes, targeted capture, nonhuman genomes, etc.).
Dr. Thomas: At New Hampshire we use the 454 and Illumina platforms to conduct whole-genome shotgun, RNA-seq, BS-seq (bisulfite sequencing), metagenomics, metatranscriptome, and a lot of RefSeq projects. Each month we handle many small projects, such as partial 454 plates or single Illumina lanes.
Who are your customers? Which do they value more, cost or speed?
Dr. Gao: Most of our approximately 100 samples per week come from our internal university users. Fewer than 10% are from outside organizations. Cost, turnaround time, and data quality are all important for the researchers we serve.
Regarding speed, filling a flow cell for a particular run takes time. We can easily fill one flow cell for a short, single-read 40 bp run by combining miRNA-seq and ChIP-seq samples. It takes longer to collect enough samples for a paired-end (PE) 2 × 100 bp run. It would be great to be able to run each lane independently, as is possible with the SOLiD 5500xl.
Turnaround depends on the technology and application. For Illumina PE 2 × 100 bp runs, turnaround time is normally 8 days, or 11 days when we’re running two flow cells simultaneously. Oxford Nanopore technology is expected to change the field completely with fast turnaround, about 15 minutes per human genome, at a cost expected to be less than $1,000. The system is supposed to be available by the end of this year.
Dr. Green: Our customers are largely affiliated with the University of Illinois and include both faculty and research physicians. Our external customers are typically microbial ecologists looking for amplicon, genome, metagenome, and metatranscriptome sequencing. Usually, though not always, cost matters more to my customers than speed. This varies from project to project, and some projects are highly time-sensitive (particularly those supporting grant applications).
Because our samples and projects are so diverse, throughput varies too much to give a meaningful estimate. We are currently a rather small facility, processing, for example, only a few Ion Torrent samples per week.
Our facility provides a variety of services to address a range of scientific endeavors, from medical to environmental research. By maintaining an Ion Torrent, we can rapidly produce sequence data for small to medium projects, and we collaborate with other sequencing facilities on the largest projects. This approach lets us match each project to the sequencing platform that best meets our customers’ cost and time constraints.
Dr. Jafari: We mostly provide services to our internal users at Northwestern University and its affiliates. Both speed and cost are important to our investigators, but current budget pressures have made cost the more pressing concern.
Kingham: Our customers are largely University of Delaware investigators, or investigators with ties to the university. The rapidly growing field of translational research has connected us with many clinical researchers employed by regional healthcare systems. Most investigators are looking for a balance between cost, amount of data, data quality, and turnaround time. This balance can fluctuate based on the project. For example, turnaround time is critical for obtaining preliminary data for an upcoming grant, while data accuracy is more important for the targeted sequencing of an oncogene. Our HiSeq 2000 runs at about 75–80% capacity; we are still validating the PacBio RS.
Dr. Kiss: Our customers are members of the Miami University community. Cost and speed are both factors for them, but I would say cost is more of a concern. We process very few samples because we do not currently have instrumentation on site. In addition to the two facilities mentioned earlier, for large genome sequencing projects we recommend the Plant-Microbe Genomics Facility at Ohio State University. Despite its name, this facility conducts all types of sequencing.
Dr. Lyons: We have roughly 100 distinct next-gen client laboratories, virtually all of whom are at our own university. We are willing to accept projects from outside users, but are required to add a surcharge to their recharge rates. In practice, few outsiders opt to send samples for next-generation sequencing here—unlike our Sanger services.
It is impossible to express throughput in terms of either samples or projects because of the extreme diversity of the projects we handle. For one project, we’ve done over 1,000 human genomes in the past year, with 3,000 more to be completed by mid-2012. Other clients occupy a single lane yet require disproportionately more effort on our part. Sample counts are misleading, too: sometimes a single sample will occupy numerous lanes, while at other times a single lane could hold up to 96 samples.
In my opinion, clients are somewhat more concerned with cost right now than with speed. A close third consideration, though, is flexibility and availability of options. We try to accommodate clients with urgent needs or nonstandard protocols when we can.
Dr. Thomas: We organize sequencing primarily for research groups at the University of New Hampshire. These researchers are mostly environmental biologists and microbiologists, with a few biochemists; both speed and cost are issues for them. Our wait times exceed six months and are sometimes as long as one year. This is specific to the Illumina platform, which is the most popular and cost-effective, but every run takes two weeks. This is a major issue because the U.S. lacks sufficient infrastructure to meet sequencing research needs. It is also a problem for us because we have to go outside for this service.