February 1, 2008 (Vol. 28, No. 3)
Helicos’ Genetic Analysis System Allows Acquisition of Billions of DNA Sequences in Parallel
Life scientists are striving to use the information emerging from the Human Genome Project to gain insight into basic biological processes and to improve human health and disease treatment. While results from a few whole human genome sequences have provided a comprehensive view of the genomic landscape, they have only scratched the surface of understanding genome variation and its association with human physiology and disease.
The key to gaining this insight lies in the ability to analyze thousands of genomes with a high degree of resolution and from multiple dimensions, historically a prohibitively costly and time-consuming proposition. Helicos BioSciences (www.helicosbio.com) has developed the Helicos™ Genetic Analysis System, which comprises the HeliScope™ Single Molecule Sequencer, the HeliScope Analysis Engine, and the HeliScope Sample Loader.
The system leverages True Single Molecule Sequencing (tSMS™) to give researchers the ability to acquire genomic data from billions of individual molecules of DNA or RNA in parallel. In addition, the tSMS process does not require amplification, whether by traditional PCR or any other method, thereby avoiding the errors and biases inherent in these techniques.
The tSMS process is a sequencing-by-synthesis approach. By using a polymerase and fluorescently labeled nucleotide triphosphates, complementary DNA strands are simultaneously synthesized for billions of templates inside a flow cell. After every fluorescent base addition, laser excitation and imaging are carried out to record the incorporation events on every template at the single-molecule level (Figure 1).
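The cycle of base addition followed by imaging can be illustrated with a toy model. The sketch below is only an illustration, not Helicos' method: it assumes that one labeled nucleotide type is offered per cycle in a fixed order and that at most one base is incorporated per template per cycle, neither of which is stated in this article. The function name, the base order, and the example templates are hypothetical, and strand orientation is ignored.

```python
# Toy model of sequencing-by-synthesis on many templates in parallel.
# Assumptions (not from the article): one labeled nucleotide type is offered
# per cycle in a fixed order, and at most one base is added per cycle.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_synthesis(templates, base_order="CTAG", cycles=12):
    """Return the strand synthesized on each template after `cycles` additions."""
    positions = [0] * len(templates)  # next unread position on each template
    reads = [""] * len(templates)     # growing complementary strands
    for cycle in range(cycles):
        base = base_order[cycle % len(base_order)]
        for i, template in enumerate(templates):
            pos = positions[i]
            # "Imaging" step: an incorporation event is recorded only where
            # the offered base pairs with the next base of the template.
            if pos < len(template) and COMPLEMENT[template[pos]] == base:
                reads[i] += base
                positions[i] += 1
    return reads


example_templates = ["GATC", "AAGT"]
for template, read in zip(example_templates, sequence_by_synthesis(example_templates)):
    print(template, "->", read)
```

In this model every template is interrogated in every cycle, so the number of cycles needed depends on how well the template sequence happens to line up with the order in which bases are offered.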
To make the sequencing of single molecules possible, Helicos’ scientists overcame numerous technical challenges. First, it was necessary to ensure that light emitted from a single fluorophore on a nucleotide was detectable, which requires a high signal intensity that does not dissipate over time and low background fluorescence upon laser illumination.
Another challenge was that the modified nucleotides had to serve as good substrates for the polymerase enzyme; the efficiency and accuracy of proper base-pairing and phosphodiester bond formation could not be compromised by the use of fluorescently labeled nucleotides.
Finally, the surfaces that serve as solid substrates for the synthesis reactions, and as the background for the imaging, had to have low affinity for fluorescently labeled nucleotides, so as not to produce spurious signals or accumulate a high-fluorescence background.
To enable sequencing-by-synthesis, Helicos combined a polymerase with fluorescent nucleotides capable of rapid incorporation kinetics. The incorporation reaction reaches completion in roughly six seconds. Nevertheless, imaging labeled nucleotides at the single-molecule level required altering their fluorescence behavior.
The company developed an imaging solution capable of enhancing the emission intensity of these fluorophores, which increases the time before photobleaching and limits intermittent emission, or blinking. In Figure 2A, the signal intensity for single fluorescently labeled G analogs is shown in the presence or absence of the imaging solution. In the absence of solution, only 21 single molecules were detected with a weighted average intensity of ~7.5 (arbitrary units). In the presence of solution, 341 single molecules were detected, with a weighted average intensity of ~23.5 (arbitrary units). This enhancement of fluorescence intensity allows the detection of nearly all the single molecules present in the system.
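The size of this enhancement can be worked out directly from the Figure 2A values quoted above. The short calculation below uses only those four numbers; the variable names are illustrative.

```python
# Values quoted in the text for Figure 2A.
detected_without, detected_with = 21, 341
intensity_without, intensity_with = 7.5, 23.5  # arbitrary units

# Fold change in detected molecules and in weighted average intensity.
detection_fold = detected_with / detected_without
intensity_fold = intensity_with / intensity_without

print(f"{detection_fold:.1f}x more single molecules detected")
print(f"{intensity_fold:.1f}x higher weighted average intensity")
```

Roughly a threefold gain in intensity thus corresponds to about a sixteenfold gain in the number of molecules detected, which suggests that most molecules fall below the detection threshold without the imaging solution.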
At the single-molecule level, the destruction and/or blinking of a fluorophore upon high-intensity light exposure results in an absent or inconsistent signal from that molecule, which would experimentally be interpreted as a false negative. The imaging solution makes it possible to expose fluorophores to high-intensity laser light for longer periods of time, which results in a sustained fluorescence signal.
Figure 2B, a photobleaching time-course experiment, plots the number of single molecules that can be detected over an 80-second exposure, both in the presence and absence of imaging solution. As shown in the graph, in the absence of solution, over 90% of the single molecules disappear after only 10 seconds of exposure. In contrast, in the presence of the imaging solution, 95% of the single molecules are still detectable after 80 seconds of exposure.
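To compare the two conditions on a common scale, the quoted survival fractions can be converted into photobleaching time constants. The sketch below assumes single-exponential decay, N(t)/N(0) = exp(-t/tau), which is a common simplification and not something the article states; the function name is hypothetical.

```python
import math


def bleaching_time_constant(fraction_remaining, exposure_s):
    """Time constant tau (seconds), assuming N(t)/N(0) = exp(-t / tau)."""
    return -exposure_s / math.log(fraction_remaining)


# Without imaging solution: over 90% gone after 10 s, so at most 10% remain
# and the value below is an upper bound on tau.
tau_without = bleaching_time_constant(0.10, 10.0)

# With imaging solution: 95% still detectable after 80 s.
tau_with = bleaching_time_constant(0.95, 80.0)

print(f"tau without solution: at most {tau_without:.1f} s")
print(f"tau with solution:    about {tau_with:.0f} s")
```

Under this assumption the time constant rises from about 4 seconds to roughly 1,560 seconds, an improvement of more than two orders of magnitude.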
The results from Figure 2 demonstrate the utility of these fluorescently labeled nucleotides for single-molecule imaging and ultimately for use in single-molecule sequencing.
The HeliScope Single Molecule Sequencer can detect the presence of nonspecific, single-molecule adsorption to the surface. When sequencing single-molecule DNA templates, nucleotides that are incorporated into complementary DNA strands might be confounded with nucleotides that adsorb to the surface within a radius of the template equal to the diffraction limit of light.
Since this type of event could result in an insertion error in the sequence, Helicos developed surfaces and surface rinsing conditions that are refractory to nonspecific adsorption and thereby virtually eliminate spurious incorporation events.
Figure 3 shows nonspecific single nucleotides in a 1,000 µm² area plotted over the course of multiple cycles of nucleotide incubation. As can be seen from the graph, the average number of nonspecifically adsorbed nucleotides does not vary significantly over the course of 70 incubations for all four nucleotides, with average values of 18.8, 14.4, 23.0, and 14.6 molecules per 1,000 µm² for A, C, G, and T, respectively.
To put these numbers into context, if we assume an average template density of 1,000 templates per 1,000 µm², the chances of mistaking a nonspecifically adsorbed nucleotide for an incorporated nucleotide are minimal.
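A rough estimate shows why. The sketch below computes two figures for each nucleotide: a crude upper bound (adsorbed molecules divided by templates, as if every adsorbed molecule landed on a template) and the expected number of adsorbed molecules within a diffraction-limited radius of a given template. The second figure assumes that adsorbed molecules are spread uniformly and independently over the surface, and it uses a radius of 0.25 µm. Both the uniformity assumption and the radius are illustrative and not given in the article.

```python
import math

# Average nonspecific adsorption per 1,000 square micrometers (Figure 3 values).
ADSORBED_PER_1000_UM2 = {"A": 18.8, "C": 14.4, "G": 23.0, "T": 14.6}
TEMPLATES_PER_1000_UM2 = 1000  # template density assumed in the text
RADIUS_UM = 0.25               # hypothetical diffraction-limited radius


def expected_neighbors(adsorbed_per_1000_um2, radius_um=RADIUS_UM):
    """Expected adsorbed molecules within radius_um of one template,
    assuming a uniform, independent spatial distribution."""
    density_per_um2 = adsorbed_per_1000_um2 / 1000.0
    return density_per_um2 * math.pi * radius_um ** 2


for base, count in ADSORBED_PER_1000_UM2.items():
    upper_bound = count / TEMPLATES_PER_1000_UM2
    print(f"{base}: upper bound {upper_bound:.2%}, "
          f"within radius {expected_neighbors(count):.2%}")
```

With these assumptions the upper bound ranges from about 1.4% to 2.3% per template per cycle, and the radius-based estimate is below 0.5% for every nucleotide. A different radius changes the second figure in proportion to its square.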
The System at Work
The HeliScope Single Molecule Sequencer consists of a number of subsystems that enable the precise control of chemistry, imaging, and data processing required for tSMS. The system includes: a sensitive optical system composed of an optical train and a cooled CCD camera, an automated fluidics system for reagent mixing and delivery, total internal reflection illumination technology designed to achieve uniform illumination with minimal background, robotics for controlling auto-focus and rapid stepwise imaging of the surface, and computing that automates fluorescent object finding and registration.
To demonstrate the process, a series of images representing a tSMS experiment is shown for a subsection of a single position (out of thousands) on the flow cell surface (Figure 4). These data highlight the ability to implement single-molecule imaging for sequencing individual strands of DNA in parallel.
The ability to sequence billions of single molecules in parallel has inherent advantages over amplification-based technologies and enables a large number of applications heretofore beyond reach.
Analyzing DNA at the single-molecule level allows researchers to query nucleic acids directly obtained from biological samples, thereby avoiding the introduction of PCR error and bias in the data. The avoidance of PCR bias becomes more critical in the case of quantitative applications such as digital gene expression or DNA copy number assessment.
Single-molecule sequencing is also ideally suited to the analysis of formalin-fixed, paraffin-embedded (FFPE) tissues and other specimens that yield degraded nucleic acids.
By simplifying the sample preparation process and by allowing high-density analysis, single-molecule methods minimize the cost and maximize the throughput of genomics research. This enables the complete processing of thousands of samples, essential for the acquisition of statistically significant results.
As a universal genomic analysis tool, the HeliScope Single Molecule Sequencer is designed to leverage the power of tSMS to eventually enable whole human genome resequencing for $1,000 within a few days and to enable genome research in the form of digital gene expression profiling, whole transcriptome resequencing, methylation profiling, candidate gene resequencing, and more. Making such multidimensional studies technically and economically feasible will help researchers understand disease etiology and discover new treatments.