Up-and-Coming Genomics Technologies Reveal Previously Unseen Structural Variants

For more than 20 years, Ricky Ramon endured multiple surgeries to remove mysterious benign tumors from various parts of his body, including inside his heart. Though his doctors suspected a genetic disease called Carney syndrome, genetic testing using standard short-read DNA sequencing failed to find any changes to the relevant genes. This outcome is all too common: Sequencing only identifies the cause of genetic disorders about 30 percent of the time.

When Ramon was 21, his doctors recommended a heart transplant, but his eligibility for the procedure depended on whether a new heart would remain tumor-free, and they could not rule out a recurrence. In a last-ditch effort to uncover the genetic cause of Ramon’s tumors (and determine his eligibility for a new heart), a research team led by Stanford University professor of medicine Euan Ashley, M.D., Ph.D., decided to give DNA sequencing another chance. This time they turned to long-read sequencing (LRS) using PacBio’s Sequel system. That decision paid off: The team found a large deletion overlapping a gene implicated in Carney syndrome. With the genetic cause identified, Ramon is likely eligible for a new heart.

Published in 2017 in Genetics in Medicine, this research also offers hope that new genomic technologies will yield genetic information that has long been unavailable. “There is a lot of genetic variation that has been missed, and there are now technologies that can interrogate that,” said Jonas Korlach, Ph.D., chief scientific officer at PacBio. In addition to long-read sequencing, those technologies include, among others, linked reads, developed by 10X Genomics, and optical mapping, offered by Bionano Genomics.

10X Genomics’ Chromium Controller automates the partitioning and molecular barcoding of DNA samples to prepare them for Illumina sequencing and thereby obtain long-range information from short reads.

Missing Variation

“When genomics started, we hoped to have all of the answers and be able to cure diseases in the next 5 to 10 years,” said Anjana Narayanan, Ph.D., the linked-reads product manager at 10X Genomics. “That’s what the promise of the human genome project was: to end the diagnostic odyssey. That really hasn’t happened.”

In part, that’s because most genomic technology relies on short-read sequencing (SRS), she said, which provides reads of 100–600 base-pairs. SRS is good at finding single base-pair changes to the genome—single nucleotide polymorphisms, or SNPs—but it is not as good at finding larger structural variants (SVs) such as deletions, insertions, or inversions.

Researchers now know that SVs account for about two-thirds of all human genetic variation. “The community has only been looking at about a quarter of the variation,” Korlach said. “It’s therefore not surprising that diagnostic yield is only about 25 to 30 percent.”

Standard SRS also cannot reveal which short reads come from which parental chromosome. Analyses of SRS data are forced to average the two parental haplotypes, making it impossible to know whether an observed genetic change affects both chromosomes or only one—a key piece of information for discovering disease-causing genes.

Structural variants (SVs) represent approximately 60 percent of all human genomic variation, yet Illumina short-read sequencers do not reliably find them. PacBio uses long-read sequencing to reveal SVs in hopes of boosting the diagnostic yield from genetic testing. (Infographic Courtesy of Pacific Biosciences)

Going Long

One way to capture more SVs is to lengthen the stretch of base-pairs per read. That is the approach taken by PacBio with its single-molecule real-time (SMRT) sequencing technology, which can read segments of DNA averaging 10,000 base pairs. It’s right at the sweet spot for finding SVs, which are typically 50 to 1,500 bases long, said Lori Aro, senior director of clinical genomics at PacBio.

The hope is that LRS will greatly increase sequencing’s diagnostic capabilities and put an end to diagnostic odysseys such as Ramon’s. Beyond that, Korlach said, “there are myriad of application spaces that we’re pleased to see appear now.” These include molecular diagnostic applications not only for genetic disease but also for cancer and infectious diseases, as well as targeted sequencing for immunologic phenotyping.

The long-read approach is also particularly good at defining the extent of repetitive stretches of DNA, which are common culprits in neurodegenerative disorders, Aro noted. For example, in the FMR1 gene on the X chromosome, most unaffected people carry a 30-unit CGG repeat with one or two AGG interruptions. By comparison, in people with Fragile X syndrome, one of the most common forms of inherited intellectual disability and autism, the CGG repeat is much longer (200–750 units). Researchers have used LRS to examine these repeats and found that the risk of Fragile X syndrome increases with more CGG repeats and fewer AGG interruptions.
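The repeat counting described above can be sketched in a few lines of Python. This is a minimal illustration, not a clinical tool: it assumes a single error-free read that spans the whole repeat tract, and the function name `summarize_cgg_tract` and the example read are hypothetical. Real long reads contain sequencing errors and need alignment and consensus steps that are omitted here.

```python
import re

# A tract is a run of CGG/AGG triplets that starts and ends with CGG.
TRACT = re.compile(r"CGG(?:(?:CGG|AGG)*CGG)?")


def summarize_cgg_tract(read_seq):
    """Count CGG units and AGG interruptions in the longest tract of a read."""
    matches = [m.group(0) for m in TRACT.finditer(read_seq.upper())]
    if not matches:
        return {"total_units": 0, "cgg_units": 0,
                "agg_interruptions": 0, "longest_pure_cgg_run": 0}

    tract = max(matches, key=len)
    units = [tract[i:i + 3] for i in range(0, len(tract), 3)]

    # Track the longest stretch of CGG units not broken by an AGG.
    longest = run = 0
    for unit in units:
        run = run + 1 if unit == "CGG" else 0
        longest = max(longest, run)

    return {
        "total_units": len(units),
        "cgg_units": units.count("CGG"),
        "agg_interruptions": units.count("AGG"),
        "longest_pure_cgg_run": longest,
    }


# Hypothetical read: flanking bases around a 30-unit tract with two AGGs.
example_read = ("ACGT" + "CGG" * 10 + "AGG" + "CGG" * 9
                + "AGG" + "CGG" * 9 + "TTGA")
summary = summarize_cgg_tract(example_read)
print(summary)
```

A short read of 100 to 600 bases cannot span a tract of several hundred triplets plus unique flanking sequence, which is why this kind of direct count depends on long reads.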

Long reads also enable the separation of an individual’s two chromosomes. “Instead of both alleles collapsed into one, you actually have two sequence files, one for each chromosome,” said Tina Graves-Lindsay, leader of the reference genomes group at the McDonnell Genome Institute at Washington University in St. Louis. It’s an important step forward, she said. “If there’s a disease gene on one chromosome you won’t necessarily see it if you’ve done the compressed alleles.”

To date, one downside of long-read sequencing has been the cost. But that is coming down, Korlach says. Graves-Lindsay agrees: “Although PacBio has always been fairly expensive, at least compared with Illumina sequencing, they are having a ramp up with their new instrument, so it’s making sequencing cheaper.”

Linked Reads

What if, instead of designing a brand-new machine to sequence longer molecules, you simply found a way to add long-range information onto short-read data? That’s the approach taken by 10X Genomics. Their system dilutes a small part of the genome in an oil droplet in such a way that there are only a few longer molecules in each droplet. Then, they add a bead with a unique barcode to each droplet so that each short read from a longer molecule in that droplet carries the same barcode, allowing reconstruction of the longer molecule.
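The barcode idea can be illustrated with a small Python sketch. It is not 10X Genomics' software. It assumes the short reads have already been aligned, that each read is reduced to a barcode, a chromosome, and a position, and that reads sharing a barcode and lying within a hypothetical `max_gap` of each other came from the same long molecule. All names and numbers below are invented for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    barcode: str
    chrom: str
    pos: int


class Molecule(NamedTuple):
    barcode: str
    chrom: str
    start: int
    end: int
    n_reads: int


def infer_molecules(reads, max_gap=50_000):
    """Group aligned reads by barcode and infer the long molecules they tile."""
    by_key = defaultdict(list)
    for read in reads:
        by_key[(read.barcode, read.chrom)].append(read.pos)

    molecules = []
    for (barcode, chrom), positions in sorted(by_key.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:
                # Gap too large: close this molecule and start a new one.
                molecules.append(Molecule(barcode, chrom, start, prev, count))
                start, count = pos, 0
            prev = pos
            count += 1
        molecules.append(Molecule(barcode, chrom, start, prev, count))
    return molecules


# Hypothetical aligned reads from two droplets.
reads = [
    Read("AAAC", "chr1", 1_000),
    Read("AAAC", "chr1", 21_000),
    Read("AAAC", "chr1", 45_000),
    Read("AAAC", "chr1", 900_000),  # same droplet, different molecule
    Read("GGTT", "chr1", 30_000),
    Read("GGTT", "chr1", 52_000),
]
molecules = infer_molecules(reads)
for molecule in molecules:
    print(molecule)
```

Because each droplet holds only a few long molecules, two distant reads with the same barcode are more likely to come from different molecules than from one very long one, which is the reason for the gap cutoff.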

The 10X approach is more affordable than LRS because it involves changing only the sample prep rather than moving away from SRS altogether, Narayanan said. Another advantage is that linked reads require only 1 nanogram of material, whereas LRS needs several micrograms. Moreover, like long reads, linked reads can resolve haplotypes. This is particularly valuable for autosomal recessive disorders (where a person inherits two copies of a defective gene, one from each parent), noted Narayanan. With linked reads, “you can actually tease apart each parent’s chromosome,” she said.

That’s one reason researchers from the geographically isolated Faroe Islands have turned to 10X Genomics. They were concerned about an autosomal recessive disorder that kept cropping up on the islands. Using 10X, they plan to sequence the genomes of the islands’ entire population (50,000 people) to look at each person’s maternal and paternal copies of the problematic allele. It helps, Narayanan added, that 10X needs only a small blood prick to provide a complete picture, not just of SNPs but also of SVs, all in one assay.

Building a Genome Ladder

Bionano Genomics offers yet another way to find SVs. Rather than “fix sequencing with more sequencing,” said Sven Bocklandt, Ph.D., Bionano’s senior application specialist, the company builds a structural ladder of the genome that reveals large (greater than 1,000 base pair) structural changes. They accomplish this by starting with a long (up to 2.5 million base-pair), intact DNA molecule. They then attach a fluorescent dye at each occurrence of a six- or seven-base-pair sequence that recurs throughout the genome of every species. Running this molecule through a nano-channel array produces a scan of the molecule and its pattern of fluorescent labels. “Because the molecules are so long, and we have these labels every 6,000 bases in the genome, we end up with what looks like barcodes,” Bocklandt said. When these are aligned pairwise with one another and with the barcode for the reference genome, SVs stand out.
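A toy Python sketch shows why SVs stand out when label patterns are aligned. It is not Bionano's algorithm. It assumes the labels on a sample molecule have already been matched one-to-one, in order, with labels on the reference, whereas real optical-map aligners must tolerate missing and extra labels. The function name and coordinates are hypothetical, and the 1,000-base cutoff echoes the size range quoted above.

```python
def interval_differences(ref_labels, sample_labels, min_size=1_000):
    """Compare spacing between matched labels and report large differences.

    Both inputs are sorted label positions in bases. Returns tuples of
    (ref_start, ref_end, kind, size) for intervals that differ by at
    least min_size bases.
    """
    if len(ref_labels) != len(sample_labels):
        raise ValueError("this sketch needs one-to-one matched labels")

    calls = []
    for i in range(1, len(ref_labels)):
        ref_gap = ref_labels[i] - ref_labels[i - 1]
        sample_gap = sample_labels[i] - sample_labels[i - 1]
        delta = sample_gap - ref_gap
        if abs(delta) >= min_size:
            # Extra spacing suggests inserted DNA; missing spacing, a deletion.
            kind = "insertion" if delta > 0 else "deletion"
            calls.append((ref_labels[i - 1], ref_labels[i], kind, abs(delta)))
    return calls


# Hypothetical label positions; small differences mimic measurement noise.
reference = [0, 6_000, 12_500, 18_000, 25_000]
sample = [0, 6_100, 12_600, 14_100, 21_050]
calls = interval_differences(reference, sample)
print(calls)
```

Because the comparison uses distances between labels rather than the bases themselves, it works the same way whether the intervening DNA is unique or repetitive.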

Bionano’s approach is often considered complementary to SRS as well as PacBio and 10X sequencing, said Graves-Lindsay. And, Bocklandt added, it’s particularly good at detecting the repetitive sequences that make up two-thirds of our genome. Inversions and other variants inside repetitive sequences are largely invisible to other approaches, he said.

In a study published in Genome Medicine last year, researcher Eric Vilain, M.D., Ph.D., of the Children’s National Health System in Washington, D.C., used Bionano technology to find both single and multiple exon deletions up to 250,000 base pairs in size, a 13,000-base-pair duplication, and a 5.1-million-base-pair inversion disrupting the dystrophin gene of several study patients with Duchenne muscular dystrophy. All of these would have been difficult to find using SRS or LRS. Vilain is continuing to use Bionano technology to further the work of the Undiagnosed Diseases Network study, an NIH-funded effort to solve the most challenging medical mysteries. In addition, a group at Penn State is using Bionano’s approach to simplify testing for SVs in leukemia patients. And Greenwood Genetics in South Carolina is using the approach to increase diagnostic yield throughout the state’s hospital system.

Finding the Clinical Sweet Spot

It may be some time yet before the PacBio, 10X, and Bionano technologies reach the clinic. “The fact that we have just begun to uncover this hidden aspect of variation in human genomes means it will take a while to create the equivalent databases,” Korlach notes. For example, equivalents of ClinVar and dbSNP do not yet exist for SVs and are only now being established, he says.

To partially address that problem, PacBio is developing a joint SV caller to directly compare parental and child sequences when there is a high chance of a de novo SV in the child. “This will accelerate the rate of solved cases without waiting for the arrival of sophisticated databases,” Korlach says. And 10X is developing de novo genome assembly techniques that will be less reliant on such databases as well.
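The idea of comparing a child with both parents can be sketched as a simple filter in Python. This is not PacBio's joint caller, which the article does not describe in detail. It assumes SVs have already been called separately in each family member, and it treats a child SV as inherited when a parent carries an SV of the same type on the same chromosome with at least 50 percent reciprocal overlap. The types, threshold, and coordinates are all illustrative assumptions.

```python
from typing import NamedTuple


class SV(NamedTuple):
    chrom: str
    start: int
    end: int
    kind: str  # e.g. "DEL" or "DUP"


def reciprocal_overlap(a, b):
    """Fraction of the longer SV that is covered by the shared interval."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return shared / max(a.end - a.start, b.end - b.start)


def de_novo_candidates(child, mother, father, min_overlap=0.5):
    """Return child SVs with no sufficiently similar SV in either parent."""
    parental = list(mother) + list(father)
    return [
        sv for sv in child
        if all(reciprocal_overlap(sv, p) < min_overlap for p in parental)
    ]


# Hypothetical calls for one family trio.
father_svs = [SV("chr2", 5_000, 9_000, "DEL")]
mother_svs = []
child_svs = [
    SV("chr2", 5_200, 9_100, "DEL"),   # closely matches the father's deletion
    SV("chr17", 1_000, 3_000, "DEL"),  # seen in neither parent
    SV("chr2", 5_000, 9_000, "DUP"),   # same place as father's SV, other type
]
candidates = de_novo_candidates(child_svs, mother_svs, father_svs)
print(candidates)
```

A filter like this needs no population database, only the family's own data, which is the appeal of the trio approach. A true joint caller would instead examine the reads of all three individuals together, so that a parental SV missed by a separate call is not mistaken for a de novo event.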

Ultimately, clinicians are agnostic to the technology that gets them the genetic information they need. “They want an accurate, reproducible answer,” Aro said. “It’s incumbent on the research and bioinformatics communities to provide technology that works—tools with the highest sensitivity and specificity for the task at hand.”

The PacBio Sequel System shown here is built on Single Molecule, Real-Time (SMRT) sequencing technology, also known as long-read sequencing (LRS).

This article was originally published in the March/April 2018 issue of Clinical OMICs. For more content like this and details on how to get a free subscription, go to www.clinicalomics.com.
