Phrase books are all well and good as far as they go, but even the best ones are severely abridged, which is to say, inadequate. Consider the phrase book that came out of the Human Genome Project. The first edition, the first draft of the human genome, generated a lot of excitement, and rightly so. Still, it was no more than a generalized genome, one that had been assembled from sequencing data representing just a few donors.
By itself, the first draft was a poor guide to the genomic diversity of 7 billion humans. Other limitations included a multitude of gaps and vast stretches of incomprehensible noncoding sequences.
Over the years, increasingly thorough editions have appeared. More important, they’ve been used in increasingly immersive contexts. Today, genomic scientists have sequenced nearly a million human genomes, and initiatives such as the international Encyclopedia of DNA Elements (ENCODE) consortium have started charting the complex network of regulatory and other noncoding sequences.
In genomic studies, scientists have started moving beyond stock phrases. They’ve been picking up the genomic equivalents of syntax and grammar. Consequently, they’re getting better at recognizing and even putting together novel expressions. Achieving yet greater proficiency depends on making the most of new genomic technologies. Combined, these technologies promise to usher in an era in which scientists will not only read the genome but also become fluent in its underlying language.
One in a billion
All the cells in our body get essentially the same set of instructions, but they find wildly different ways to interpret them. Even seemingly homogeneous tissues can contain myriad cell types with distinctive functions and biological characteristics. “This is what cells are—machines that translate genotypes into phenotypes,” says Alex K. Shalek, PhD, a systems biologist at the Massachusetts Institute of Technology. To understand the mechanisms informing cellular identity and behavior, Shalek’s group has joined a growing community of researchers making use of sophisticated technologies that enable in-depth analysis of thousands of individual cells in a single experiment.
Genomic differences between individual cells can be extremely meaningful in tumors, which are often genetically heterogeneous, but in most other tissues they offer limited insight into cellular heterogeneity, because neighboring cells carry essentially the same genome. To study this kind of heterogeneity, single-cell biologists are shifting their attention to transcriptomic differences. Enabling this shift is a proliferation of commercial platforms for sequencing and quantifying the messenger RNA produced from thousands of genes in parallel.
Transcriptomic data can reveal which physiological and biochemical processes are active in a cell, and they also offer a distinctive fingerprint for recognizing unique cell types. To illustrate this point, Shalek cites research that identified a rare lung cell type associated with cystic fibrosis. (The work was carried out by Aviv Regev, PhD, a computational biologist and Broad core institute member, and Jayaraj Rajagopal, MD, a Broad associate member and a physician in the Pulmonary and Critical Care Unit at Massachusetts General Hospital.) “Now you have an entirely different way of thinking about how to treat the disease,” he points out.
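To make the fingerprint idea concrete, here is a minimal Python sketch, assuming each known cell type can be summarized by a reference expression profile over a small panel of marker genes. The gene panel, the type names (`type_A`, `type_B`, `rare_type`), the helper names, and every expression value are hypothetical placeholders, not data from the studies mentioned here. The sketch assigns a cell to whichever reference profile its own expression pattern correlates with most strongly; Pearson correlation is one common choice for this kind of matching, not necessarily the method used in the work described.

```python
from math import sqrt

# Hypothetical reference profiles: mean expression of four placeholder
# marker genes (GENE_1..GENE_4) for three made-up cell types.
# These numbers are illustrative only, not measured data.
REFERENCE_PROFILES = {
    "type_A": [9.0, 1.0, 0.5, 0.2],
    "type_B": [0.3, 8.0, 7.5, 0.1],
    "rare_type": [0.2, 0.5, 1.0, 9.5],
}


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no usable fingerprint
    return cov / (sd_x * sd_y)


def assign_cell_type(cell_expression: list[float]) -> str:
    """Return the reference type whose profile best matches the cell's pattern."""
    return max(
        REFERENCE_PROFILES,
        key=lambda name: pearson(cell_expression, REFERENCE_PROFILES[name]),
    )


# A cell dominated by the fourth marker matches the rare type.
print(assign_cell_type([0.1, 0.4, 0.8, 7.0]))  # rare_type
```

Correlation compares the shape of the expression pattern rather than its absolute level, so a cell with lower overall RNA counts can still match the right profile.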
Transcriptomic experiments are often performed with dissociated individual cells, an approach that sacrifices essential spatial information. Several researchers have therefore devised methods that can spatially map gene expression with single-cell resolution—or even subcellular detail—within tissue samples.
At the California Institute of Technology, Long Cai, PhD, a professor of biology and biological engineering, has demonstrated that the seqFISH technique can be used to physically map and quantify the expression of more than 10,000 mRNAs at once. “We’re trying to generate spatial maps of tissue development at single-cell resolution, looking at cell fate decisions and transformations,” says Cai. Such techniques could also guide disease research. Shalek’s team, for example, is exploring how interactions between tuberculosis bacteria and immune cells throughout the lung affect the course of infection.
Other groups are developing sophisticated single-cell “multi-omic” experiments that couple transcriptome data with measurements of protein levels or of the epigenetic marks associated with the regulation of gene expression. The collective result is a richer portrait of cell biology than researchers could previously have imagined, and initiatives like the Human Cell Atlas now aim to codify these findings into cellular field guides.
“A lot of us think of it as sort of a ‘Human Genome 2.0,’” declares Shalek. “The idea is to have a parts list and understand the relationship among those parts—where they are in the body, which states they can assume, and how they modify disease.”
Precision perturbations
No exploration of contemporary genomic research would be complete without mention of CRISPR-Cas9, a powerful tool for targeted DNA sequence manipulation that has captured the public imagination. Some of the greatest excitement surrounds medical applications, and the past year saw the launch of a handful of CRISPR clinical trials.
In February, CRISPR Therapeutics and Vertex Pharmaceuticals began dosing patients with their own bone marrow stem cells, edited with CRISPR to switch fetal hemoglobin production back on and thereby compensate for the defective adult hemoglobin gene responsible for sickle-cell disease. And last summer, Editas Medicine and Allergan embarked on the first clinical study of in vivo CRISPR editing, administering genome editing machinery directly into the eyes of patients with a hereditary form of blindness.
“We don’t have evidence yet that it has been effective in treating disease,” says Charles A. Gersbach, PhD, director of the Duke University Center for Advanced Genomic Technologies. “But there are a couple of studies in small numbers of patients that indicate that ex vivo therapy is safe—so, that’s exciting.”
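At present, CRISPR tools are relatively blunt instruments. Cas9 cuts both strands of the DNA, and the cell usually repairs the break through a somewhat sloppy process called non-homologous end joining (NHEJ), which tends to leave small insertions or deletions at the cut site. That is good for inactivating genes but not for correcting specific mutations.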
“When we do gene editing, the dirty secret is that we get a lot of different types of outcomes,” confides Gersbach. “Having more uniformity and predictability is going to be important.”
One solution is to supply a donor DNA template so that the cell repairs the CRISPR-induced break through homology-directed repair (HDR), precisely copying the desired sequence into the target site. HDR, however, tends to be inefficient, particularly in cells that are not dividing. At Harvard University, a research team led by professor of chemistry and chemical biology David R. Liu, PhD, is developing another CRISPR-related technique: base editing. This approach cuts only one DNA strand (it “nicks” the DNA) and chemically converts a single base: an adenine to inosine, which the cell reads as guanine, or a cytosine to uracil, which is ultimately replaced by thymine. Because it avoids double-strand breaks, base editing can achieve more predictable and targeted modifications.
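The net outcome of the two base-editor families can be pictured as a simple letter substitution, sketched here in Python. This is a toy model under loudly simplifying assumptions: it tracks only the final DNA letters (adenine ending up as guanine, cytosine ending up as thymine), ignores the nick and the cellular repair steps, and assumes a hypothetical editing window of positions 4 to 8 within a 20-letter target sequence. The names `EDITS` and `base_edit` are illustrative inventions; real editing windows, efficiencies, and bystander edits vary by editor and target site.

```python
# Toy model of base editing outcomes. The editor labels, the default
# window (positions 4-8, 1-based), and the all-or-nothing conversion
# are simplifying assumptions for illustration only.
EDITS = {
    "CBE": ("C", "T"),  # cytosine base editor: C -> uracil -> T
    "ABE": ("A", "G"),  # adenine base editor: A -> inosine, read as G
}


def base_edit(target: str, editor: str, window: tuple[int, int] = (4, 8)) -> str:
    """Return the target sequence after an idealized base edit.

    Every matching base inside the window is converted; all other bases
    are left untouched. No double-strand break is modeled.
    """
    source, product = EDITS[editor]
    start, end = window
    edited = []
    for position, base in enumerate(target.upper(), start=1):
        if start <= position <= end and base == source:
            edited.append(product)
        else:
            edited.append(base)
    return "".join(edited)


site = "GACCCTAGACTTCAGGATCA"  # hypothetical 20-letter target
print(base_edit(site, "CBE"))  # GACTTTAGACTTCAGGATCA
print(base_edit(site, "ABE"))  # GACCCTGGACTTCAGGATCA
```

Note that the cytosine at position 3 and the adenine at position 9 stay unchanged because they fall outside the assumed window, which is one reason base editing outcomes are more predictable than repair of a double-strand break.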
Yet another strategy is the newly developed prime editing technique, also from Liu’s lab. Like base editing, prime editing nicks DNA rather than cutting both strands. Prime editing systems, however, also incorporate a reverse transcriptase, which copies a new sequence encoded in an extended guide RNA directly into the target site. Prime editing has the potential to introduce many of the same changes as HDR while achieving greater efficiency and reducing the risk of unintended genomic disruption.
Gersbach is especially enthusiastic about CRISPR as a tool for mapping genome function. “CRISPR,” he predicts, “is going to allow us to figure out what’s going on with the 6000 genes [that still have no known function] and the 2 million putative regulatory elements identified by projects like ENCODE.” This exploration can be facilitated by pooled library screens, in which a large population of cells is exposed to a library of guide RNAs, with each cell typically receiving a guide that directs the editing machinery to one candidate site in the genome.
These assays are not limited to using CRISPR to snip or swap DNA. Gersbach’s group and others also employ modified Cas9 enzymes that no longer cut DNA and instead carry activating or repressing domains, turning the nuclease into a protein that selectively switches expression up or down at genomic regions of interest. This allows subtler manipulations that better mimic natural gene regulation.
“The combination of genome sequences from large populations of individuals combined with high-throughput CRISPR screens is going to transform how we understand our genome and how we take advantage of that to inform therapy,” states Gersbach.
Design-build-test-learn pipelines
The ultimate demonstration of understanding a system is the ability to reconstruct or repurpose it. Researchers have been using specialized enzymes to cut, paste, and otherwise manipulate segments of DNA since the 1970s. But only in the past decade or so has our understanding of genome structure and function started to reach a level of sophistication that will allow us to treat living organisms, or even communities of organisms, as systems that incorporate standardized modules. The engineering and manipulation of these modules is known as synthetic biology.
Pamela Silver, PhD, a professor of biochemistry and systems biology at Harvard Medical School, notes that the nascent field of synthetic biology has benefited from plummeting costs of DNA synthesis. Cheap synthesis allows the kind of rapid trial-and-error prototyping that is essential for rewiring biological systems, given how incompletely the field still understands how best to reprogram cells.
“Rather than testing one strain at a time and having a graduate student slave away at trying to build one circuit, it’s now more cost effective to have as much synthesized as possible,” she says. “This means you can also stretch your imagination much more.”
Some of the most advanced work in this space relates to metabolic engineering, in which cellular genomes are reconfigured to express enzymatic assembly lines for manufacturing useful chemical products. For example, at Amyris, a renewable products company, researchers engineered yeast to produce artemisinic acid, a precursor that can be chemically converted into the antimalarial drug artemisinin.
At Stanford University, scientists are also developing yeast as a drug manufacturing platform. A team led by Christina Smolke, PhD, a professor of bioengineering, has demonstrated the ability to reprogram these organisms to produce the opioid painkiller hydrocodone and other pharmaceuticals. Such applications may avert drug shortages and cut manufacturing costs.
Synthetic biologists have also embarked on ambitious efforts to rewrite entire genomes. The most advanced work to date has been done in bacteria. For example, at ETH Zurich, a research team led by systems biologist Beat Christen, PhD, recently demonstrated the computational design and construction of an entire bacterial genome.
Parallel efforts are underway to achieve similar feats in yeast and human cells, namely, the Synthetic Yeast Genome Project (Sc2.0) and the Genome Project-Write (GP-Write). “I’ve been passionate about building a completely artificial human chromosome where you know every detail about it and it’s fully programmable,” declares Silver, who is part of GP-Write.
Made-to-order genomes could be transformative for agriculture, medicine, environmental remediation, and other applications, but they also require strong safety countermeasures. Accordingly, much of today’s effort focuses on designing synthetic genomes with “recoded” sequences, which could make the resulting organisms resistant to viral infection and unable to exchange functional genetic material with natural organisms.
Silver thinks these precautions will be essential in a future that will almost inevitably rely on synthetic cells. “We have to embrace [the idea of] releasing engineered organisms into the environment to save the world,” she insists. “People are just going to have to get with that program.”