Patricia F. Fitzpatrick Dimond, Ph.D., Technical Editor of Clinical OMICs, President of BioInsight Communications

The success of sequencing depends on tools to analyze the data.

On April 14, 2003, the International Human Genome Sequencing Consortium announced the completion of the Human Genome Project (HGP). The initiative, led in the U.S. by the National Human Genome Research Institute (NHGRI) and the Department of Energy, took 13 years and cost $3 billion to produce a finished sequence covering 99% of the genome's gene-containing regions at an accuracy of 99.99%. But as Eric S. Lander, Ph.D., head of the 125-person MIT-based sequencing team that produced 28% of the public-sector sequence data, commented, “The human genome sequence was not and is not the answer to biomedical research; it is the foundation.”

Now, with the latest sequencing technology, a human genome can be read in eight days for about $10,000. Rapid, relatively affordable, and accessible sequencing has reached the laboratories of scientists working on everything from fruit flies, the original staple organism of geneticists, to the genetic basis of human diseases.

The Stowers Institute’s Hawley lab and molecular biology facility, for example, developed a whole-genome sequencing approach to mapping mutations in fruit flies, one that Scott Hawley, Ph.D., believes will change the way fruit fly genetics is done. Dr. Hawley and his colleagues mapped a fruit fly mutation induced by the chemical mutagen ethyl methanesulfonate (EMS) by determining the DNA sequence of the mutant fly’s genome. The results, they say, provide insight into the mechanism of EMS mutagenesis and into gene conversion events involving balancer chromosomes, which carry chromosomal rearrangements that allow lethal or sterile mutations to be stably maintained.
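The heart of such an approach is a comparison of variant calls. The Python sketch below is a minimal illustration under our own assumptions, not the Stowers pipeline: it assumes variant calls for the mutant and for its unmutagenized parental strain are already available as simple records (`Variant` and `candidate_ems_mutations` are hypothetical names), discards calls shared with the parental strain, and keeps only single-base changes matching the well-known EMS signature of G:C to A:T transitions.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


# EMS alkylates guanine, which mainly yields G:C -> A:T transitions; depending
# on which strand the call is reported against, that appears as G>A or C>T.
EMS_CHANGES = {("G", "A"), ("C", "T")}


def candidate_ems_mutations(mutant: List[Variant], background: List[Variant]) -> List[Variant]:
    """Single-base changes present in the mutant but not the parental strain
    that match the EMS transition signature, sorted by chromosome and position."""
    pre_existing = set(background)  # calls shared with the unmutagenized strain
    hits = [
        v
        for v in mutant
        if v not in pre_existing
        and len(v.ref) == 1
        and len(v.alt) == 1
        and (v.ref.upper(), v.alt.upper()) in EMS_CHANGES
    ]
    return sorted(hits)


# Hypothetical calls, for illustration only.
background = [Variant("2L", 1200, "A", "G")]
mutant = [
    Variant("2L", 1200, "A", "G"),   # also in the parental strain: dropped
    Variant("3R", 50211, "G", "A"),  # new EMS-type transition: kept
    Variant("X", 9087, "T", "C"),    # new, but not an EMS-type change: dropped
]
print(candidate_ems_mutations(mutant, background))
```

In practice the surviving candidates would still need to be checked against the phenotype, for example by confirming that the mutation lies in the expected genomic interval.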

“The traditional mapping method could take months to years depending on the complexity of the phenotype,” said Karen Staehling-Hampton, Ph.D., managing director of molecular biology. “This advance will allow us to map mutations of interest in just a few weeks. The next-generation sequencing technology used for this project is extremely exciting. It will allow researchers to sequence genomes for a few thousand dollars, a cost unheard of just a few years ago.”

Whole-Genome Sequencing

As plans move ahead to sequence the genomes of anything that moves, what have researchers actually learned from whole-genome sequencing that is new? One answer comes from James Lupski, M.D., Ph.D., a physician-scientist with the inherited neurological disorder Charcot-Marie-Tooth (CMT) disease, who found the genetic cause of his own disease after 25 years of searching by sequencing his entire genome.

Although a number of human genome sequences had already been published, Dr. Lupski’s study is the first to show how whole-genome sequencing can identify the genetic cause of an individual’s disease. It describes how thousands of potentially functional gene variants were analyzed to narrow the search to the responsible mutations.
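The kind of narrowing described here can be illustrated with a simple filter. The Python sketch below is hypothetical and is not the published analysis: it assumes each variant call has already been annotated with a gene and a predicted consequence (`Call`, `prioritize`, and the consequence labels are our own names), drops variants found in a set of known polymorphisms, keeps protein-altering variants in a list of candidate genes, and reports genes carrying at least two distinct hits, as a recessive model would predict.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Call:
    gene: str
    variant_id: str   # hypothetical ID format, e.g. "chr1:1000:G>A"
    consequence: str  # e.g. "missense", "nonsense", "synonymous"


PROTEIN_ALTERING = {"missense", "nonsense", "frameshift", "splice_site"}


def prioritize(
    calls: Iterable[Call],
    known_polymorphisms: Set[str],
    candidate_genes: Set[str],
    min_hits: int = 2,
) -> Dict[str, List[str]]:
    """Group rare, protein-altering calls in candidate genes by gene and keep
    genes with at least `min_hits` distinct qualifying variants."""
    by_gene: Dict[str, Set[str]] = defaultdict(set)
    for c in calls:
        if (
            c.consequence in PROTEIN_ALTERING
            and c.variant_id not in known_polymorphisms
            and c.gene in candidate_genes
        ):
            by_gene[c.gene].add(c.variant_id)
    return {g: sorted(ids) for g, ids in by_gene.items() if len(ids) >= min_hits}


# Hypothetical calls, for illustration only.
calls = [
    Call("GENE_A", "v1", "missense"),
    Call("GENE_A", "v2", "nonsense"),
    Call("GENE_A", "v5", "missense"),    # listed as a known polymorphism below
    Call("GENE_B", "v3", "missense"),    # only one qualifying hit
    Call("GENE_C", "v4", "synonymous"),  # does not change the protein
]
print(prioritize(calls, {"v5"}, {"GENE_A", "GENE_B", "GENE_C"}))
```

Two hits in one gene fit a recessive cause only if they sit on different copies of the gene; confirming that requires phase information or the parents' genotypes.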

The cause of Dr. Lupski’s disorder turned out to be mutations in SH3TC2, a gene previously associated with CMT; he carries two different mutations in it, one inherited from each parent. A different gene, DHODH, located on chromosome 16q22, has been implicated in Miller syndrome, one of the disorders in the family study described below. It encodes dihydroorotate dehydrogenase, which catalyzes the oxidation of dihydroorotate to orotate, the fourth enzymatic step in de novo pyrimidine biosynthesis, the pathway that makes the pyrimidine bases of DNA and RNA. The protein is normally located on the outer surface of the inner mitochondrial membrane. In fruit flies, a mutation in this gene, originally described in 1910, causes wing anomalies, defective egg production, and malformed posterior legs.

David Wheeler, Ph.D., of the Human Genome Sequencing Center at Baylor College of Medicine, and his colleagues reported the DNA sequence of James Watson, Ph.D., two years ago. The genome was sequenced to 7.4-fold redundancy (each base was read about 7.4 times on average) using massively parallel sequencing in picoliter-size reaction vessels. Comparison with the reference genome identified 3.3 million SNPs, of which 10,654 cause amino-acid substitutions within coding sequences.
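What 7.4-fold redundancy buys can be estimated with the idealized Lander-Waterman model, which assumes reads land uniformly at random so that per-base read depth follows a Poisson distribution with mean equal to the coverage. The Python sketch below (the function name is our own) computes the expected fraction of bases read at least k times. Real runs are less uniform because of GC bias and repetitive sequence, so the output is an idealized estimate, not a figure from the Watson study.

```python
import math


def poisson_depth_fraction(mean_coverage: float, min_depth: int) -> float:
    """Expected fraction of bases covered by at least `min_depth` reads.

    Assumes per-base depth is Poisson distributed with mean `mean_coverage`
    (the idealized Lander-Waterman model).
    """
    below = sum(
        math.exp(-mean_coverage) * mean_coverage**k / math.factorial(k)
        for k in range(min_depth)
    )
    return 1.0 - below


for depth in (1, 2, 4):
    print(f">= {depth} reads: {poisson_depth_fraction(7.4, depth):.4f}")
```

Under this model, about 99.9% of bases would be read at least once at 7.4-fold coverage, but a noticeably smaller share would be read the several times needed to call a heterozygous SNP with confidence.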

Sequencing Watson’s genome took Dr. Wheeler’s team barely two months at approximately one-hundredth the cost of traditional capillary electrophoresis methods. Because it was the first genome sequenced with next-generation technologies, the investigators said, it provides “a pilot for the future challenges of personalized genome sequencing.”

And in the first attempt ever to sequence the genomes of an entire family, the Institute for Systems Biology (ISB) partnered with Complete Genomics to sequence the genomes of a father, a mother, and their two children. Both children have two recessive genetic disorders: Miller syndrome, a rare craniofacial disorder, and primary ciliary dyskinesia (PCD), a lung disease. By sequencing the entire family, the researchers were able to reduce the number of candidate genes for Miller syndrome to four.
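Family data shrink the candidate list because, under a recessive model, the causal gene must carry a damaging allele from each parent in every affected child. The Python sketch below illustrates only that intersection step and is not the ISB pipeline: it assumes the parent of origin of each rare, damaging allele has already been worked out from the parents' genotypes, and the `recessive_candidates` function and gene names are hypothetical.

```python
from typing import Dict, List, Set

# For one affected child: gene -> the parents ("mother", "father") from whom
# the child inherited at least one rare, damaging allele in that gene. In a
# real analysis these labels come from comparing the child's genotypes with
# both parents'; here they are supplied directly as hypothetical input.
ChildHits = Dict[str, Set[str]]

BOTH_PARENTS = {"mother", "father"}


def recessive_candidates(children: List[ChildHits]) -> Set[str]:
    """Genes in which every affected child carries a damaging allele from each parent."""
    if not children:
        return set()
    per_child = [
        {gene for gene, parents in hits.items() if BOTH_PARENTS <= parents}
        for hits in children
    ]
    return set.intersection(*per_child)


# Hypothetical input, for illustration only.
child1 = {"GENE_A": {"mother", "father"}, "GENE_B": {"mother"}, "GENE_C": {"mother", "father"}}
child2 = {"GENE_A": {"mother", "father"}, "GENE_B": {"mother", "father"}, "GENE_D": {"father"}}
print(sorted(recessive_candidates([child1, child2])))
```

Each child alone leaves two candidates here, but requiring both affected siblings to share the pattern leaves only one, which is the effect that sequencing the whole family exploits.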

The investigators said that family-based whole-genome sequencing allowed them to delineate recombination sites precisely, identify 70% of the sequencing errors (resulting in >99.999% accuracy), and identify very rare SNPs. They also directly estimated a human intergenerational mutation rate of ~1.1 × 10⁻⁸ per position per haploid genome. In other words, by sequencing the entire family, the researchers could measure how much the genome changes from one generation to the next; in this case, parent-to-child mutations occurred at about half the most widely expected rate.
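One common way family data expose both sequencing errors and new mutations is a Mendelian consistency check: at each site, the child's two alleles must be explainable as one from the mother and one from the father. The Python sketch below shows that check and the arithmetic behind a per-position, per-haploid-genome rate. All names are hypothetical, and the inputs 66 and 3.0e9 are illustrative values chosen to land near the reported rate, not the study's actual counts.

```python
from itertools import product
from typing import Dict, List, Tuple

Genotype = Tuple[str, str]  # unphased diploid genotype, e.g. ("A", "G")


def mendelian_consistent(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if the child could have received one allele from each parent."""
    target = sorted(child)
    return any(sorted((m, f)) == target for m, f in product(mother, father))


def flag_inconsistent(sites: Dict[int, Tuple[Genotype, Genotype, Genotype]]) -> List[int]:
    """Positions whose (child, mother, father) genotypes break Mendelian rules.

    Each flagged site is either a sequencing/genotyping error or a candidate
    new mutation in the child; validation is needed to tell them apart.
    """
    return sorted(
        pos for pos, (c, m, f) in sites.items() if not mendelian_consistent(c, m, f)
    )


def per_base_rate(confirmed_new_mutations: int, callable_haploid_positions: float) -> float:
    """Mutations per position per haploid genome for one child.

    The child inherits two haploid genomes, one from each parent, so the
    number of transmitted positions is twice the callable haploid length.
    """
    return confirmed_new_mutations / (2 * callable_haploid_positions)


sites = {
    101: (("A", "G"), ("A", "A"), ("G", "G")),  # consistent: A from mother, G from father
    202: (("A", "T"), ("A", "A"), ("A", "A")),  # T is present in neither parent
}
print(flag_inconsistent(sites))
# Illustrative numbers only: 66 confirmed new mutations over 3.0e9 callable bases.
print(per_base_rate(66, 3.0e9))
```

Most flagged sites in real data are errors rather than true mutations, which is why such a check helps raise accuracy while the surviving, validated sites feed the mutation-rate estimate.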

David Galas, Ph.D., ISB senior vice president of strategic partnerships, told GEN that given the explosion in sequencing technology over the past five years, “the information we will be able to get will be staggering. What we don’t know how to do yet is how to interpret the data and figure out how to analyze complex genetic traits. The important thing to realize is that as the changes continue and it becomes more powerful, analysis will require a systems approach integrating biology, computation, and technological development, enabling scientists to analyze all elements in a biological system rather than one gene or protein at a time.”

He added, “Genetics has been just looking at genotypes and phenotypes and at complex statistical correlations. We are just beginning to understand how to use the biology to interpret genetics in a powerful way.”

Genetic Analysis of Diseases

In May of this year, Genentech scientists announced that they had sequenced and compared a patient’s primary lung tumor and adjacent normal tissue. The investigators said that although previous studies had identified common somatic mutations in lung cancers, those studies focused on a limited set of genes and provided only a constrained view of the mutational spectrum.

Comparing the two genomes, the scientists identified a variety of somatic variations, including >50,000 high-confidence single nucleotide variants. They validated 530 somatic single nucleotide variants in the patient’s tumor, and the overall genome-wide somatic mutation rate came to 17.7 per megabase. They noted a distinct pattern of selection against mutations both within expressed genes (relative to nonexpressed genes) and in promoter regions up to five kilobases upstream of all protein-coding genes, as well as a higher rate of amino acid-changing mutations in kinase genes. The authors say this is the first evidence of distinct selective pressures present in the tumor environment.
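A rate per megabase is a mutation count divided by the amount of sequence that could be assessed, in millions of bases, and comparing such rates between classes of regions is one simple way to look for the depletion patterns described above. The Python helpers below (hypothetical names) show the arithmetic. The ~2.8 Gb of assessable sequence is our assumption, since the article does not give the denominator, and the counts passed to `rate_ratio` are made up for illustration.

```python
def mutations_per_megabase(mutation_count: int, assessed_bases: float) -> float:
    """Mutation rate per million bases of sequence that could be assessed."""
    if assessed_bases <= 0:
        raise ValueError("assessed_bases must be positive")
    return mutation_count / (assessed_bases / 1_000_000)


def expected_count(rate_per_mb: float, assessed_bases: float) -> float:
    """Invert the rate: mutations expected over a region of the given size."""
    return rate_per_mb * assessed_bases / 1_000_000


def rate_ratio(count_a: int, bases_a: float, count_b: int, bases_b: float) -> float:
    """Ratio of mutation rates in two region classes (e.g. expressed vs.
    nonexpressed genes); values below 1 mean class A is relatively depleted."""
    return mutations_per_megabase(count_a, bases_a) / mutations_per_megabase(count_b, bases_b)


# Assumed ~2.8 Gb of assessable sequence (not stated in the article).
print(round(expected_count(17.7, 2.8e9)))
# Hypothetical counts: 10 mutations in 10 Mb vs. 40 mutations in 20 Mb.
print(rate_ratio(10, 10e6, 40, 20e6))
```

Under that assumed denominator, 17.7 mutations per megabase corresponds to roughly 50,000 mutations, the same order of magnitude as the >50,000 high-confidence variants reported. A real test for selection would also need a statistical comparison, not just a ratio.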

And in June, Life Technologies announced the formation of The Genomic Cancer Care Alliance to “help people battling cancer gain access to treatment options found through analysis of their genomic information.” Founding partners include Fox Chase Cancer Center, Scripps Genomic Medicine, and the Translational Genomics Research Institute (TGen).

The alliance will launch a pilot study aimed at determining whether whole-genome sequencing can better guide treatment decisions across a number of difficult-to-treat cancers. US Oncology is expected to serve as the contract research and site-management organization for the study.

The initiative expands on a breast cancer study that Life Technologies, TGen, and US Oncology announced in March to sequence the genomes of 14 patients with triple-negative breast cancer whose tumors had progressed despite multiple other therapies. The company says the goal of this first-of-its-kind research collaboration is to determine whether genomic sequencing of cancer tissue can provide clues for treatment strategies for these individuals.

But scientists caution that enthusiasm over the explosive technological developments that have enabled wide adoption of whole-genome sequencing should be tempered by the realization that meaningful interpretation of mountains of new data will require better databases of variants and vastly improved tools for inferring the functional effects of novel variants.

Patricia F. Dimond, Ph.D., is a principal at BioInsight Consulting.
