Plant genomes have a repetitive nature and can be up to 10 times larger than the human genome, posing unique challenges for scientists who study them.
Next-generation DNA sequencing methods have provided the tools to enable a revolution in plant biology. But plant genome sequencing remains highly challenging. According to plant scientists, the simple appearance of plants belies their extraordinary genomic complexity. Despite recent achievements in sequencing technology, plant genomes present obstacles that can make it difficult to execute large-scale population and phylogenetic studies on next-generation sequencing platforms.
Chief among these challenges is the repetitive nature of plant genomes, a considerable obstacle that has hindered reliable assembly of complete plant genomes. Scientists attribute this repetitiveness to transposable elements, which occur at high copy number and continue to amplify in many plant genomes. But plant genomics scientists and instrument developers are finding ways around these challenges, and substantial government and private investment currently supports plant genomics enterprises.
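To see why repeats get in the way, consider a toy version of the problem. Many assemblers break reads into overlapping words of length k (k-mers) and stitch them back together, and a k-mer that occurs at several places in the genome cannot be assigned to any one of them. The Python sketch below counts how many k-mer positions are ambiguous in a made-up sequence carrying three identical copies of a 20-nt element that stands in for a transposable element. The sequence, the element, and the function names are hypothetical, and real assemblers are far more sophisticated than this.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def ambiguous_fraction(seq: str, k: int) -> float:
    """Fraction of k-mer positions whose k-mer occurs more than once.

    A k-mer seen at several positions cannot be placed uniquely, which is
    what fragments assemblies of repeat-rich genomes.
    """
    counts = kmer_counts(seq, k)
    total_positions = len(seq) - k + 1
    repeated_positions = sum(c for c in counts.values() if c > 1)
    return repeated_positions / total_positions


# Hypothetical toy "genome": unique spacers separated by three copies of the
# same 20-nt element, standing in for a transposable element.
TE = "ACGTTGCAAGGCTTACCGAT"
genome = "GATTACA" + TE + "CCCTTTGGGA" + TE + "TTAGGCAT" + TE + "AACCGG"

for k in (8, 30):
    print(f"k={k}: {ambiguous_fraction(genome, k):.2f} of positions ambiguous")
```

With k = 8, every 8-mer lying inside the repeated element occurs three times, so a large share of positions is ambiguous; with k = 30, longer than the repeat, each word reaches into unique flanking sequence and the ambiguity falls. Real transposable elements are often thousands of bases long, longer than typical short reads, which is why they fragment plant assemblies.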
The human genome has as many genes (about 22,000) as some mosses and fewer than half the number of genes in alfalfa or apple. Plants like coastal pines and redwoods lug around some 32 billion nucleotides in their genomes, making them 10 times larger than the human genome. And the additional genome in each plant cell's chloroplasts, which encodes genes required to carry out photosynthesis, further complicates genomic analysis.
Plant genomics bears on crop productivity, biodiversity, and climate change, and has become a priority for some funding institutions. The National Science Foundation (NSF) said in 2011 that it would provide $5 million in 2012 to continue funding its Plant Genome Research Program (PGRP), which has supported plant genome biology research since 1998.
The $101.6 million that the NSF granted in 1998 for plant genome sequencing projects was distributed among 32 sequencing and functional genomics projects. These projects focused on analyzing gene function and interactions between genomes and the environment in crop plants including cotton, corn, rice, soybean, tomato, and wheat.
In its latest funding announcement, NSF said it expects to support 10 to 15 grants beginning in October 2012 for projects that pursue innovative ideas in basic research and tools development that will advance crop plant science and the plant biology realm in general.
The NSF stressed the need for development of new research tools, particularly for high-throughput phenotyping platforms, saying that it will give priority to development of new tools that may contribute broadly to the field of plant genomics. These include research to improve tools for genome sequence assembly and analysis, novel methods for high-throughput phenotyping, and improved data-visualization tools.
In 2008, shortly after the announcement of the human 1000 Genomes Project, the University of Alberta announced the 1000 Plant Genomes Project. A large-scale genomics enterprise designed to take advantage of the speed and efficiency of next-generation DNA sequencing, the project is headed by Gane Ka-Shu Wong and Michael Deyholos. It aims to obtain the transcriptomes (the expressed genes) of 1,000 different plant species over the next few years.
Although the program's stated purposes include determining the evolutionary relationships among known plant species, the project focuses on plants with commercial potential: those that produce valuable chemicals or secondary metabolites, in the hope that characterizing the genes involved will allow the underlying biosynthetic processes to be modified.
Scientists say that de novo transcriptome assembly, the approach at the core of the 1000 Plant Genomes Project, already offers an alternative way around complex plant genomes.
In 2011, scientists at the Hainan Key Laboratory for Sustainable Utilization of Tropical Bioresources/Institute of BioScience and Technology, Haikou, China, reported the use of next-generation massively parallel sequencing and de novo transcriptome assembly to analyze the transcriptome of the only source of commercial natural rubber, Hevea brasiliensis.
The sequencing run, the scientists said, generated more than 12 million reads with an average length of 90 nt. In total, 48,768 unigenes were assembled de novo. Of the 13,807 H. brasiliensis cDNA sequences deposited in GenBank at the National Center for Biotechnology Information (NCBI) as of February 2011, 11,746 (85.1%) could be matched to the assembled unigenes by nucleotide BLAST.
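These figures are easy to sanity-check. The match rate is the number of GenBank cDNAs with a BLAST hit divided by the total number queried, and a rough lower bound on sequencing yield is read count times mean read length. The short Python sketch below does both calculations using only the counts reported for the study; the function names are illustrative, not part of any published pipeline.

```python
def match_rate(matched: int, total: int) -> float:
    """Percentage of reference cDNAs that found a hit among assembled unigenes."""
    return 100.0 * matched / total


def approx_yield_bases(n_reads: int, mean_read_length: int) -> int:
    """Rough sequencing yield: number of reads times mean read length."""
    return n_reads * mean_read_length


rate = match_rate(11_746, 13_807)
bases = approx_yield_bases(12_000_000, 90)
print(f"{rate:.1f}% of GenBank cDNAs matched")  # 85.1% of GenBank cDNAs matched
print(f"~{bases / 1e9:.2f} Gb of sequence")     # ~1.08 Gb of sequence
```

Because the study reports "more than" 12 million reads, the 1.08 Gb figure is a floor rather than the exact yield.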
The scientists said their data provides the most comprehensive sequence resource available for the study of rubber trees and demonstrates the effective use of Illumina sequencing and de novo transcriptome assembly in a species lacking genomic information.
Flow cytometry has also been applied to determining the nuclear DNA content (C-value) of plants, providing a rapid, accurate, and simple means of measuring nuclear genome size. David Galbraith, Ph.D., professor of plant sciences in the College of Agriculture and Life Sciences and director of the BIO5 Chemical Genomics and Translational Research Laboratory at the University of Arizona, developed a flow cytometric method for measuring genome sizes in 1983.
“Originally, the method used single-cell suspensions of plants, but the application was somewhat limited due in part to the fact that it’s not always possible to make cell suspensions. That’s why we came up with the idea of using cell homogenates for the flow cytometric measurements,” according to Dr. Galbraith. Since then, Dr. Galbraith and his colleagues have shown that these methods are consistently reliable and useful, and they are now routinely in use, worldwide, for determining plant C-values and ploidy.
Dr. Galbraith says that the C-value is an extremely useful parameter in a number of applications in basic and applied plant biology, including as a starting point for whole-genome sequencing projects. It also facilitates characterization of plant species in natural and agricultural settings and allows ready identification of engineered plants that are non-euploid or that belong to desired ploidy classes. In addition, he points toward studies of the role of the C-value in plant growth, in responses to the environment, and in evolutionary fitness.
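The arithmetic behind a flow cytometric C-value estimate is simple. Nuclei are stained with a DNA-binding fluorescent dye (propidium iodide is a common choice), and the position of the sample's G1 peak is compared with that of an internal standard of known DNA content run in the same preparation, on the assumption that fluorescence scales linearly with DNA content. The Python sketch below shows that ratio calculation, the conversion from picograms to base pairs (1 pg corresponding to about 978 Mbp, per Doležel and colleagues), and the same ratio used to flag a change in ploidy. The peak positions and standard values are hypothetical, not measurements from Dr. Galbraith's laboratory.

```python
# Conversion factor: 1 pg of DNA corresponds to about 0.978 x 10^9 base pairs
# (Dolezel et al., 2003).
PG_TO_BP = 0.978e9


def sample_2c_pg(sample_peak: float, standard_peak: float, standard_2c_pg: float) -> float:
    """2C DNA content of a sample, scaled from an internal standard.

    Assumes fluorescence is proportional to DNA content, so the ratio of the
    sample's G1 peak position to the standard's G1 peak equals the ratio of
    their 2C values.
    """
    return standard_2c_pg * sample_peak / standard_peak


def pg_to_mbp(dna_pg: float) -> float:
    """Convert a DNA mass in picograms to megabase pairs."""
    return dna_pg * PG_TO_BP / 1e6


def relative_ploidy(sample_peak: float, control_peak: float, control_ploidy: int) -> float:
    """Ploidy estimate relative to a control plant of the same species and known ploidy."""
    return control_ploidy * sample_peak / control_peak


# Hypothetical measurements, in arbitrary fluorescence channel units.
two_c = sample_2c_pg(sample_peak=100.0, standard_peak=200.0, standard_2c_pg=2.0)
one_c_mbp = pg_to_mbp(two_c / 2)  # 1C is half of the 2C content
print(f"2C = {two_c:.2f} pg; 1C genome size ~ {one_c_mbp:.0f} Mbp")
print(f"relative ploidy: {relative_ploidy(300.0, 150.0, 2):.1f}")
```

A sample peak at twice the channel of a diploid control, as in the last line, would suggest a tetraploid; this is the kind of quick screen that makes the method useful for checking engineered plants.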
But despite the fundamental ease of flow cytometry in analyzing plant C-values, Dr. Galbraith says that these values have been determined for only around 2% of the described angiosperm species. Since plant species are increasingly being driven to extinction by anthropogenic influences, he thinks that the time is now right for a complete and comprehensive census of plant C-values across all angiosperms, to be followed by whole-genome sequencing.
Richard Cronn, Ph.D., of the USDA Forest Service, and colleagues at Oregon State University, Brigham Young University, and Linfield College published an overview of emerging DNA sequencing techniques as one of a series of articles in a Special Issue on Methods and Applications of Next-Generation Sequencing in Botany. Writing in the American Journal of Botany, Dr. Cronn and co-authors summarized “targeted enrichment” strategies to obtain specific DNA sequences from plant genomes.
Dr. Cronn’s research focuses on evaluating the pattern of molecular divergence of plants across various taxonomic groups—in particular, pines and other species of conifers. Conifers pose unique challenges because of the size of their genomes (up to 10 times the size of the human genome) and their great genomic diversity.
He says that DNA markers have proven invaluable, “allowing us to provide a quick determination of what a species is, and how species move across the landscape. We work to define the limits of the species and how their genetic variation is distributed on the landscape, and genomic markers provide a good way to predict variation.”
Dr. Cronn says that the scale of genomic characterization offered by next-generation sequencing and targeted enrichment is of great practical importance to land-management agencies.
“When we manage trees on a landscape we want to know that they are correctly identified, and that their geographic source and adaptive characteristics are known. DNA markers can be adapted to answer these questions.” He also explained that another critical element of genomic characterization relates to forensics.
“Timber theft, believe it or not, is a huge problem worldwide,” he says. “In some instances, timber theft contributes to the decline in threatened tree species; in others, timber theft can lead to degradation of critical habitat for endangered animals.”
Also, he explains, trees provide many “great examples” of local adaptation to climate. “In the Pacific Northwest, for example, we have examples of foresters trying to improve yield by moving warm-adapted trees into colder environments. Rare climatic events, like an unusually early or late freeze, often damage nonlocal trees while local trees remain healthy. We are trying to understand the genomic basis of this kind of climatic adaptation.” And if we understand the genes that respond to climate change or local weather, he adds, “it will help us improve our ability to choose adapted sources of trees for reforestation.”
Genome reduction methods like those detailed in the American Journal of Botany Special Issue are crucial to these studies, he says. “One of the biggest challenges in land plants is their genetic and genomic redundancy. We understand homology in small genomes and even the complex human genome. Most plants have many copies of genes that encode nearly identical products, so our understanding of homology becomes limited, and it is difficult to attribute gene function to a locus. I think this is a plant-specific challenge.”
After directly comparing methods across several applications, the authors found that PCR-based enrichment is a reasonable strategy for accessing small genomic targets (e.g., ≤50 kbp), but that hybridization-based enrichment and transcriptome sequencing scale more efficiently when larger targets are desired.
They concluded that while the benefits of targeted sequencing are greatest in plants with large genomes, nearly all comparative projects can benefit from the improved throughput offered by targeted multiplex DNA sequencing, particularly as the amount of data produced from a single instrument approaches a trillion bases per run.
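A back-of-the-envelope calculation shows why targeting pays off most for big genomes. If a run yields Y bases and each sample needs a target of T bases covered to a mean depth D, at most Y / (T × D) samples fit in one run. The Python sketch below applies that formula; the on-target fraction, depth, and 20 Gbp genome size in the example are illustrative assumptions, and the estimate ignores PCR duplicates, uneven coverage, and practical limits on the number of sample barcodes.

```python
def samples_per_run(run_yield_bp: float, target_bp: float, depth: float,
                    on_target_fraction: float = 1.0) -> int:
    """Upper bound on how many samples can share one sequencing run.

    run_yield_bp: total bases the instrument produces per run.
    target_bp: bases each sample needs sequenced (enriched target or whole genome).
    depth: desired mean coverage of the target.
    on_target_fraction: share of reads landing on the target (an assumption).
    """
    usable_bp = run_yield_bp * on_target_fraction
    return int(usable_bp // (target_bp * depth))


RUN_YIELD = 1e12  # roughly a trillion bases per run

# A 50 kbp enriched target at 100x depth versus a hypothetical 20 Gbp conifer
# genome sequenced whole at 30x depth.
print(samples_per_run(RUN_YIELD, 50_000, 100))                          # 200000
print(samples_per_run(RUN_YIELD, 50_000, 100, on_target_fraction=0.5))  # 100000
print(samples_per_run(RUN_YIELD, 20e9, 30))                             # 1
```

Even with half the reads wasted off target, on the order of a hundred thousand enriched libraries could in principle share a run, versus roughly one whole conifer genome, which is the throughput gap that makes targeted multiplexing attractive.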
Given the economic, ecological, and climatic importance of plants, scientists will continue to find ways to facilitate the characterization of their genomes. For its part, the NSF says it will continue to support “genomics empowered” plant research that tackles fundamental questions in plant and agricultural sciences on a genome-wide scale, as well as the development of tools and resources for plant genome research, including novel technologies and analysis tools that will enable discovery.