Next-generation DNA sequencing methods have provided the tools to enable a revolution in plant biology. But plant genome sequencing remains highly challenging. According to plant scientists, the simple appearance of plants belies their extraordinary genomic complexity. Despite recent achievements in sequencing technology, plant genomes present obstacles that can make it difficult to execute large-scale population and phylogenetic studies on next-generation sequencing platforms.
These challenges include the repetitive nature of plant genomes, a considerable obstacle that has hindered reliable assembly of complete plant genomes. This, scientists say, is due to the high copy number and self-amplifying nature of transposable elements in many plant genomes. But plant genomics scientists and instrument developers are finding ways around the challenges, and substantial government and private investment currently supports plant genomics enterprises.
The human genome has about as many genes (roughly 22,000) as some mosses and fewer than half the number of genes in alfalfa or apple. Plants like coastal pines and redwoods lug around some 32 billion nucleotides in their genomes, about 10 times as many as the roughly 3.2 billion in the human genome. And the additional genome within each plant cell, the chloroplast genome that encodes genes required for photosynthesis, further complicates genomic analysis.
The study of plant genomics bears on crop productivity, biodiversity, and climate change, and has become a priority for some funding institutions. The National Science Foundation (NSF) said in 2011 that it would provide $5 million in 2012 to continue funding its Plant Genome Research Program (PGRP), an effort that has supported plant genome biology research since 1998.
The $101.6 million that the National Science Foundation granted in 1998 for plant genome sequencing projects was distributed among 32 sequencing and functional genomics projects. These projects focused on analyzing gene function and interactions between genomes and the environment in crop plants including cotton, corn, rice, soybean, tomato, and wheat.
In its latest funding announcement, NSF said it expects to support 10 to 15 grants beginning in October 2012 for projects that pursue innovative ideas in basic research and tools development that will advance crop plant science and the plant biology realm in general.
The NSF stressed the need for development of new research tools, particularly for high-throughput phenotyping platforms, saying that it will give priority to development of new tools that may contribute broadly to the field of plant genomics. These include research to improve tools for genome sequence assembly and analysis, novel methods for high-throughput phenotyping, and improved data-visualization tools.
In 2008, shortly after the announcement of the Human 1000 Genomes Project, the University of Alberta announced the 1000 Plant Genomes Project. A large-scale genomics enterprise intended to take advantage of the speed and efficiency of next-generation DNA sequencing, the project is headed by Gane Ka-Shu Wong and Michael Deyholos. The project aims to obtain the transcriptome (expressed genes) of 1,000 different plant species over the next few years.
While the stated purposes of the program include determining the evolutionary relationships among the known plant species, the project focuses on plants with commercial potential. These plants produce valuable chemicals or secondary metabolites, and the hope is that characterizing the genes involved will allow modification of the underlying biosynthetic processes.
Scientists say that de novo transcriptome assembly, the same approach the 1000 Plant Genomes Project relies on, has already provided an alternative way to get around the complexity of plant genomes.
In 2011, scientists at the Hainan Key Laboratory for Sustainable Utilization of Tropical Bioresources/Institute of BioScience and Technology, Haikou, China, reported the use of next-generation massively parallel sequencing and de novo transcriptome assembly to analyze the transcriptome of the only source of commercial natural rubber, Hevea brasiliensis.
The sequencing run, the scientists said, generated more than 12 million reads with an average length of 90 nt. In total, 48,768 unigenes were assembled through de novo transcriptome assembly. Out of 13,807 H. brasiliensis cDNA sequences deposited in GenBank at the National Center for Biotechnology Information (NCBI) as of February 2011, 11,746 sequences (84.5%) could be matched with the assembled unigenes through nucleotide BLAST.
The scientists said their data provides the most comprehensive sequence resource available for the study of rubber trees and demonstrates the effective use of Illumina sequencing and de novo transcriptome assembly in a species lacking genomic information.
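For readers unfamiliar with the term, the core idea of de novo assembly, stitching short reads into longer sequences using only the overlaps between them and no reference genome, can be illustrated with a toy sketch. The example below is not the method used in the rubber tree study, whose assembler is not described here; it swaps in a simple greedy overlap-merge strategy, whereas production short-read assemblers typically rely on de Bruijn graphs and must also handle sequencing errors, reverse complements, and alternative transcripts. The function names, the example reads, and the `min_len` threshold are all hypothetical.

```python
def overlap(a, b, min_len):
    """Return the length of the longest suffix of `a` that equals a
    prefix of `b`, provided it is at least `min_len`; otherwise 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Toy greedy assembler: repeatedly merge the pair of sequences
    with the longest overlap until no overlap >= min_len remains."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        # Join the two sequences, keeping the shared overlap only once.
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    # Three made-up reads that overlap end to end.
    reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
    print(greedy_assemble(reads))
```

With millions of real reads, the all-against-all comparison in this sketch would be far too slow, which is one reason graph-based methods are preferred in practice.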