Metabolomics involves comparing metabolomes (the full metabolite complement of an organism) between control and test groups to find differences in their profiles. In biomarker discovery, those differences may be correlated with the disease being studied; in toxicology studies, they may be correlated with changes in metabolic output when a drug candidate is introduced to a test subject. Environmental metabolomics is also growing in importance, with studies performed to assess chemical risks to wildlife and the environment or to monitor livestock health and disease in intensive farming.
Unlike gene-expression studies or proteomics analyses, which only reveal part of what might be happening in a cell, metabolomic profiling can give an instantaneous snapshot of the entire physiology of that cell. More importantly, if data from proteomics, transcriptomics, and metabolomics can be integrated, a more complete picture of a living organism’s biology can be obtained.
There are usually several steps involved in metabolomics analysis:
• Profiling (also known as differential expression analysis) involves finding the interesting metabolites with statistically significant variations in abundance within a set of experimental and control samples.
• Identification is the determination of the chemical structure of these metabolites after profiling.
• Validation confirms the previously identified metabolites using a much larger number of samples, which accounts for the effect of natural biological variation. It is quantitative and requires analytical standards.
• Interpretation, the last step in the workflow, makes connections between the metabolites discovered and the biological processes or conditions.
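The profiling step in the list above can be illustrated with a minimal sketch. It assumes a simple two-group design (control versus test) and flags a metabolite when both Welch's t statistic and the log2 fold change exceed a cutoff. The metabolite names, abundance values, and cutoffs are hypothetical, and a real analysis would convert the statistic to a p-value and correct for multiple testing.

```python
import math
from statistics import mean, variance

def welch_t(control, test):
    """Welch's t statistic for two groups with possibly unequal variances."""
    standard_error = math.sqrt(
        variance(control) / len(control) + variance(test) / len(test)
    )
    return (mean(test) - mean(control)) / standard_error

def log2_fold_change(control, test):
    """log2 ratio of the mean test abundance to the mean control abundance."""
    return math.log2(mean(test) / mean(control))

def profile(abundances, min_abs_t=4.0, min_abs_log2fc=1.0):
    """Return metabolites whose abundance differs strongly between groups.

    The two cutoffs are illustrative assumptions, not recommended values.
    """
    hits = []
    for name, (control, test) in abundances.items():
        t = welch_t(control, test)
        lfc = log2_fold_change(control, test)
        if abs(t) >= min_abs_t and abs(lfc) >= min_abs_log2fc:
            hits.append(name)
    return hits

# Hypothetical abundances (arbitrary units): name -> (control, test)
abundances = {
    "metabolite_A": ([100.0, 110.0, 90.0, 105.0], [220.0, 200.0, 215.0, 205.0]),
    "metabolite_B": ([50.0, 55.0, 45.0, 52.0], [51.0, 49.0, 54.0, 47.0]),
}

candidates = profile(abundances)
print(candidates)
```

With these made-up values only metabolite_A passes both cutoffs; it would then move on to identification and validation.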
Because of the vast chemical diversity of metabolites and their wide variation in abundance, metabolomics research usually requires multiple techniques; certain classes of samples are more amenable to one analysis technique than others (Figure 1).
The two most commonly used techniques are GC/MS and LC/MS. Comprehensive metabolomics labs frequently incorporate both of these approaches.
A major challenge of metabolomics involves data processing and analysis; a full range of software programs is needed to turn raw metabolomics data into useful biological results. A typical metabolomics experiment requires large numbers of samples to generate results that are statistically rigorous. Aside from the need for highly sensitive and accurate instrumentation, powerful software tools are essential to address the vast amounts of data generated by these experiments. Analytical capabilities include deconvolution programs for processing GC/MS and LC/MS files, an array of statistical analysis tools to find significant metabolites, a metabolite database to identify metabolites, and finally, bioinformatics software for visualizing molecular interaction networks.
Metabolomic Profiling and Identification Using LC/MS
A pilot collaborative metabolomics study was undertaken to determine whether metabolite biomarkers of infection and immunity in rice could be identified. Rice, a major food staple, is a model species in cereal genome research. Bacterial leaf blight of rice, caused by the bacterium Xanthomonas oryzae pv. oryzae, leads to crop losses of up to 50%.
Two rice lines, TP309 (susceptible, wild-type) and TP309-Xa21 (resistant, transgenic), and two bacterial strains, PXO99 (wild-type) and PXO99-raxST– (knock-out), were studied along with appropriate controls and biological replicates (Table).
A two-step LC/MS approach was employed. First, rapid profiling of samples was performed using an Agilent 1200 Series LC (www.agilent.com) and Agilent 6210 Time-of-Flight LC/MS. Agilent GeneSpring MS bioinformatics software was used to analyze the complex, multi-class data generated by the study. Second, subsequent targeted identification of differentially expressed metabolites was performed using an Agilent 6510 Quadrupole Time-of-Flight (Q-TOF) LC/MS. The Agilent METLIN Personal metabolite database was used to narrow the list of possible identities during the identification process.
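To illustrate how an accurate-mass measurement narrows the list of possible identities, here is a minimal sketch of a mass lookup with a parts-per-million (ppm) tolerance. It is not the METLIN interface: the four-entry table, the 5 ppm tolerance, and the assumption that every ion is a protonated [M+H]+ species are all illustrative choices.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da, used for [M+H]+ ions

# Tiny hypothetical lookup table: name -> neutral monoisotopic mass (Da)
CANDIDATES = {
    "glucose": 180.06339,
    "fructose": 180.06339,
    "citric acid": 192.02700,
    "tryptophan": 204.08988,
}

def ppm_error(measured, theoretical):
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6

def match_mz(measured_mz, tolerance_ppm=5.0):
    """Return candidate names whose neutral mass matches the measured ion.

    Assumes the ion is [M+H]+, so one proton mass is subtracted first.
    """
    neutral_mass = measured_mz - PROTON_MASS
    return sorted(
        name
        for name, mass in CANDIDATES.items()
        if abs(ppm_error(neutral_mass, mass)) <= tolerance_ppm
    )

matches = match_mz(181.0707)
print(matches)
```

The query returns both glucose and fructose, which share an elemental formula. Accurate mass alone shortens the candidate list but cannot separate isomers, which is one reason a second, targeted measurement is needed.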
Initial processing of the accurate-mass MS profiling data was done using Agilent MassHunter Software. The feature extraction and correlation algorithms in the MassHunter software located the groups of co-variant ions in each chromatogram. Each of these groups represented a unique compound rather than merely a chromatographic peak, which could have concealed multiple components.
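The idea of grouping co-variant ions can be sketched as follows. This is not the MassHunter algorithm, whose details are not given here; it is a simple stand-in that assigns ions to the same compound when their intensity traces across consecutive scans are strongly correlated. The m/z values, traces, and the 0.95 threshold are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length traces (each must vary)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def group_covariant_ions(profiles, min_r=0.95):
    """Greedily group ions whose traces correlate with a group's first ion."""
    groups = []  # each group is a list of m/z values; the first is the seed
    for mz, trace in profiles.items():
        for group in groups:
            if pearson(profiles[group[0]], trace) >= min_r:
                group.append(mz)
                break
        else:
            groups.append([mz])  # no match: this ion seeds a new group
    return groups

# Hypothetical intensity traces over nine consecutive scans, keyed by m/z
profiles = {
    181.07: [0, 10, 50, 100, 50, 10, 0, 0, 0],
    203.05: [0, 2, 10, 20, 10, 2, 0, 0, 0],   # same elution profile, weaker
    182.07: [0, 1, 3, 6, 3, 1, 0, 0, 0],      # same elution profile, weaker
    205.10: [0, 0, 0, 5, 40, 90, 40, 5, 0],   # elutes later
}

groups = group_covariant_ions(profiles)
print(groups)
```

The first three ions rise and fall together and are grouped as one putative compound, while the later-eluting ion forms its own group even though the two elution profiles overlap in time.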
The retention time/mass pairs generated by the MassHunter Workstation software were then exported for subsequent analysis in GeneSpring MS software (Figure 2).
GeneSpring MS differs from other commercially available biomarker software packages in that it has an array of useful statistical analysis and visualization tools, including 1-way and 2-way ANOVA (analysis of variance), principal component analysis (PCA), and class prediction algorithms, all of which enable the discovery of biomarkers that can detect disease or drug toxicity. In this example, seven experimental states were studied simultaneously rather than through the typical pair-wise comparisons.
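To show how a 1-way ANOVA compares several experimental states at once instead of pair by pair, here is a minimal sketch of the F statistic for one metabolite. It is a generic textbook calculation, not GeneSpring MS code, and the three groups of abundance values are hypothetical.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic: between-group variance divided by within-group variance."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    n_groups, n_total = len(groups), len(all_values)

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)

    return (ss_between / (n_groups - 1)) / (ss_within / (n_total - n_groups))

# Hypothetical abundances of one metabolite in three experimental states
states = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
]

f_value = one_way_anova_f(states)
print(f_value)  # approximately 13.0
```

A large F value means the states differ more from one another than replicates differ within a state. Turning F into a p-value requires the F distribution; SciPy's `scipy.stats.f_oneway` returns both the statistic and the p-value.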
PCA is a commonly used analysis tool for differential expression analysis. When PCA was performed on the rice data using GeneSpring MS with no prefiltering of data, separation of the TP309 and TP309-Xa21 rice lines was observed (Figure 3a). However, separation of the different classes of treatments containing information regarding immunity and defense was not observed. When the data were first prefiltered (a metabolite must be present in all six biological replicates) and 1-way ANOVA was then performed prior to PCA, the differences between the two rice lines became much clearer, and infection status became easier to distinguish regardless of rice line (Figure 3b).
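The prefilter-then-PCA idea can be sketched in a few lines. This is an illustrative stand-in, not the GeneSpring MS implementation: the prefilter keeps only metabolites detected in every sample (one possible reading of the "present in all replicates" rule), the first principal component is found by power iteration, and the ANOVA-based selection that would sit between the two steps is omitted for brevity. All names and values are hypothetical.

```python
import math

def prefilter(table):
    """Keep metabolites detected (not None) in every sample."""
    return {
        name: values
        for name, values in table.items()
        if all(v is not None for v in values)
    }

def pc1_scores(table, iterations=200):
    """Score of each sample on the first principal component."""
    # Mean-center each metabolite across samples
    centered = []
    for values in table.values():
        m = sum(values) / len(values)
        centered.append([v - m for v in values])
    n_features, n_samples = len(centered), len(centered[0])

    def project(weights):
        return [
            sum(weights[f] * centered[f][s] for f in range(n_features))
            for s in range(n_samples)
        ]

    # Power iteration: repeatedly apply the covariance structure and renormalize
    weights = [1.0] * n_features
    for _ in range(iterations):
        scores = project(weights)
        weights = [
            sum(centered[f][s] * scores[s] for s in range(n_samples))
            for f in range(n_features)
        ]
        norm = math.sqrt(sum(w * w for w in weights))
        weights = [w / norm for w in weights]
    return project(weights)

# Hypothetical table: samples 1-2 belong to one class, samples 3-4 to another
table = {
    "m1": [1.0, 1.2, 5.0, 5.2],
    "m2": [3.0, 3.1, 1.0, 0.9],
    "m3": [2.0, None, 2.1, 2.0],  # missing in one sample, so it is dropped
    "m4": [4.0, 4.1, 4.0, 4.1],
}

filtered = prefilter(table)
scores = pc1_scores(filtered)
print(sorted(filtered), scores)
```

In this toy table the two classes fall on opposite sides of zero along the first component. For real data sets, an established routine such as NumPy's `numpy.linalg.svd` or scikit-learn's `PCA` would be the more usual choice.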