Metabolite Identification in Metabolomics

Systems biology has emerged in the previous decade to provide a holistic study of the biochemical components (genes, transcripts, proteins, and metabolites) and their complex interactions that create the emergent properties or phenotype of biological systems. Metabolomics is a core discipline of systems biology and involves the investigation of low molecular weight organic and inorganic metabolites present in a cell, tissue, organ, or organism.

Figure 1. The discovery and validation study workflow: Independent studies are performed to identify metabolites that are shown to be biologically interesting in both discovery and validation studies. Only these metabolites are passed forward for chemical identification.

Metabolomics provides a dynamic and representative phenotypic picture of the system involving endogenous and exogenous metabolism and biochemical regulation (e.g., allosterism and riboswitches).

Metabolomic studies are a multistage process involving hypothesis-generating discovery studies followed by validation studies. The workflows for experiments and metabolite identification are shown in Figures 1 and 2. Many metabolomics studies start from a point of limited biological knowledge.

A holistic experiment is designed and performed to acquire robust and valid data encompassing a wide and diverse range of metabolites and metabolic pathways. Data is interrogated to define metabolic differences between classes, for example, between subjects diagnosed with a disease and subjects who are not. This strategy is defined as metabolic profiling.

Figure 2. The metabolite identification workflow provides putative or definitive identification.

Careful experimental design is essential to ensure that biases between the classes being studied (for example, age, gender, or drug use) are not present. The use of quality control samples is highly recommended to allow technical and biological variation to be quantified.

The human metabolome is estimated to contain more than 7,800 metabolites, although many metabolites related to drug, lipid, and gut microflora metabolism are currently not accurately described. Powerful analytical technologies are required to fulfill the objectives of metabolic profiling. Chromatography-mass spectrometry platforms are routinely applied in metabolic profiling, including ultra-high-performance liquid chromatography (UHPLC) coupled to hybrid mass spectrometry instruments. These offer reproducible detection of thousands of features from a given sample owing to their high chromatographic and mass resolution.

The power of UHPLC coupled to a hybrid mass spectrometer is described in this article. Specifically, the application of the Thermo Scientific LTQ Orbitrap mass spectrometer (Thermo Fisher Scientific), coupled with UHPLC, to the chemical identification of metabolites is discussed.

Identification with LTQ Orbitrap

The goal of metabolomics is to convert raw analytical data into biological knowledge, and to complete this process the chemical identification of metabolites is essential. However, chemical identification of metabolites is currently a major limitation in metabolomics. Unlike in proteomics, there are few metabolite databases and in silico workflows for high-throughput and automated identification.

Hybrid mass spectrometers are essential in the strategy to provide putative (not matched to an authentic chemical standard) and definitive (matched to an authentic chemical standard) chemical identification of metabolites. The Metabolomics Standards Initiative (MSI) has defined appropriate strategies for the reporting of metabolite identification.

The first step uses the measured accurate mass to determine all potential molecular formulae consistent with it. This is a difficult task because the process of electrospray ionization creates multiple features (a feature being a chromatographic peak defined by an accurate mass and retention time) for each metabolite.

Protonated and deprotonated ions combined with chemical adduct ions (Na, K, formate, sodium formate, and many others) and fragment ions are detected and introduce complexity into the data. Therefore, the number of features detected is higher than the number of metabolites present. There is a significant opportunity to introduce false positive and false negative identifications (for example, by assuming a sodium adduct ion is a protonated ion).
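The way a single metabolite gives rise to several features can be sketched in a few lines of Python. This is a minimal illustration, not part of the workflow described here: the function names are hypothetical, only singly charged ions are considered, and the mass constants are standard monoisotopic values rounded to six decimal places. Glucose is used purely as a worked example.

```python
# Monoisotopic masses in Da (standard reference values, rounded).
PROTON = 1.007276
ELECTRON = 0.000549
NA = 22.989769        # 23Na atom
K = 38.963706         # 39K atom
FORMATE = 44.998203   # HCOO- anion

# Shift added to the neutral monoisotopic mass M for each singly charged ion.
ADDUCT_SHIFT = {
    "[M+H]+": PROTON,
    "[M+Na]+": NA - ELECTRON,
    "[M+K]+": K - ELECTRON,
    "[M-H]-": -PROTON,
    "[M+HCOO]-": FORMATE,
}


def adduct_mz(neutral_mass, adduct):
    """Expected m/z of a singly charged adduct ion (hypothetical helper)."""
    return neutral_mass + ADDUCT_SHIFT[adduct]


def neutral_mass_from_mz(mz, assumed_adduct):
    """Neutral mass implied by an m/z value under an assumed adduct type."""
    return mz - ADDUCT_SHIFT[assumed_adduct]


GLUCOSE = 180.063388  # C6H12O6, neutral monoisotopic mass

# One metabolite, several features.
for adduct in ADDUCT_SHIFT:
    print(f"{adduct:10s} m/z {adduct_mz(GLUCOSE, adduct):.4f}")

# Mis-assigning the sodium adduct as a protonated ion shifts the inferred
# neutral mass by about 21.98 Da, which leads to the wrong formula search.
sodium_mz = adduct_mz(GLUCOSE, "[M+Na]+")
wrong_mass = neutral_mass_from_mz(sodium_mz, "[M+H]+")
print(f"neutral mass error: {wrong_mass - GLUCOSE:.4f} Da")
```

The second part shows why the example in the text matters: a sodium adduct read as a protonated ion yields a neutral mass that is wrong by roughly 22 Da, so every formula derived from it is a false candidate.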

Operation with high mass accuracy limits the number of potential molecular formulae. Routine mass accuracies of less than 3 ppm across wide metabolite concentration ranges are observed with the Orbitrap mass spectrometer and allow a significant reduction in the number of potential molecular formulae. These molecular formulae are subsequently searched against metabolomic (e.g., HMDB, KEGG, Manchester Metabolomics Database) or chemical (e.g., PubChem, ChemSpider, ChEBI) databases to list all potential metabolites matching one or more molecular formulae.
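The effect of the mass tolerance on the candidate list can be illustrated with a small sketch. The two candidate formulae (C6H12O6 and C7H8N4O2) are chosen here only because their monoisotopic masses differ by about 1.3 mDa; the candidate table and function names are hypothetical, and a real search would enumerate formulae or query a database rather than use a hard-coded list.

```python
# Monoisotopic atomic masses in Da (standard reference values, rounded).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

# Illustrative candidates only: two formulae with nearly identical masses.
CANDIDATES = {
    "C6H12O6": {"C": 6, "H": 12, "O": 6},
    "C7H8N4O2": {"C": 7, "H": 8, "N": 4, "O": 2},
}


def mono_mass(composition):
    """Neutral monoisotopic mass of an elemental composition."""
    return sum(MONO[element] * count for element, count in composition.items())


def ppm_error(measured, theoretical):
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def candidates_within(measured_neutral_mass, tol_ppm):
    """Formulae whose theoretical mass lies within +/- tol_ppm of the measurement."""
    return [
        formula
        for formula, composition in CANDIDATES.items()
        if abs(ppm_error(measured_neutral_mass, mono_mass(composition))) <= tol_ppm
    ]


measured = 180.06339  # hypothetical neutral mass derived from a feature

for tol in (10.0, 3.0):
    print(f"{tol:4.1f} ppm -> {candidates_within(measured, tol)}")
```

At a 10 ppm tolerance both formulae remain; at 3 ppm only C6H12O6 survives, because the second formula lies about 7.4 ppm away from the measured value.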

However, as Kind and Fiehn have reported previously, even sub-ppm mass accuracy cannot separate metabolites with a similar or identical accurate mass. These include isomers such as glucose-6-phosphate and fructose-6-phosphate, which share the same molecular formula. A second step is required to provide further experimental data for chemical identification or to reduce the number of potential metabolites defined in the first step. Isotopic data and the application of specific rules (e.g., Lewis and Senior checks, isotope ratios) are two approaches used to reduce the number of probable metabolites.
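As a rough illustration of how isotopic data can help, the relative intensity of the M+1 isotope peak can be estimated to first order from the elemental composition. The sketch below assumes approximate natural isotope abundances and ignores higher-order terms; the function name and the example formulae are illustrative and not taken from the workflow described here.

```python
# Approximate heavy-to-light isotope abundance ratios for the +1 isotopes
# (13C/12C, 2H/1H, 15N/14N, 17O/16O). Phosphorus has one stable isotope.
M1_RATIO = {
    "C": 1.07 / 98.93,
    "H": 0.0115 / 99.9885,
    "N": 0.364 / 99.636,
    "O": 0.038 / 99.757,
    "P": 0.0,
}


def m_plus_1_percent(composition):
    """First-order estimate of the M+1 peak as a percentage of the M peak."""
    return 100.0 * sum(M1_RATIO[el] * n for el, n in composition.items())


formulae = {
    "C6H12O6": {"C": 6, "H": 12, "O": 6},
    "C7H8N4O2": {"C": 7, "H": 8, "N": 4, "O": 2},
    # Glucose-6-phosphate and fructose-6-phosphate share this formula.
    "C6H13O9P": {"C": 6, "H": 13, "O": 9, "P": 1},
}

for name, composition in formulae.items():
    print(f"{name:10s} M+1 approx. {m_plus_1_percent(composition):.2f} %")
```

Under these assumptions the two near-isobaric formulae differ by more than two percentage points in their expected M+1 intensity, so an isotope ratio check can separate them. The two sugar phosphates, having one and the same formula, give identical values, which is why fragmentation data is still needed for isomers.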

Collision-induced dissociation (CID) of metabolites to acquire structural information is also performed and is only routinely possible with hybrid instruments. In the LTQ Orbitrap system, the linear ion trap is employed to perform MSn experiments where n can be greater than two. Most other hybrid instruments, including the quadrupole-TOF, only offer MS2.

The ability to acquire MSn data provides significantly greater confidence in the chemical identification. This is achievable only with ion-trap-based instruments, which produce MSn spectral trees: a fragmentation mass spectrum of the precursor (parent) ion, followed by fragmentation mass spectra for each of the product (daughter) ions in a repetitive manner. This can significantly increase the power to discriminate between metabolites of similar accurate mass and chemical structure.

If the product ion mass spectra of two metabolites are similar, further fragmentation can provide mass spectral differences associated with fragmentation of product ions.

Where authentic chemical standards are not available for comparison of their fragmentation mass spectra with the experimentally derived mass spectrum, further assistance is required to reduce the number of potential metabolites. Here, the Mass Frontier software can be applied to perform in silico fragmentation of all potential structures defined during the first step. These theoretical mass spectra can be compared to the experimentally derived mass spectrum to provide a unique identification or to reduce the number of potential metabolites.
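The comparison of theoretical and experimental spectra can be sketched generically as follows. This is not Mass Frontier's algorithm, which is not described here. It is a simple assumed scoring scheme in which each candidate structure is ranked by the fraction of experimental fragment intensity that its predicted fragment m/z values explain. All peak lists, candidate names, and the tolerance are hypothetical toy values.

```python
def explained_fraction(experimental, predicted_mz, tol=0.01):
    """Fraction of experimental intensity matched by predicted fragments.

    experimental: list of (m/z, intensity) pairs from an MS2 or MSn spectrum.
    predicted_mz: list of fragment m/z values predicted in silico.
    tol:          absolute m/z matching tolerance in Da (assumed value).
    """
    total = sum(intensity for _, intensity in experimental)
    if total == 0:
        return 0.0
    matched = sum(
        intensity
        for mz, intensity in experimental
        if any(abs(mz - p) <= tol for p in predicted_mz)
    )
    return matched / total


def rank_candidates(experimental, predictions, tol=0.01):
    """Candidate names sorted from best to worst explained fraction."""
    scores = {
        name: explained_fraction(experimental, mzs, tol)
        for name, mzs in predictions.items()
    }
    return sorted(scores, key=scores.get, reverse=True), scores


# Toy data: not real spectra.
experimental = [(100.0, 50), (150.0, 30), (200.0, 20)]
predictions = {
    "candidate_A": [100.0, 150.0],
    "candidate_B": [200.0],
}

ranking, scores = rank_candidates(experimental, predictions)
print(ranking, scores)
```

A score of this kind only narrows the list: two candidates that predict the same fragments score equally, which is where further stages of an MSn spectral tree can add discriminating evidence.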

At the Manchester Centre for Integrative Systems Biology, UHPLC is coupled to the LTQ Orbitrap hybrid instrument to profile microbial (yeast, E. coli) and mammalian metabolomes (human biofluids, tissues, culture supernatants). Data is acquired in a two-stage process.

First, accurate mass data is acquired for all samples at high mass resolution (R>30,000) and mass accuracy (less than 3 ppm) so as to maximize mass resolution and accuracy for the narrow chromatographic peaks observed. Second, data-dependent MSn experiments are performed at the end of each batch for a random selection of samples. These are acquired at lower mass resolution (R=7,500) so as to maximize the collection of multiple MSn mass spectra with shorter scan times.
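To give a feel for what the two resolution settings mean in mass terms, the sketch below uses the common definition R = m/Δm, with Δm taken as the peak width at half maximum. The choice of m/z 400 as the evaluation point is an assumption made for illustration only.

```python
def peak_width_fwhm(mz, resolution):
    """Peak width (full width at half maximum, in Da) implied by R = m / delta_m."""
    return mz / resolution


MZ = 400.0  # assumed m/z at which the resolution values are evaluated

for resolution in (30000, 7500):
    width_mda = peak_width_fwhm(MZ, resolution) * 1000
    print(f"R = {resolution:6d}: peak width approx. {width_mda:.1f} mDa at m/z {MZ:.0f}")
```

Under this definition, dropping from R=30,000 to R=7,500 makes each mass peak four times wider, which is the price paid for the shorter scan times used in the MSn stage.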

These processes have been applied in a range of high-throughput clinical studies involving the discovery of novel biomarkers or pathophysiological mechanisms in heart, bowel, and kidney diseases and diabetes.

High-specification hybrid mass spectrometers coupled with UHPLC play an essential role in metabolic profiling within the field of systems biology. They provide routine, robust, and reproducible detection of hundreds to thousands of metabolites in small or epidemiological-size studies. However, metabolite identification remains one of the major current problems in metabolomics.

Mass spectral libraries transferable between instruments and laboratories are not available. Authentic chemical standards for many metabolites are not commercially available. More importantly, informatics workflows for automation of metabolite identification processes are not available. Significant advances in these areas for metabolite identification are essential for metabolomics to become a routinely applied research tool in microbial, plant, and mammalian studies.


Warwick Dunn, Ph.D., is an experimental officer at The Manchester Centre for Integrative Systems Biology.