Glycans are oligosaccharide and polysaccharide molecules found attached to proteins in virtually all multicellular organisms. They serve essential roles in biological processes as diverse as protein folding, immune function, and energy storage. For example, glycans on the surface of cells are known to mediate interactions between cells and to define cell identities within complex tissues.
Very specific glycan structures have been shown to control the activities of the proteins to which they are attached, adding a post-transcriptional, post-translational layer of regulation onto a protein’s function. In some instances, the function of specific cell-surface signaling molecules is mediated by an exact glycan structure at a precise location on the protein.
Consequently, very small changes in the glycan structure of a biologically based drug can significantly affect its function. In addition, certain glycans have been shown to exhibit disease-related changes in expression levels, and thus have the potential to serve as markers for disease.
Because of the emerging importance of glycans as biomarkers and their impact on biologically based therapeutics, workflows that simplify and expedite glycan structural elucidation are of intense interest.
The complex branching and isomeric nature of glycans create significant analytical challenges for glycobiologists attempting to characterize them. In addition to characterizing the glycan sequence, the researcher must elucidate branching, linkages between monosaccharide units, and the locations of sulfate and phosphate groups.
Due to its ability to analyze glycan mixtures at low levels in complex matrices, mass spectrometry (MS) has emerged as one of the most powerful techniques for glycan structural elucidation. Multistage/sequential mass spectrometry (MSn) capability aids in discriminating between glycans that have similar fragment ions at the MS/MS level and resolves their carbohydrate distribution and branching patterns. Up to six or seven levels of fragmentation may be needed to differentiate between glycan structural isomers.
However, MS-based workflows generate large volumes of spectral data that, until now, have required hours or even days of tedious manual interpretation to decipher. The lack of bioinformatics tools to simplify, expedite, and automate the elucidation of glycan structures has been the single largest bottleneck in MS-based workflows.
Interpretation of Glycan Spectra
Recent innovations in bioinformatics software have enabled an MS-based workflow (Figure 1) that accelerates glycan characterization through automated interpretation of the MSn data produced by ion trap and hybrid ion trap Orbitrap™ mass spectrometers (Thermo Fisher Scientific). These data are essential to resolving glycan heterogeneity and isomeric forms unequivocally.
As shown in Figure 1, MS/MS spectral fragmentation data is acquired on the ion trap or Orbitrap mass spectrometer and then imported into SimGlycan® software (Premier Biosoft). SimGlycan software automatically matches the experimentally acquired mass spectra against its database of theoretical glycan fragments, and generates a list of candidate glycan structures.
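The matching step described above can be illustrated with a minimal sketch. This is not SimGlycan's algorithm, which is not described here; it only shows the general idea of pairing experimental peaks with theoretical fragment m/z values within a mass tolerance. The function name `match_peaks`, the default tolerance of 0.5 Da, and the m/z values in the usage note are all assumptions chosen for illustration.

```python
from bisect import bisect_left


def match_peaks(experimental_mz, theoretical_mz, tol_da=0.5):
    """Pair each experimental m/z with its nearest theoretical m/z.

    A pair is kept only if the two values agree within tol_da (in Da).
    Hypothetical helper for illustration; not a SimGlycan API.
    """
    theo = sorted(theoretical_mz)
    matches = []
    for mz in experimental_mz:
        i = bisect_left(theo, mz)
        # The nearest theoretical value is either just below or just above mz.
        best = None
        for j in (i - 1, i):
            if 0 <= j < len(theo):
                if best is None or abs(theo[j] - mz) < abs(best - mz):
                    best = theo[j]
        if best is not None and abs(best - mz) <= tol_da:
            matches.append((mz, best))
    return matches
```

For instance, calling `match_peaks([366.1, 528.3, 690.2, 900.0], [366.14, 528.19, 690.25])` pairs the first three experimental peaks with theoretical fragments and leaves the peak at 900.0 unmatched. An appropriate tolerance depends on the mass accuracy of the instrument used.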
The SimGlycan database is a relational database containing 22,456 glycans, 22,814 glycoproteins, 11,438 glycans with known biological sources, 11,918 glycans with known classes, 263 biochemical reactions, 194 biochemical pathways, 250 glycan-related enzymes, and 22,265 other database links. The database is continuously updated as new information on glycans is published.
Each proposed glycan structure is assigned a rank and a score to reflect how closely it matches the experimental data. The rank is based on a proximity score, which is a numerical representation of how closely the experimental properties of the glycan, such as composition and branching pattern, match those of the glycans in the database.
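The idea of scoring and ranking candidates can be sketched as follows. The exact formula behind the proximity score is not given here, so this example substitutes a deliberately simple stand-in: the fraction of a candidate's theoretical fragments that are found in the experimental spectrum. The names `coverage_score` and `rank_candidates`, the tolerance, and the candidate data are hypothetical.

```python
def coverage_score(experimental_mz, theoretical_mz, tol_da=0.5):
    """Fraction of theoretical fragments with an experimental peak within tol_da.

    A simple stand-in score (assumption), not SimGlycan's proximity score.
    """
    if not theoretical_mz:
        return 0.0
    hits = sum(
        any(abs(e - t) <= tol_da for e in experimental_mz)
        for t in theoretical_mz
    )
    return hits / len(theoretical_mz)


def rank_candidates(experimental_mz, candidates, tol_da=0.5):
    """Rank candidate structures by descending score.

    candidates maps a candidate name to its list of theoretical fragment m/z.
    Returns a list of (rank, name, score) tuples; ties are broken by name.
    """
    scored = sorted(
        (
            (coverage_score(experimental_mz, frags, tol_da), name)
            for name, frags in candidates.items()
        ),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [(rank, name, score) for rank, (score, name) in enumerate(scored, start=1)]
```

A real scoring scheme would likely also weigh peak intensities, composition, and branching pattern; this sketch considers only fragment coverage.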
To help the researcher evaluate these matches, SimGlycan software highlights the experimental m/z values matching those of the theoretical spectra. To aid in visualization, each match can be represented using cartoons or Domon-Costello nomenclature. Cartoons are constructed using standard Consortium for Functional Glycomics (CFG) nomenclature for monosaccharides (custom symbols are used where CFG annotations are not available).
To aid in interpretation of overall results, additional relevant biological information about the proposed glycan structure (for example, glycan class and biological pathway) is available via interactive links.
Figure 2 shows an example of the results obtained at the MS/MS stage of the workflow. Although structural interpretation can be performed using MS/MS-level spectral data, it is very difficult, if not impossible, to distinguish structural isomers without MSn-level spectral data. Examination of the glycan list generated by SimGlycan software often reveals additional glycan compositions having identical mass that are scored much lower.
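A short calculation shows why mass alone cannot separate such candidates. Galactose and mannose are both hexoses, so exchanging one for the other leaves the mass unchanged. The sketch below assumes native (underivatized), neutral glycans and uses standard monoisotopic residue masses; the function name `neutral_mass` and the second composition are hypothetical and chosen only for comparison.

```python
# Monoisotopic residue masses in Da (monosaccharide minus one water).
# Gal and Man are both hexoses (C6H10O5); GlcNAc is an N-acetylhexosamine (C8H13NO5).
RESIDUE_MASS = {
    "Gal": 162.052823,
    "Man": 162.052823,
    "GlcNAc": 203.079373,
}
WATER = 18.010565  # one water is added back for the free reducing end


def neutral_mass(composition):
    """Monoisotopic mass of a native, neutral glycan from its composition.

    composition maps a monosaccharide name to its count.
    Assumes no derivatization (for example, no permethylation) and no adduct.
    """
    return sum(RESIDUE_MASS[name] * count for name, count in composition.items()) + WATER


mass_a = neutral_mass({"Gal": 1, "Man": 4, "GlcNAc": 4})
mass_b = neutral_mass({"Gal": 2, "Man": 3, "GlcNAc": 4})  # hypothetical alternative
```

Both compositions come out at about 1640.592 Da, so only differences in the fragmentation pattern can tell them apart.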
In some cases, when the experimental spectrum is mapped onto the theoretical spectrum of the top-ranked structure, peaks showing high abundance remain unexplained. Although the lower-scored glycans are reported to have a much lower probability of matching the MS/MS spectrum, they could be additional isomers present in the sample.
To characterize a lower-ranked glycan structure, sets of sequential MSn spectra are acquired for unmatched fragment ions. Each successive level of MSn fragmentation spectra is then brought into SimGlycan software to compare the experimental and predicted MSn fragmentation pathway. The software generates an annotated spectrum depicting the fragmentation matches and loss of consecutive monosaccharide units. As the level of MSn increases, the fragments generated become increasingly structure specific and thus aid in characterizing specific structures, isomers, and branching patterns. Figure 3 shows the MS7 fragmentation pathway for the glycan (Gal)(Man)4(GlcNAc)4.
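Choosing which unmatched fragment ions to carry into the next MSn stage could be sketched as below. This is an illustration only: the function name `unexplained_peaks`, the 20% relative abundance threshold, and the 0.5 Da tolerance are assumptions, not settings taken from the workflow described here.

```python
def unexplained_peaks(peaks, theoretical_mz, tol_da=0.5, min_rel_abundance=0.2):
    """Return abundant peaks that no theoretical fragment explains.

    peaks is a list of (mz, intensity) tuples. A peak is reported if its
    intensity is at least min_rel_abundance of the base peak and no value in
    theoretical_mz lies within tol_da of it. Results are sorted by descending
    intensity, so the strongest candidates for further fragmentation come first.
    """
    if not peaks:
        return []
    base_intensity = max(intensity for _, intensity in peaks)
    threshold = min_rel_abundance * base_intensity
    remaining = [
        (mz, intensity)
        for mz, intensity in peaks
        if intensity >= threshold
        and not any(abs(mz - t) <= tol_da for t in theoretical_mz)
    ]
    return sorted(remaining, key=lambda peak: peak[1], reverse=True)
```

Each ion returned this way is a candidate precursor for the next fragmentation stage, whose spectrum can then be compared against the predicted fragmentation pathway of a lower-ranked structure.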
If the MS/MS database search does not resolve a glycan structure, SimGlycan software allows the researcher to draw, edit, and “theoretically” fragment glycan structures. Theoretical fragmentation of the drawn structure assists the researcher in comparing experimental and theoretical data in order to determine whether the modification brings the theoretical glycan closer to the experimental data. This assists in resolving glycan structures when data for the glycans is not yet available in the software database or for de novo glycan analysis.
Because glycans perform essential roles in biological processes and have the potential to act as markers for disease, they are of intense interest. However, the complex branching and isomeric nature of glycans make them difficult to characterize. Although mass spectrometry, and in particular MSn, has emerged as a powerful technique to aid in glycan structural elucidation, MS-based workflows generate enormous amounts of spectral data that have required tedious manual interpretation. Recent developments in bioinformatics software now enable an MS-based workflow that dramatically accelerates glycan characterization by facilitating interpretation of MSn data.