Researchers at Aalto University and the University of Luxembourg report they have developed a new machine learning model that will help identify small molecules, with applications in medicine, drug discovery, and environmental chemistry.
Their findings, “Joint structural annotation of small molecules using liquid chromatography retention order and tandem mass spectrometry data,” were published in the journal Nature Machine Intelligence.
“Structural annotation of small molecules in biological samples remains a key bottleneck in untargeted metabolomics, despite rapid progress in predictive methods and tools during the past decade,” wrote the researchers. “Liquid chromatography–tandem mass spectrometry, one of the most widely used analysis platforms, can detect thousands of molecules in a sample, the vast majority of which remain unidentified even with best-of-class methods. Here we present LC-MS2Struct, a machine learning framework for structural annotation of small-molecule data arising from liquid chromatography–tandem mass spectrometry (LC-MS2) measurements.”
“Even the best methods can’t identify more than 40% of the molecules in samples without making some additional assumptions about the candidate molecules,” explained Juho Rousu, PhD, professor of computer science at Aalto University.
By improving the identification of small molecules, the new approach may help in detecting metabolic disorders, such as diabetes, or even cancer.
“Our research shows that while absolute retention times may vary, the retention order is stable across measurements by different labs,” said Eric Bach, a machine learning & bioinformatics doctoral student at Aalto University. “This allowed us to merge all publicly available data on metabolites for the first time ever and feed it into our machine learning model.”
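To make the distinction between retention time and retention order concrete, here is a minimal Python sketch. It is an illustration only, not the researchers' LC-MS2Struct code: the molecule names, the retention times, and the helper functions `retention_order` and `pairwise_order_agreement` are all hypothetical. It shows how two runs can disagree on every absolute time while agreeing on the order in which the molecules elute.

```python
from itertools import combinations

# Hypothetical retention times in minutes for the same four molecules
# measured on two different LC setups (illustrative values, not real data).
lab_a = {"mol_1": 2.1, "mol_2": 4.8, "mol_3": 7.5, "mol_4": 11.0}
lab_b = {"mol_1": 3.4, "mol_2": 6.9, "mol_3": 9.2, "mol_4": 15.6}


def retention_order(times):
    """Return molecule names sorted by increasing retention time."""
    return sorted(times, key=times.get)


def pairwise_order_agreement(times_x, times_y):
    """Fraction of molecule pairs that elute in the same order in both runs."""
    shared = sorted(set(times_x) & set(times_y))
    pairs = list(combinations(shared, 2))
    if not pairs:
        raise ValueError("need at least two shared molecules")
    same = sum(
        (times_x[a] < times_x[b]) == (times_y[a] < times_y[b])
        for a, b in pairs
    )
    return same / len(pairs)


# Absolute times differ between the two runs...
absolute_times_match = all(lab_a[m] == lab_b[m] for m in lab_a)
# ...but the elution order, and every pairwise comparison, is identical.
same_order = retention_order(lab_a) == retention_order(lab_b)
agreement = pairwise_order_agreement(lab_a, lab_b)

print(absolute_times_match, same_order, agreement)
```

Because the comparison uses only which molecule elutes first in each pair, it does not depend on the absolute time scale of either instrument. That property is what would let measurements from different labs be pooled, as Bach describes.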
“The fact that using stereochemistry improved the identification performance is a revelation for all developers of metabolite identification methods,” said Emma Schymanski, PhD, associate professor at the Luxembourg Centre for Systems Biomedicine (LCSB) of the University of Luxembourg. “This method could also be used to help identify and trace micropollutants in the environment or characterize new metabolites in plant cells.”