William LeFew, PhD

Last month, Metabolon announced the closing of $72 million in combined debt and equity financing. The round included Perceptive Advisors as a new participant, alongside EW Healthcare Partners and other existing investors.

“The incremental funding will help accelerate our growth and expand our client base, in addition to helping further our research and development programs in machine learning to enable novel biomarker discovery and expand our precision medicine platform,” notes Rohan (Ro) Hastie, PhD, president and CEO, Metabolon.

According to William LeFew, PhD, Metabolon’s director of data science, the company uses machine learning to automate routine tasks and teachable processes so its scientists can focus their efforts on the challenges that most require their expertise.

“As part of our data science initiatives, we leverage machine learning to extract insight and recognize data patterns computationally,” he says. “Machine learning improves our data curation throughput, identifies biochemical signatures, and detects anomalies to accurately and rapidly ensure quality control. These capabilities improve coverage, quality, turnaround time, and, ultimately, our clients’ study success.”

LeFew provided three examples of machine learning at work at Metabolon.

Streamlined quality control

In the laboratory, time is always of the essence and attention to detail is paramount. Late-stage analysis revealing flaws in original data can invalidate days or weeks of work, derailing timelines and production deliverables.

“Machine learning,” LeFew explains, “enables us to provide clients with a compressed quality control cycle to detect failure modes much earlier than is possible with a purely manual process.”

For example, several product lines have stringent requirements that are tested via statistical analysis once curation is complete; a fully manual process would require several days of work. According to LeFew, the autocuration utility, combined with automated statistical analyses targeted at product requirements, can detect failure modes immediately after initial raw data production. This prompts prompt human review to classify samples that would likely fail quality control checks days later, reducing the time and labor that would otherwise be spent on samples doomed to fail at a downstream quality control step.

“By detecting failure right after initial raw data review, we can run backup samples immediately,” he adds. “This saves a significant amount of time in our process, ultimately improving our total turnaround time for client projects while freeing up employee time to focus on shippable results rather than reruns.”
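As a minimal illustration of the compressed quality control cycle LeFew describes, a batch of samples can be screened for likely failures with a robust outlier check on a per-sample signal metric. The metric, threshold, and values below are assumptions for illustration, not Metabolon's actual pipeline:

```python
# Illustrative sketch: flag samples whose total signal deviates strongly
# from the batch median, as an early proxy for downstream QC failure.
# The metric and threshold are assumptions, not Metabolon's real checks.
from statistics import median

def flag_failures(signals, threshold=3.5):
    """Return indices of samples whose robust z-score exceeds threshold."""
    med = median(signals)
    abs_dev = [abs(x - med) for x in signals]
    mad = median(abs_dev) or 1e-9  # guard against zero spread
    flagged = []
    for i, x in enumerate(signals):
        z = 0.6745 * (x - med) / mad  # robust z-score (Iglewicz-Hoaglin)
        if abs(z) > threshold:
            flagged.append(i)
    return flagged

batch = [1.00, 1.02, 0.98, 1.01, 0.15, 0.99]  # one sample clearly degraded
print(flag_failures(batch))  # → [4]
```

Flagging the degraded sample at raw data review, rather than after curation, is what lets a backup sample be queued immediately.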

Faster data curation

In textbook machine learning, labeling problems are solved by learning a classification model from unbiased ground-truth data. In real applications, however, the matter may be significantly complicated by the practices and protocols used to produce the training data.

As an example, Metabolon generates liquid chromatography/mass spectrometry (LC/MS) data from which metabolites’ presence is inferred. Historically, expert curators examined these data with software assistance to confirm or deny the compounds’ presence. Each sample processed on Metabolon’s platforms, LeFew continues, is examined for the presence or absence of every one of the tier-one identified compounds in Metabolon’s metabolite knowledgebase. All samples processed on Metabolon’s Precision Metabolomics™ platform are curated against the company’s proprietary library of more than 5,200 unique metabolites.

“Machine learning allows us to achieve this same high-quality data, but much faster,” he asserts. “We can bring a data set directly to quality control through machine learning, saving time by automatically performing initial curation. Machine learning also allows us to quickly determine with certainty which compounds are present and which were never present, significantly reducing or even eliminating the need for human experts to make these trivial decisions.”

Built on historical curations, machine learning feeds an autocuration utility that can curate many routine compounds. Consider cholesterol, says LeFew, which is readily found in human plasma and, therefore, is not an efficient use of staff expertise.

“With the support of machine learning tools, we can focus our human expert curators’ skills on detecting the presence of ethylparaben sulfate, which often presents with interfering ions, or on differentiating between compounds like isoleucylglycine and alanylvaline,” he details. “These compounds are not chromatographically separated but have distinct MS/MS fragmentation, usually containing a tremendous variety of information.”
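The autocuration idea, learning presence or absence calls from historical expert curations, could be sketched as a simple classifier over peak features. The features (retention-time error, log peak intensity), labels, and nearest-centroid rule here are illustrative assumptions, not Metabolon's actual method:

```python
# Illustrative sketch of autocuration as binary classification: assign
# "present" or "absent" to a candidate peak based on historical curations.
# Feature names and values are assumptions, not Metabolon's real schema.
from math import dist

# Historical curations: (retention-time error, log peak intensity) -> label
history = [
    ((0.02, 6.1), "present"), ((0.03, 5.8), "present"),
    ((0.01, 6.4), "present"), ((0.40, 2.1), "absent"),
    ((0.35, 1.8), "absent"),  ((0.50, 2.5), "absent"),
]

def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(2))

centroids = {
    label: centroid([f for f, l in history if l == label])
    for label in ("present", "absent")
}

def autocurate(features):
    """Assign the label of the nearest class centroid."""
    return min(centroids, key=lambda label: dist(features, centroids[label]))

print(autocurate((0.02, 6.0)))  # routine, cholesterol-like peak → present
print(autocurate((0.45, 2.0)))  # clearly absent → absent
```

Routine calls like these are handled automatically; ambiguous peaks, such as those with interfering ions, would fall near the decision boundary and be routed to an expert curator.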

Continued knowledge expansion

LeFew notes that there is still much to be discovered in terms of new metabolites and their impact on life sciences research and drug development.

“The body of literature, reports, and insights produced by Metabolon’s internal experts has served as the basis for a collaboration with data science to develop a shared vocabulary, a knowledgebase, and software to support the recording of continued knowledge expansion,” he maintains. “Future data science collaborations with these experts will automatically surface relevant historical knowledge to expert staff on every experiment run with Metabolon.”


William LeFew, PhD, is Metabolon’s director of data science. He leads a team that leverages machine learning to improve data curation throughput, identify biochemical signatures, and detect anomalies to accurately and rapidly ensure quality control.
