Researchers in the Graduate School of Advanced Science and Engineering at Waseda University, Japan, say they have introduced RaptGen, a variational autoencoder (VAE) for aptamer generation. VAEs, a class of generative machine learning models, have been reported to be useful in the discovery of small molecules.
The scientists published their paper, “Generative aptamer discovery using RaptGen,” in Nature Computational Science. In it, they explain how RaptGen uses a VAE with a profile hidden Markov model decoder to create latent spaces in which sequences can form clusters.
By using this latent representation, RaptGen was able to generate aptamers that were not present in the original high-throughput SELEX (HT-SELEX) sequencing data.
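To give a feel for what a profile hidden Markov model does as a sequence generator, here is a minimal Python sketch. It is not RaptGen's decoder: the function name `sample_profile_hmm`, the toy motif, and the single shared insert and delete probabilities are all assumptions made for illustration. In a VAE, probabilities like these would be produced from a point in the latent space rather than written by hand.

```python
import random

ALPHABET = "ACGT"

def sample_profile_hmm(match_probs, p_insert=0.1, p_delete=0.1, rng=None):
    """Sample one sequence from a simplified profile HMM.

    match_probs: one dict per motif position, mapping base -> probability.
    p_insert / p_delete: transition probabilities shared by every position
    (an assumption; a full profile HMM has separate values per state).
    """
    rng = rng or random.Random()
    seq = []
    for probs in match_probs:
        # Insert state: emit uniformly random bases before this position.
        while rng.random() < p_insert:
            seq.append(rng.choice(ALPHABET))
        # Delete state: skip this motif position without emitting a base.
        if rng.random() < p_delete:
            continue
        # Match state: emit a base from the position-specific distribution.
        bases = list(probs)
        weights = [probs[b] for b in bases]
        seq.append(rng.choices(bases, weights=weights, k=1)[0])
    return "".join(seq)

# Hypothetical four-position motif that strongly prefers "GGTA".
toy_motif = [
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
]

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        print(sample_profile_hmm(toy_motif, rng=rng))
```

Because insertions and deletions are part of the model, the sampled sequences vary in length while still carrying the motif, which is one reason a profile HMM suits sequence families of differing lengths.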
Aptamers are a type of oligonucleotide that can selectively bind to specific targets such as proteins, peptides, carbohydrates, viruses, toxins, metal ions, and live cells. As they are similar to antibodies, they have a variety of uses in the fields of biosensors, therapeutics, and diagnostics. However, unlike antibodies, aptamers do not induce an immune reaction in the body, and are easy to synthesize and modify. Moreover, an aptamer’s three-dimensional folding structure allows it to bind to a wider range of targets.
“Nucleic acid aptamers are generated by an in vitro molecular evolution method known as systematic evolution of ligands by exponential enrichment (SELEX). Various candidates are limited by actual sequencing data from an experiment. Here we developed RaptGen, which is a variational autoencoder for in silico aptamer generation,” write the investigators.
“RaptGen exploits a profile hidden Markov model decoder to represent motif sequences effectively. We showed that RaptGen embedded simulation sequence data into low-dimensional latent space on the basis of motif information. We also performed sequence embedding using two independent SELEX datasets. RaptGen successfully generated aptamers from the latent space even though they were not included in high-throughput sequencing. RaptGen could also generate a truncated aptamer with a short learning model.
“We demonstrated that RaptGen could be applied to activity-guided aptamer generation according to Bayesian optimization. We concluded that a generative method by RaptGen and latent representation are useful for aptamer discovery.”
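The activity-guided search mentioned in the quote can be illustrated with a generic Bayesian optimization step over a two-dimensional latent space. The sketch below is an assumption-laden toy, not the authors' implementation: the function names, the RBF kernel, the upper-confidence-bound rule, and the example activities are all hypothetical, and this article does not say which surrogate model or acquisition function the paper uses.

```python
import math

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two latent points."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length_scale ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_posterior(X, y, x_new, noise=1e-3):
    """Posterior mean and variance of a zero-mean Gaussian process at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    k = [rbf(a, x_new) for a in X]
    alpha = solve(K, y)
    v = solve(K, k)
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    var = max(rbf(x_new, x_new) - sum(ki * vi for ki, vi in zip(k, v)), 0.0)
    return mean, var

def propose_next(X, y, candidates, kappa=2.0):
    """Return the candidate with the highest upper confidence bound."""
    def ucb(c):
        mean, var = gp_posterior(X, y, c)
        return mean + kappa * math.sqrt(var)
    return max(candidates, key=ucb)

if __name__ == "__main__":
    # Hypothetical latent coordinates of tested sequences and their activities.
    measured_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    activities = [0.2, 0.9, 0.1]
    grid = [(i / 2, j / 2) for i in range(-2, 5) for j in range(-2, 5)]
    print(propose_next(measured_points, activities, grid))
```

In a full workflow, the proposed latent point would be decoded into a candidate sequence, its activity measured, and the result added to the measured set before the next round.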
“RaptGen first visualizes a latent space with a sequence motif, then generates multiple new aptamer sequences via this latent space,” says Michiaki Hamada, PhD, professor, in describing how RaptGen can boost aptamer discovery.
“For example, it searches for optimized aptamer sequences in the latent space by considering additional information after analyzing the activity of a subset of sequences. Additionally, RaptGen enables the design of shortened (or truncated) aptamer sequences.”
The team also evaluated RaptGen’s performance on real-world data from two independent HT-SELEX datasets. RaptGen could generate aptamer derivatives in an activity-guided manner and provide opportunities to optimize their activities.
“This is important as it means that RaptGen can generate sequences having desired properties, such as the inhibition of certain enzymes or protein-protein interactions,” adds Hamada. “The application of these molecules could open many doors in the future.”
The scientists plan to conduct extensive studies evaluating whether alternative models can improve the performance of RaptGen, and whether RaptGen could advance RNA aptamer generation by using RNA sequences. The only drawbacks in using RaptGen are the high computational cost and increased training time, both of which could be reduced in further studies, according to Hamada.
“To the best of our knowledge, RaptGen is the only data-driven method that can design and optimize truncated aptamers directly from HT-SELEX data,” notes Hamada. “We believe that in due time, RaptGen will be recognized as a key tool for efficient aptamer discovery.”