Proteins spontaneously fold into intricate three-dimensional shapes that are key to nearly every biological process. But the complexity of protein shapes makes them difficult to study. Recently, deep neural networks have driven progress in protein structure prediction. Now, a team of researchers has investigated whether the information captured by such networks can be used to generate new folded proteins with novel sequences, unrelated to those of the naturally occurring proteins used in training the models. The work describes the development of a neural network that “hallucinates” proteins with new, stable structures.

This research is published in Nature, in the paper “De novo protein design by deep network hallucination.”

“For this project, we made up completely random protein sequences and introduced mutations into them until our neural network predicted that they would fold into stable structures,” said Ivan Anishchenko, PhD, an instructor of biochemistry in the lab of David Baker, PhD, professor of biochemistry at the University of Washington School of Medicine Institute for Protein Design.

“At no point did we guide the software toward a particular outcome,” Anishchenko said. “These new proteins are just what a computer dreams up.”
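The procedure Anishchenko describes, starting from a random sequence and mutating it until a network predicts a stable fold, can be illustrated with a short sketch. This is a toy illustration under stated assumptions, not the team's code: `toy_score` is a hypothetical stand-in for the structure-prediction network's confidence, and the Metropolis-style acceptance rule and the `temperature` parameter are assumptions that the article does not specify.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def toy_score(seq):
    """Hypothetical stand-in for a network's folding confidence.

    Assumption: a real pipeline would call a trained structure-prediction
    network here. This placeholder just returns the fraction of residues
    drawn from an arbitrary set, so the loop has something to optimize.
    """
    favored = set("AELKM")
    return sum(1 for aa in seq if aa in favored) / len(seq)


def hallucinate(length=100, steps=2000, temperature=0.01,
                score_fn=toy_score, seed=0):
    """Mutate a random sequence, keeping changes that raise the score."""
    rng = random.Random(seed)
    # Start from a completely random sequence.
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    score = score_fn(seq)
    for _ in range(steps):
        # Propose a single-residue mutation at a random position.
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)
        new_score = score_fn(seq)
        # Assumed acceptance rule: always keep improvements, and
        # occasionally keep a worse sequence (Metropolis criterion).
        accept = new_score >= score or (
            rng.random() < math.exp((new_score - score) / temperature)
        )
        if accept:
            score = new_score
        else:
            seq[pos] = old  # revert the rejected mutation
    return "".join(seq), score
```

Calling `hallucinate(length=50, steps=500)` returns a sequence and its final score. Nothing in the loop names a target fold; the only signal is the score, which is the sense in which the software is not guided toward a particular outcome.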

The research team generated two thousand new protein sequences that were predicted to fold. They then obtained synthetic genes encoding 129 of the network-“hallucinated” sequences, and expressed and purified the proteins in Escherichia coli; 27 of the proteins yielded species whose properties, such as circular dichroism spectra, were consistent with the hallucinated structures. The experimentally determined three-dimensional structures of three of the hallucinated proteins, two solved by X-ray crystallography and one by NMR, closely matched the hallucinated models, the authors noted.

In the future, the team believes it should be possible to steer artificial intelligence so that it generates new proteins with useful features. “We’d like to use deep learning to design proteins with function, including protein-based drugs, enzymes, you name it,” said Sam Pellock, PhD, a postdoctoral scholar in the Baker lab.

“Our NMR studies, along with X-ray crystal structures determined by the University of Washington team, demonstrate the remarkable accuracy of protein designs created by the hallucination approach,” said Theresa Ramelot, PhD, a senior research scientist at the Rensselaer Polytechnic Institute (RPI) Structural Bioinformatics Lab in Troy, NY.

“The hallucination approach builds on observations we made together with the Baker lab revealing that protein structure prediction with deep learning can be quite accurate even for a single protein sequence with no natural relatives,” noted Gaetano Montelione, PhD, professor of chemistry and chemical biology at RPI. “The potential to hallucinate brand new proteins that bind particular biomolecules or form desired enzymatic active sites is very exciting.”

“This approach greatly simplifies protein design,” said Baker. “Before, to create a new protein with a particular shape, people first carefully studied related structures in nature to come up with a set of rules that were then applied in the design process. New sets of rules were needed for each new type of fold. Here, by using a deep-learning network that already captures general principles of protein structure, we eliminate the need for fold-specific rules and open up the possibility of focusing on just the functional parts of a protein directly.”

“Exploring how to best use this strategy for specific applications is now an active area of research, and this is where I expect the next breakthroughs,” said Baker.