AI Protein Design Software Trained to Generate Medicines, Vaccines

Artificial intelligence software developed to design proteins may also be used to create medicines, vaccines, and cancer treatments.

By Shania Kennedy

Researchers at Harvard and the University of Washington School of Medicine (UW Medicine) have developed artificial intelligence (AI) software that uses deep learning to design proteins with various functions, some of which could be used in the creation of medicines, vaccines, and medical treatments.

Protein design, also known as protein engineering, is a process by which researchers create proteins with enhanced or novel functional properties. These engineered proteins have various uses, but many are used in medical research to design protein-based vaccines, such as some COVID-19 vaccines, or medical treatments for conditions like cancer.

In the past, researchers have used computers to try to design proteins for research, but this is a difficult process, according to the press release. Because a single protein molecule can contain thousands of bonded atoms, proteins created in a lab are difficult to engineer and study.

These challenges spurred the research team to look for alternative solutions. Inspired by machine learning and its ability to generate images from prompts, the researchers set out to build a similar algorithm for protein design.

“The idea is the same: neural networks can be trained to see patterns in data. Once trained, you can give it a prompt and see if it can generate an elegant solution. Often the results are compelling — or even beautiful,” said lead author Joseph Watson, a postdoctoral scholar at UW Medicine.

The researchers trained their neural networks using data pulled from the Protein Data Bank, a public protein structure repository. From these networks, the researchers developed two approaches for designing proteins with new functions.

The first, labeled “hallucination,” is similar to other AI tools that generate new outputs based on simple inputs. The second, called “inpainting,” is like the autocomplete feature found in the search bar of online search engines. In the hallucination approach, the networks generate a new protein in much the same way an author might write a book, the press release states.

“You start with a random assortment of words — total gibberish,” explained lead author Jue Wang, a postdoctoral scholar at UW Medicine. “Then you impose a requirement such as that in the opening paragraph, it needs to be a dark and stormy night. Then the computer will change the words one at a time and ask itself ‘Does this make my story make more sense?’ If it does, it keeps the changes until a complete story is written.”

Extending this analogy, the press release continues, proteins can be thought of as long sequences of letters, with each “letter” corresponding to an amino acid. To engineer a protein, the software takes a random amino acid chain and mutates the sequence repeatedly until it arrives at a sequence that encodes the desired function(s). The resulting proteins can then be manufactured and studied in the lab.
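The mutate-and-check loop described above can be sketched in a few lines of Python. This is a toy illustration only, not the researchers' software: the function names, the target sequence, and especially `toy_score` are hypothetical stand-ins. In the real system, a trained neural network judges whether a sequence is getting closer to the desired function; here, the score simply counts how many letters match a made-up target.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes for the 20 standard amino acids


def toy_score(sequence, target):
    # Hypothetical stand-in for the neural network's assessment:
    # the number of positions that already match a made-up target.
    return sum(a == b for a, b in zip(sequence, target))


def hallucinate(target, max_steps=20000, seed=0):
    rng = random.Random(seed)
    # Start from "total gibberish": a random amino acid chain.
    sequence = [rng.choice(AMINO_ACIDS) for _ in target]
    best = toy_score(sequence, target)
    for _ in range(max_steps):
        if best == len(target):
            break  # every position satisfies the requirement
        # Change one letter at a time.
        position = rng.randrange(len(sequence))
        previous = sequence[position]
        sequence[position] = rng.choice(AMINO_ACIDS)
        score = toy_score(sequence, target)
        if score > best:
            best = score  # the change helped, so keep it
        else:
            sequence[position] = previous  # otherwise undo the change
    return "".join(sequence), best


# "MKTAYIAKQR" is an arbitrary example string, not a real designed protein.
designed, score = hallucinate("MKTAYIAKQR")
print(designed, score)
```

The sketch keeps a change only when the score improves, mirroring the "does this make my story make more sense?" check in the quote above. A real design run would replace `toy_score` with a model-based measure, since the desired sequence is not known in advance.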

The “inpainting” approach, by contrast, fills in missing pieces of an existing protein structure. Researchers start with the key features they want in the final protein and let the software fill in the rest.
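The autocomplete comparison can be made concrete with a small Python sketch. It is an assumption-laden toy, not the method from the study: the example sequences are invented, the function names are hypothetical, and a simple count of which letter tends to follow which stands in for the trained neural network. The fixed letters in the template play the role of the key features a researcher wants to keep, and each `_` marks a gap for the software to fill.

```python
from collections import Counter, defaultdict


def train_bigrams(examples):
    # Count which residue tends to follow each residue in the examples.
    follows = defaultdict(Counter)
    for sequence in examples:
        for left, right in zip(sequence, sequence[1:]):
            follows[left][right] += 1
    return follows


def inpaint(template, follows, gap="_", fallback="A"):
    # Keep the fixed letters; fill each gap with the residue most often
    # seen after its left neighbor in the training examples.
    filled = []
    for letter in template:
        if letter != gap:
            filled.append(letter)
        elif filled and follows[filled[-1]]:
            filled.append(follows[filled[-1]].most_common(1)[0][0])
        else:
            filled.append(fallback)  # no usable context, so use a default
    return "".join(filled)


# Invented example sequences, used only to give the toy model some patterns.
model = train_bigrams(["GSHMK", "GSHMR", "ASHMK"])
print(inpaint("GS__K", model))
```

Here the template `GS__K` keeps its first two letters and its last letter, and the two gaps are filled from patterns in the examples. The actual software works with three-dimensional protein structures as well as sequences, which this sketch does not attempt to model.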

Lab testing of both the “hallucination” and “inpainting” methods showed that the generated proteins functioned as intended. Some of the novel proteins were found to bind to PD-1, a receptor targeted in cancer immunotherapy, while others had potential to become a vaccine for respiratory syncytial virus (RSV), a common virus that can cause bronchiolitis or pneumonia in vulnerable populations.

Despite these successes, the researchers noted in the press release that additional testing, including in animals, is needed for any protein-based treatment, vaccine, or medicine generated using this technology.