Analytics in Action News

Artificial Intelligence Method Builds in Error for Better Models

The approach builds uncertainty, expert knowledge, and awareness of missing data into calculations, leading to more accurate artificial intelligence models.



By Jessica Kent

An emerging method builds error and uncertainty into artificial intelligence models, ultimately leading to more efficient, precise tools in many areas of research – including healthcare.

Researchers at the University of Delaware and the University of Massachusetts Amherst discuss the new approach in a paper published in the journal Science Advances. The method aims to teach AI and machine learning models to integrate and organize information from a range of different sources, resulting in more trustworthy calculations.

Because AI doesn’t know when information is missing, or whether the data it draws on are incorrect, these tools can’t deal precisely with random events or uncertainty. The new mathematical framework addresses this by combining data, expert knowledge, multiscale models, and information theory through uncertainty quantification.

The method provides a powerful way to analyze data, study materials and complex interactions, and tweak errors virtually instead of in a lab.

“Traditionally in physical modeling, we build a model first using only our physical intuition and expert knowledge about the system,” said Joshua Lansford, a doctoral student in UD’s Department of Chemical and Biomolecular Engineering.


“Then after that, we measure uncertainty in predictions due to error in underlying variables, often relying on brute-force methods, where we sample, then run the model and see what happens.”
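The brute-force sampling Lansford describes can be sketched in a few lines: draw the uncertain input from its error distribution, run the model on each draw, and examine the spread of the outputs. The toy rate model below is a hypothetical illustration, not the paper's actual oxygen-reduction model.

```python
import numpy as np

# Hypothetical toy model: an Arrhenius-style reaction rate as a function
# of an uncertain activation energy (illustration only, not the paper's model).
def reaction_rate(activation_energy_ev, temperature_k=300.0):
    k_b = 8.617e-5  # Boltzmann constant in eV/K
    return np.exp(-activation_energy_ev / (k_b * temperature_k))

rng = np.random.default_rng(0)

# Brute-force uncertainty propagation: sample the underlying variable
# from its assumed error distribution, then run the model on each sample.
energies = rng.normal(loc=0.50, scale=0.02, size=10_000)  # eV, +/- 0.02 eV error
rates = reaction_rate(energies)

# The spread of the outputs is the uncertainty in the prediction.
print(f"predicted rate: {rates.mean():.3e} +/- {rates.std():.3e}")
```

Because the model is exponential in the uncertain variable, even a small input error produces a wide, skewed output distribution, which is exactly why bounds on prediction error matter when ranking candidate catalysts.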

The paper describes how the approach works in a chemical reaction called the oxygen reduction reaction, but the method is applicable to all kinds of modeling, the researchers noted.

“The chemistries and materials we need to make things faster or even make them possible — like fuel cells — are highly complex. We need precision. And if you want to make a more active catalyst, you need to have bounds on your prediction error. By intelligently deciding where to put your efforts, you can tighten the area to explore,” said Lansford.

“Uncertainty is accounted for in the design of our model. Now it is no longer a deterministic model. It is a probabilistic one.”

The emerging approach has already demonstrated its potential in the healthcare field. Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) recently incorporated the framework into machine learning algorithms to identify drug compounds that work against tuberculosis.


The team pointed out that while the method has previously been used by computer scientists, it could also prove useful in protein design and other fields of biology.

“This technique is part of a known subfield of machine learning, but people have not brought it to biology. This is a paradigm shift, and is absolutely how biological exploration should be done,” said Bonnie Berger, the Simons Professor of Mathematics and head of the Computation and Biology group in MIT’s CSAIL.

In the study, the machine learning models were able to analyze training data and assess how reliable the resulting predictions are. For example, if the data going into the model indicate how strongly particular molecules bind to a target protein, along with the uncertainty of those measurements, the model can use that information to make predictions for protein-target interactions it hasn’t seen before.

The model can also estimate the accuracy of its own predictions, the group said. When analyzing new data, the model’s predictions may have lower certainty for molecules that are very different from the training data. Researchers can use that information to help them decide which molecules to test experimentally.
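One common way to get this behavior, sketched below with a Gaussian process on toy data, is to let predictive uncertainty grow for inputs far from the training set, then rank only the candidates the model is confident about. The 1-D "molecular descriptor" and affinity values here are invented for illustration; the study's actual models and kinase data are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential kernel on 1-D feature vectors.
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale**2)

# Hypothetical training set: x is a 1-D molecular descriptor, y a
# measured binding affinity (toy data, not the study's measurements).
x_train = rng.uniform(-2, 2, size=20)
y_train = np.sin(x_train) + 0.1 * rng.normal(size=20)

# Candidate "molecules": some resemble the training data, some do not.
x_cand = np.linspace(-6, 6, 200)

# Standard Gaussian process posterior mean and variance.
noise = 0.1**2
K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
K_s = rbf_kernel(x_cand, x_train)
K_inv = np.linalg.inv(K)
mean = K_s @ K_inv @ y_train
var = 1.0 - np.einsum("ij,jk,ik->i", K_s, K_inv, K_s)
std = np.sqrt(np.clip(var, 0.0, None))

# Rank candidates by predicted affinity, but only trust those whose
# uncertainty is low, i.e., those similar to the training data.
confident = std < 0.5
best = x_cand[confident][np.argmax(mean[confident])]
print(f"best confident candidate: descriptor = {best:.2f}")
```

Candidates far outside the training range get a predictive standard deviation near the prior's, so they are filtered out rather than proposed on the strength of an extrapolated mean; experimentalists can instead spend their effort on the confident, high-affinity picks.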

Additionally, with this approach, algorithms require only a small amount of training data. In their study, MIT researchers trained the model with a dataset of 72 small molecules and their interactions with more than 400 proteins called protein kinases.


They were then able to analyze nearly 11,000 small molecules, many of which were very different from those in the training data. With this approach, researchers were able to identify molecules with very strong predicted binding affinities for the protein kinases they put into the model.

The team also used the same training data to train a traditional machine learning algorithm, which does not incorporate uncertainty, and then had the model analyze the same 11,000 molecule library.

“Without uncertainty, the model just gets horribly confused and it proposes very weird chemical structures as interacting with the kinases,” said MIT graduate student Brian Hie, the paper’s lead author.

Even with a small amount of data, models that incorporate uncertainty can improve, the team stated.

“You don’t really need very large data sets on each iteration,” Hie said. “You can just retrain the model with maybe ten new examples, which is something that a biologist can easily generate.”

The results of the MIT study could help drug developers create better treatments for tuberculosis.

“We’ve now provided them with some new leads beyond what has been already published,” said Bryan Bryson, an assistant professor of biological engineering at MIT and a member of the Ragon Institute of MGH, MIT, and Harvard.