How AI is revolutionizing biologics with better protein toxicity prediction

Published by: Raghvendra Mall
TII

Artificial intelligence is playing an increasingly important role across industries, and biologics and pharmaceuticals are no exception.

The UAE’s Technology Innovation Institute (TII) has recently released new research that leverages AI and machine learning to predict protein and peptide toxicity, a critical element in the development of new biologics.

TII’s Biotechnology Research Center (BRC) has created the VISH-Pred framework, an ensemble of fine-tuned protein language models that predicts protein and peptide toxicity with precision and recall significantly exceeding those of existing models.

This pioneering work is another example of the transformative innovation and deep research and development capabilities coming out of Abu Dhabi.

What is protein toxicity?

In drug development, protein toxicity refers to the harm a protein or peptide can cause to cells, tissues, or the whole organism. Identifying toxic proteins during drug development presents numerous challenges, and traditional methods are often slow, costly, and inaccurate.

We found that advanced computational techniques offer a real opportunity to streamline this process. The proposed VISH-Pred model efficiently prioritizes non-toxic protein candidates for therapeutics and filters out the most toxic candidates during drug development.

As peptide- and protein-based therapeutics become increasingly promising treatment regimens for many diseases, predicting toxicity early can save time and costs, reduce side effects, and contribute to the development of safer, more effective medications. 

The role of technology 

AI-based protein language models, which help researchers understand and predict how proteins work, were critical to the research, as were machine learning techniques such as LightGBM and XGBoost used in combination. High-performance computing resources were required to train these models on large datasets and fine-tune them to attain state-of-the-art accuracy.
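To make the idea of an ensemble concrete, here is a minimal sketch of one common way to combine several models: soft voting, where each model's predicted toxicity probability is averaged before a decision threshold is applied. This is an illustration only. The article does not say how VISH-Pred combines its models, and the function name `soft_vote`, the 0.5 threshold, and the probabilities below are all hypothetical.

```python
from statistics import mean


def soft_vote(per_model_probs, threshold=0.5):
    """Average each sequence's toxicity probability across models, then threshold.

    per_model_probs: one list per model, each holding one probability per sequence.
    Returns (averaged probabilities, 0/1 labels where 1 means "predicted toxic").
    """
    # zip(*...) regroups the scores so each tuple holds one sequence's
    # probabilities from every model.
    averaged = [mean(scores) for scores in zip(*per_model_probs)]
    labels = [int(p >= threshold) for p in averaged]
    return averaged, labels


# Hypothetical outputs from three models for the same four sequences.
per_model_probs = [
    [0.90, 0.20, 0.60, 0.10],
    [0.80, 0.30, 0.40, 0.05],
    [0.70, 0.10, 0.80, 0.15],
]

averaged, labels = soft_vote(per_model_probs)
print(labels)
```

In this toy example the third sequence is flagged as toxic even though one of the three models scored it below 0.5, which is the usual argument for ensembling: individual errors can be outvoted by the other models.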

This unique methodology can handle large amounts of imbalanced toxicity data accurately and efficiently. The algorithm also helps screen out dangerous protein candidates early in the drug development process, resulting in safer and more effective candidates for vaccines, immunotherapies and medications. 
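Precision and recall matter more than plain accuracy when toxic examples are rare. The sketch below works through the arithmetic on a made-up set of 100 proteins, 5 of them toxic. The data and both "classifiers" are hypothetical and are not drawn from the VISH-Pred study.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, treating 1 as 'toxic'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    # Guard against division by zero when nothing is predicted (or is) toxic.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, accuracy


# Hypothetical labels: 95 non-toxic proteins followed by 5 toxic ones.
y_true = [0] * 95 + [1] * 5

# A useless model that calls everything non-toxic.
always_non_toxic = [0] * 100

# A model that raises 2 false alarms and catches 4 of the 5 toxic proteins.
screened = [1] * 2 + [0] * 93 + [1] * 4 + [0] * 1

baseline = precision_recall_accuracy(y_true, always_non_toxic)
useful = precision_recall_accuracy(y_true, screened)
print(baseline)
print(useful)
```

The model that never flags anything still reaches 95% accuracy while catching no toxic proteins at all, so its recall is zero. The second model is only slightly more accurate, but its recall of 0.8 shows that it actually finds most of the toxic candidates.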

But the research was not without its challenges. One of the most difficult was the imbalance in the dataset: far more non-toxic proteins exist in nature than toxic ones. Fine-tuning the large-scale models to make them accurate and efficient was also challenging.
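One widely used way to counter this kind of imbalance is to weight the rare class more heavily during training. The sketch below computes two standard quantities from a hypothetical label list: "balanced" class weights, and the negative-to-positive ratio that gradient-boosting libraries such as XGBoost and LightGBM accept through their `scale_pos_weight` parameter. The article does not say which technique the TII team used, so treat this as an illustration of the general idea rather than a description of VISH-Pred.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive proportionally larger weights.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def negative_to_positive_ratio(labels):
    """Ratio of non-toxic (0) to toxic (1) examples."""
    counts = Counter(labels)
    return counts[0] / counts[1]


# Hypothetical dataset: 900 non-toxic and 100 toxic sequences.
labels = [0] * 900 + [1] * 100

weights = balanced_class_weights(labels)
ratio = negative_to_positive_ratio(labels)
print(weights)
print(ratio)
```

With a 9-to-1 imbalance, each toxic example ends up counting nine times as much as a non-toxic one, so the model is penalized far more for missing a toxic protein than for a false alarm.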

The work on VISH-Pred was conducted entirely in-house, leveraging the TII team's expertise in AI, protein and enzyme design, and toxicity assays. This approach allowed TII to maintain complete control over the research process and ensure the highest quality results.

What’s next?

A provisional patent has been filed for the VISH-Pred framework, but the model is already available as a user-friendly web server.

In the immediate term, the goal is to further improve VISH-Pred and enhance its usability. To that end, we plan to add more data, improve model accuracy, and provide more features on the web server.

Ultimately, we hope that VISH-Pred will become the standard instrument for predicting protein toxicity in the scientific literature and the pharmaceutical industry. We also envision VISH-Pred advancing the field of computational toxicology and making a major contribution to the creation of safer protein-based pharmaceuticals.

Check out the research paper:

https://academic.oup.com/bib/article/25/4/bbae270/7688816