Comparative Analysis of Large Language Models for Polypharmacy Side Effect Prediction - MSc Thesis Proposal by: Sadra Hakim

Tuesday, June 4, 2024 - 13:00

The School of Computer Science is pleased to present…

Comparative Analysis of Large Language Models for Polypharmacy Side Effect Prediction

MSc Thesis Proposal by: Sadra Hakim

 

Date: Tuesday, 4 June 2024

Time: 1:00 PM

Location: Odette School of Business, Room OBB03

 

Abstract:

Polypharmacy, the concurrent use of multiple drugs, is a common strategy for treating patients with complex diseases or co-existing conditions. However, combining drugs can lead to unintended interactions, increasing the risk of adverse side effects. Large language models (LLMs) have demonstrated considerable promise in fields such as biology and medicine. In this study, we propose a novel approach to predicting polypharmacy side effects that uses LLMs to encode the chemical structures of drugs. Our methodology uses the Decagon dataset, a standard resource for polypharmacy side effect prediction. Specifically, we used BERT to embed the chemical structures of drugs, represented as SMILES strings. We also used a BERT model fine-tuned on SMILES data with the Simple Contrastive Learning of Sentence Embeddings (SimCSE) approach, to improve the quality of its embeddings for chemical compounds. After experimenting with various LLMs to obtain embeddings, we investigated the impact of an autoencoder on the predictive capabilities of the LLMs. Through extensive experimentation, we compared different LLMs and demonstrated their effectiveness in accurately predicting multiple polypharmacy side effects. The results of this study highlight the potential of LLMs in tackling complex pharmaceutical challenges and offer insights into safer medication use.
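For readers unfamiliar with SimCSE, the contrastive objective mentioned in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the function names `cosine` and `simcse_loss`, the temperature value, and the toy two-dimensional vectors (standing in for BERT embeddings of SMILES strings) are all assumptions made for illustration. In unsupervised SimCSE, each input is encoded twice with different dropout masks, and the loss pulls the two views of the same input together while pushing apart views of different inputs in the batch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def simcse_loss(views_a, views_b, temperature=0.05):
    """Mean in-batch contrastive (InfoNCE) loss, as used by SimCSE.

    views_a[i] and views_b[i] are two embeddings of the same input
    (hypothetically, the same SMILES string encoded twice with different
    dropout masks). Every other row of views_b acts as a negative.
    """
    total = 0.0
    for i, anchor in enumerate(views_a):
        logits = [cosine(anchor, other) / temperature for other in views_b]
        # Log-sum-exp with the max subtracted for numerical stability.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        # Negative log-probability of picking the matching view.
        total += log_denominator - logits[i]
    return total / len(views_a)


# Toy embeddings (assumed values, not real model output).
aligned_a = [[1.0, 0.0], [0.0, 1.0]]
aligned_b = [[0.9, 0.1], [0.1, 0.9]]
# Same vectors, but paired with the wrong partner.
shuffled_b = [aligned_b[1], aligned_b[0]]

loss_aligned = simcse_loss(aligned_a, aligned_b)
loss_shuffled = simcse_loss(aligned_a, shuffled_b)
```

In this toy setup the loss is small when matching views are close and large when they are mismatched, which is the signal a fine-tuning run would minimize.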

 

Thesis Committee:

Internal Reader: Dr. Jianguo Lu

External Reader: Dr. Andrew Swan

Advisor: Dr. Alioune Ngom

