Thursday, April 24, 2025 - 14:00
The School of Computer Science is pleased to present…
Cross-Domain Recommendations Via Aspect Sentiment Feature Extraction using Large Language Models (CRAS-LLM)
MSc Thesis Defense by: Muhammad Zohaib Zeeshan
Date: Thursday, April 24, 2025
Time: 2:00 pm
Location: Essex Hall, Room 122
Abstract:
Cross-domain recommendation systems (CDRS) are designed to suggest products from one domain to another by analyzing data (e.g., user reviews) across multiple domains, such as movies and music. CDRS address data sparsity (i.e., insufficient data) and cold-start (i.e., no data) problems by transferring information from a source domain (e.g., movies) to a target domain (e.g., music) to enhance performance. Traditional CDRS employ various sentiment analysis techniques on user reviews to extract and map user preferences across domains. CRAS2024 works only with the positive sentences in a review, discarding the negative sentences and thereby losing information. It then uses the Biterm Topic Model (BTM), a probabilistic topic modeling technique that extracts topics from short texts by analyzing how words co-occur, to extract aspects from the remaining positive sentences. It then uses GloVe (Global Vectors for Word Representation), a statistical word-embedding method that lacks contextual understanding and therefore cannot differentiate a word like “bank” across different contexts. RC-DFM2019 combines user reviews and item content (i.e., descriptions) with rating data using Stacked Denoising Autoencoders (SDAEs) to learn user preferences and item features. However, it struggles with sparse data in domains where a user has little activity, and it suffers cold-start issues when there are not enough shared users between domains. GA2021 (Graphical and Attention Framework) is a graph-based approach that uses Node2Vec for graph embeddings and an element-wise attention mechanism to combine features from different domains; however, it also struggles with sparse data. This thesis proposes Cross-Domain Recommendations via Aspect-Sentiment Feature Extraction using Large Language Models (CRAS-LLM), which enhances CRAS2024 by considering user reviews irrespective of their polarity.
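The "biterm" idea behind BTM can be illustrated with a toy sketch: rather than modeling word counts per document, BTM models topics over corpus-wide counts of unordered word pairs that co-occur in the same short text. The example reviews below are hypothetical, for illustration only.

```python
from itertools import combinations
from collections import Counter

def extract_biterms(docs):
    """Count unordered word pairs (biterms) that co-occur within each
    short text. BTM infers topics from these corpus-wide co-occurrence
    counts, which suits short, sparse texts such as review sentences."""
    counts = Counter()
    for doc in docs:
        words = sorted(set(doc.lower().split()))
        for a, b in combinations(words, 2):
            counts[(a, b)] += 1
    return counts

# Toy review sentences (illustrative, not from the actual dataset)
reviews = ["great acting great plot", "great plot weak ending"]
biterms = extract_biterms(reviews)
# ("great", "plot") co-occurs in both reviews, so its count is 2
```

The co-occurrence counts feed the probabilistic topic inference step, which is omitted here.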
Unlike CRAS2024, aspects are extracted from user reviews using DeBERTa (Decoding-enhanced BERT with disentangled attention), retaining both positive and negative aspects based on their context. Then, SimCSE-BERT (Simple Contrastive Learning of Sentence Embeddings with BERT) generates contextual embeddings of the resulting aspects and is optimized with a contrastive loss, pulling embeddings of similar sentiments closer together and pushing those of dissimilar sentiments farther apart. Cross-domain mapping is then performed on these embeddings using CycleGAN (Cycle Generative Adversarial Network), which maps them into target-domain embeddings. Lastly, similarity scores are calculated between embeddings generated from product reviews of the target domain and the mapped aspect embeddings from the source domain. Our experimental results across the Movie, Music, and Book domains, evaluated in four cross-domain settings (Movie→Music, Music→Movie, Book→Movie, and Music→Book), demonstrate that CRAS-LLM consistently outperforms leading methods, including CRAS2024, PTUPCDR2022, and CATN2020, as well as other state-of-the-art approaches. CRAS-LLM reduces MSE by approximately 12.5%, while its precision improves by 15%, recall increases by about 16%, and the F1-score improves by nearly 11% relative to the baseline systems.
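The final scoring step can be sketched as ranking target-domain items by cosine similarity against a mapped source-domain aspect embedding. The 3-dimensional vectors and item names below are toy values for illustration; in the actual system the embeddings come from SimCSE-BERT and the CycleGAN mapping.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_targets(mapped_source, target_embeddings):
    """Rank target-domain items by similarity to a source-domain aspect
    embedding that has been mapped into the target space."""
    scored = [(item, cosine(mapped_source, emb))
              for item, emb in target_embeddings.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy mapped aspect embedding and two candidate target-domain items
mapped = [0.9, 0.1, 0.2]
targets = {"album_a": [0.8, 0.2, 0.1], "album_b": [0.1, 0.9, 0.3]}
ranking = rank_targets(mapped, targets)
# album_a ranks first: its vector points in nearly the same direction
```

Items at the top of the ranking become the cross-domain recommendations.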
Keywords: Aspect Opinion Mining, Sentiment Analysis, Cross-Domain Recommendations, Large Language Models, Data Sparsity, Cold Start
Internal Reader: Dr. Peter Tsin
External Reader: Dr. Christian Trudeau
Advisor: Dr. Christie Ezeife
Chair: Dr. Curtis Bright