Thursday, March 20, 2025 - 15:30
The School of Computer Science is pleased to present…
Data Augmentation for BERT Context-Based Question Answering: Improving No-Answer Detection through Context Replacement
MSc Thesis Defense by:
Muhammad Minhaj
Date: Thursday, March 20, 2025
Time: 3:30 pm
Location: Lambton Tower, Room 3105
Abstract: Question Answering (QA) in NLP powers chatbots, virtual assistants, and search engines by returning relevant answers to user queries. In context-based QA, the model is given a passage of text together with the query and extracts the answer from that passage. A key challenge is recognizing when the provided context does not contain an answer at all. To address this, we fine-tune a model specifically for no-answer detection. Because suitable datasets are scarce, we use data augmentation to generate new samples from real datasets. Existing augmentation methods add or delete words, replace terms with synonyms, and so on. We instead propose replacing the context of one sample with a similar context from another sample, selected using a combination of BM25 and cosine similarity scores, while ensuring that the original answer does not appear in the new context. In our experiments, fine-tuning with this augmentation increased BERT's precision in identifying missing-answer questions from 2.70% to 96.18%.
Keywords: QA, BERT, BM25
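The context-replacement idea in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation. The abstract does not say how the two scores are combined, so the equal weighting (`alpha`), the max-normalization of BM25, the bag-of-words cosine similarity, the substring check for the answer, and the field names (`question`, `context`, `answer`, `is_impossible`) are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; a real pipeline would likely use a proper tokenizer.
    return re.findall(r"\w+", text.lower())


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    # Okapi BM25 score of each tokenized document against the query tokens.
    n = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n
    df = Counter(t for d in docs_tokens for t in set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        score = 0.0
        for t in set(query_tokens):
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / (tf[t] + length_norm)
        scores.append(score)
    return scores


def cosine(a_tokens, b_tokens):
    # Cosine similarity of raw term-frequency vectors (assumed representation).
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def pick_replacement(sample, pool, alpha=0.5):
    # Most similar context from the pool that does not contain the answer.
    # alpha is a hypothetical weight between normalized BM25 and cosine.
    answer = sample["answer"].lower()
    candidates = [
        c for c in pool if c != sample["context"] and answer not in c.lower()
    ]
    if not candidates:
        return None
    src = tokenize(sample["context"])
    cand_tokens = [tokenize(c) for c in candidates]
    bm25 = bm25_scores(src, cand_tokens)
    top = max(bm25) or 1.0  # avoid dividing by zero when nothing overlaps
    combined = [
        alpha * (s / top) + (1 - alpha) * cosine(src, t)
        for s, t in zip(bm25, cand_tokens)
    ]
    best = max(range(len(candidates)), key=combined.__getitem__)
    return candidates[best]


def augment(sample, pool):
    # Build a no-answer sample: same question, replaced context.
    new_context = pick_replacement(sample, pool)
    if new_context is None:
        return None
    return {
        "question": sample["question"],
        "context": new_context,
        "answer": "",
        "is_impossible": True,
    }
```

Given a sample and a pool of contexts drawn from other samples, `augment(sample, pool)` returns a new unanswerable sample, or `None` when every candidate context still contains the answer string.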
Thesis Committee:
Internal Reader: Dr. Muhammad Asaduzzaman
External Reader: Dr. Dan Xiao
Advisor: Dr. Jessica Chen
Chair: Dr. Xiaobu Yuan