Data Augmentation for BERT Context-Based Question Answering: Improving No-Answer Detection through Context Replacement - MSc Thesis Defense by: Muhammad Minhaj

Thursday, March 20, 2025 - 15:30

The School of Computer Science is pleased to present…

Data Augmentation for BERT Context-Based Question Answering: Improving No-Answer Detection through Context Replacement

 

MSc Thesis Defense by:
Muhammad Minhaj

 

Date: Thursday, March 20, 2025

Time: 3:30 pm

Location: Lambton Tower, Room 3105

 

Abstract: Question answering (QA) is a natural language processing (NLP) task that powers chatbots, virtual assistants, and search engines. In context-based QA, the model receives a query together with a passage of text and extracts the answer from that passage. A key challenge is recognizing when the provided context does not contain an answer at all. To address this, we fine-tune a model specifically for no-answer detection. Because suitable datasets are scarce, we use data augmentation to generate new samples from existing real datasets. Existing augmentation methods include adding or deleting words and replacing terms with synonyms. We instead propose replacing the context of one sample with a similar context taken from another sample. The replacement is selected using a combination of BM25 and cosine similarity scores, and we verify that the original answer does not appear in the new context. In our experiments, fine-tuning with this augmentation increased BERT's precision in identifying missing-answer questions from 2.70% to 96.18%.
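The context-replacement idea in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation. It assumes that the original context serves as the BM25 query, that cosine similarity is computed over simple word-count vectors, that the two scores are mixed with a hypothetical weight `alpha`, and that "absence of the original answer" is checked by case-insensitive substring matching. All function names, the dictionary fields, and the BM25 parameters `k1` and `b` are illustrative choices.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; punctuation is dropped.
    return re.findall(r"\w+", text.lower())


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    # Okapi BM25 score of every tokenized document against the query.
    n = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n
    df = Counter(t for d in docs_tokens for t in set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        score = 0.0
        for t in set(query_tokens):
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a_tokens, b_tokens):
    # Cosine similarity between two bag-of-words count vectors.
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def make_no_answer_sample(sample, pool, alpha=0.5):
    # Keep only contexts that differ from the original and lack the answer.
    answer = sample["answer"].lower()
    candidates = [
        c for c in pool if c != sample["context"] and answer not in c.lower()
    ]
    if not candidates:
        return None
    src = tokenize(sample["context"])
    cand_tokens = [tokenize(c) for c in candidates]
    bm25 = bm25_scores(src, cand_tokens)
    top = max(bm25) or 1.0  # avoid dividing by zero when nothing matches
    combined = [
        alpha * (s / top) + (1 - alpha) * cosine(src, t)
        for s, t in zip(bm25, cand_tokens)
    ]
    best = max(range(len(candidates)), key=combined.__getitem__)
    # The new sample keeps the question but has no answer in its context.
    return {
        "question": sample["question"],
        "context": candidates[best],
        "answer": "",
        "is_impossible": True,
    }


sample = {
    "question": "Where is the Eiffel Tower?",
    "context": "The Eiffel Tower is a landmark in Paris.",
    "answer": "Paris",
}
pool = [
    "The Eiffel Tower is a landmark in Paris.",
    "Paris is home to the Eiffel Tower landmark.",
    "The Eiffel Tower is a wrought-iron landmark built in 1889.",
    "Bananas are rich in potassium.",
]
augmented = make_no_answer_sample(sample, pool)
print(augmented["context"])
```

In this toy example the first two pool entries are filtered out (one is the original context, the other still contains the answer), and the most similar remaining context is paired with the original question as a no-answer sample.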

 

Keywords: QA, BERT, BM25

 

Thesis Committee:

Internal Reader: Dr. Muhammad Asaduzzaman

External Reader: Dr. Dan Xiao

Advisor: Dr. Jessica Chen

Chair: Dr. Xiaobu Yuan

Registration Link (only MAC students need to pre-register)