The School of Computer Science is pleased to present…
Date: Tuesday, April 29, 2025
Time: 12:00 pm
Location: Dillon Hall, Room 255
Given a machine learning model and a data record, a Membership Inference Attack (MIA) determines whether the record was part of the model's training dataset. MIAs pose a significant threat to the privacy of machine learning models, particularly when the training dataset contains sensitive or confidential information. They typically exploit a model's tendency to overfit its training data, which yields lower loss values on training records than on unseen ones. Recently, an MIA against language models was proposed whose decision rule compares, against a threshold, the difference between the loss of the target sample under the target model and the average loss of its neighbouring samples. These neighbours are generated through simple word replacements, produced by Masked Language Models (MLMs), that preserve the semantics and fit the context of the original word. In this thesis, we propose Back Translation and Dynamic Thresholding (BTDT), a novel MIA. BTDT generates more realistic and diverse neighbour samples using back translation and introduces a dynamic thresholding mechanism, resulting in more adaptive and accurate membership inference. Our results indicate that dynamic thresholding effectively manages the attack's false positive and false negative rates, thereby enhancing the attack's robustness and efficiency.
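As a rough illustration of the neighbourhood-comparison decision rule described above, the following Python sketch flags a sample as a training member when its loss falls sufficiently below the average loss of its neighbours. All names, values, and the sign convention of the threshold are illustrative assumptions, not details taken from the thesis or the original attack.

```python
def neighbourhood_attack(target_loss: float,
                         neighbour_losses: list[float],
                         threshold: float) -> bool:
    """Hypothetical membership test: a training member's loss tends to
    sit below the losses of its semantically similar neighbours."""
    avg_neighbour_loss = sum(neighbour_losses) / len(neighbour_losses)
    # Predict "member" when the loss gap is below the threshold.
    return (target_loss - avg_neighbour_loss) < threshold

# Example with made-up losses: the target's loss (0.8) is well below
# the neighbour average (2.1), so the sample is flagged as a member.
print(neighbourhood_attack(0.8, [2.1, 1.9, 2.3], threshold=-0.5))  # True
```

BTDT, as described in the abstract, would replace the fixed `threshold` with a dynamically chosen one and generate `neighbour_losses` from back-translated rather than MLM-substituted neighbours.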
Keywords: Membership Inference Attacks (MIAs), Large Language Models (LLMs), Back Translation
Internal Reader: Dr. Dan Wu
External Reader: Dr. Mohammad Hassanzadeh
Advisor: Dr. Dima Alhadidi
Chair: Dr. Alioune Ngom