The School of Computer Science is pleased to present…

Optimizing LLMs: Harnessing Core Sub-Models for Efficient Training on New Tasks

MSc Thesis Defense by: Prithvi Rao Muthineni


Date: 07 June 2024

Time: 12:00 pm

Location: Essex Hall, Room 122


Abstract: Advances in natural language processing (NLP) have been pivotal in extending AI's capacity to understand human language. Large Language Models (LLMs) such as BERT, GPT, and RoBERTa exemplify this progress, setting new benchmarks across a spectrum of NLP tasks, including sentiment analysis. However, deploying such models in practice poses significant computational obstacles, as their large architectures demand substantial processing power and memory. Moreover, growing concern over the environmental impact of training large-scale NLP models underscores the need for sustainable AI development that mitigates the carbon emissions associated with training.

This thesis addresses the computational and practical impediments of deploying large-scale NLP models, focusing on the optimization of RoBERTa and ELECTRA. The research optimizes these models through novel model-pruning techniques and strategic fine-tuning, guided by the notion of a core sub-model embedded within trained LLMs. This core sub-model encapsulates generic language properties shared across NLP tasks and serves as a foundation for fine-tuning on new tasks. Because fine-tuning then updates only this subset of parameters, the process becomes substantially more efficient while preserving or improving task performance.
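The abstract does not specify the pruning criterion or the fine-tuning procedure; the following is a minimal sketch of the core-sub-model idea, assuming magnitude-based pruning and PyTorch with Hugging Face Transformers. The roberta-base checkpoint, the SPARSITY level, and the mask_grads helper are illustrative assumptions, not the thesis's actual method.

import torch
from transformers import RobertaForSequenceClassification

model = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

# 1. Identify a "core sub-model": within each encoder weight matrix, keep the
#    largest-magnitude weights and zero out the rest (illustrative 50% sparsity).
SPARSITY = 0.5  # fraction of encoder weights pruned away (assumption)
masks = {}
for name, param in model.roberta.encoder.named_parameters():
    if param.dim() < 2:          # skip biases and LayerNorm vectors
        continue
    k = int(param.numel() * SPARSITY)
    threshold = param.abs().flatten().kthvalue(k).values
    mask = (param.abs() > threshold).float()
    masks[name] = mask
    with torch.no_grad():
        param.mul_(mask)         # weights outside the core are set to zero

# 2. Fine-tune on a new task while updating only the surviving subset:
#    re-applying the masks to the gradients keeps pruned weights frozen at zero.
def mask_grads():
    for name, param in model.roberta.encoder.named_parameters():
        if name in masks and param.grad is not None:
            param.grad.mul_(masks[name])

In a training loop, mask_grads() would be called after loss.backward() and before optimizer.step(), so that only the core sub-model's weights, plus the task-specific classification head, are updated during fine-tuning on the new task.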


Thesis Committee:

Internal Reader: Dr. Olena Syrotkina

External Reader: Dr. Mohammad Hassanzadeh

Advisor: Dr. Robin Gras

Chair: Dr. Dan Wu

[Vector Institute logo]

MAC STUDENTS ONLY - Register here