PhD Dissertation Defense by: Ala Alam Falaki

Thursday, May 23, 2024 - 10:00

The School of Computer Science is pleased to present…

Efficient and Enhanced Text Summarization by Compressing and Data Augmentation for Transformers-Based Models

Date: Thursday, 23 May 2024

Time: 10:00 am

Location: Essex Hall, Room 122

Abstract:
Automatic text summarization is becoming increasingly important due to the explosion of information on the internet, which calls for methods that quickly distill essential information. Its potential to save time and improve information retention is particularly valuable in academic and professional settings, where staying current with new research and publications is crucial. However, the field faces several challenges, such as the subjectivity of summary evaluation and the need for domain-specific datasets that improve summary quality and help summaries faithfully reflect the original content.
Current research in text summarization includes experimenting with neural network architectures to make models more efficient and accessible. Our efforts to reduce model size without sacrificing performance have led to innovations such as employing autoencoders within the encoder-decoder framework, resulting in smaller, faster models that use fewer computational resources. These advancements improve the summarization process and extend to other tasks, such as translation and classification, demonstrating the versatility of these optimized models.
Furthermore, the scarcity of high-quality datasets, especially in specialized domains, remains a significant hurdle that limits the effectiveness of these models. Therefore, in addition to architectural improvements, we explored data augmentation techniques to enhance the training of summarization models. We conducted over 60 experiments to evaluate new and existing augmentation methods, analyzing their impact on the quality and novelty of the generated summaries. These studies use advanced models such as GPT-4 to ensure that summaries maintain relevance, consistency, fluency, and coherence. The research has yielded positive results, including guidelines for choosing among augmentation strategies according to specific goals, such as increasing novelty or improving sentence quality. This ongoing research has also produced open-source tools that help researchers visualize model attention mechanisms and evaluate summary quality, further contributing to the field's development and accessibility.
 
Keywords:
Data Augmentation; Compression; Automatic Text Summarization; Natural Language Processing.
 
Doctoral Committee:
Internal Reader: Dr. Luis Rueda
Internal Reader: Dr. Dan Wu
External Reader: Dr. Jonathan Wu
External Examiner: Dr. Enrique Alegre-Gutiérrez
Advisor(s): Dr. Robin Gras
Chair: TBD