MSc Thesis Defense "No Query Left Behind: Query Refinement via Backtranslation" By: Delaram Rajaei

Wednesday, August 7, 2024 - 13:00

The School of Computer Science is pleased to present…

No Query Left Behind: Query Refinement via Backtranslation

MSc Thesis Defense by: Delaram Rajaei

Date: Wednesday, August 7th, 2024

Time: 1:00 PM - 2:30 PM

Location: Essex Hall Room 122

 

Abstract:

Query refinement aims to enhance the relevance of search results by modifying users' original queries into refined versions. State-of-the-art query refinement models have been trained on web query logs, which are prone to topic drift. To fill this gap, little work has been done to generate benchmark datasets of (query → refined query) pairs through an overwhelming application of unsupervised or supervised modifications to the original query while controlling topic drift. In this work, we propose leveraging natural language backtranslation, a round-trip translation of a query from a source language via target languages and back, as a simple yet effective unsupervised approach to scale up the generation of gold-standard benchmark datasets. Backtranslation can (1) uncover terms that are omitted from a query because they are commonly understood in the source language but may not be known in a target language (e.g., figs → (Tamil) அத்திமரங்கள் → the fig trees), (2) augment a query with context-aware synonyms in a target language (e.g., Italian Nobel prize winners → (Farsi) برنده های ایتالیایی جایزه نوبل → italian Nobel laureates), and (3) help with the semantic disambiguation of polysemous terms and collocations (e.g., custer's last stand → (Malay) pertahan terakhir Custer → custer's last defence). Our experiments across 5 query sets with different query lengths and topics, and 10 languages from 7 language families, using 2 neural machine translators validated the effectiveness of query backtranslation in generating a more extensive gold-standard dataset for query refinement. We have open-sourced our research at https://github.com/fani-lab/RePair/tree/nqlb.
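The round-trip idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the function name `backtranslate`, the `Translator` signature, and the lookup-table `toy_translate` are all hypothetical stand-ins, and the table entries simply replay two of the examples quoted in the abstract in place of a real neural machine translator.

```python
from typing import Callable, Dict, Iterable

# Hypothetical translator signature: (text, source_lang, target_lang) -> text.
Translator = Callable[[str, str, str], str]


def backtranslate(query: str, translate: Translator,
                  source: str = "en",
                  pivots: Iterable[str] = ("ta",)) -> Dict[str, str]:
    """Round-trip a query through each pivot language.

    Returns {pivot_lang: backtranslated_query}, keeping only the results
    that differ from the original query (case-insensitively).
    """
    refined = {}
    for pivot in pivots:
        forward = translate(query, source, pivot)   # source -> pivot
        back = translate(forward, pivot, source)    # pivot -> source
        if back.strip().lower() != query.strip().lower():
            refined[pivot] = back
    return refined


# Toy stand-in for a neural machine translator (assumption for illustration):
# it only knows the two examples from the abstract.
_TOY_TABLE = {
    ("figs", "en", "ta"): "அத்திமரங்கள்",
    ("அத்திமரங்கள்", "ta", "en"): "the fig trees",
    ("custer's last stand", "en", "ms"): "pertahan terakhir Custer",
    ("pertahan terakhir Custer", "ms", "en"): "custer's last defence",
}


def toy_translate(text: str, src: str, tgt: str) -> str:
    # Unknown inputs pass through unchanged.
    return _TOY_TABLE.get((text, src, tgt), text)


if __name__ == "__main__":
    print(backtranslate("figs", toy_translate, pivots=("ta", "ms")))
    print(backtranslate("custer's last stand", toy_translate, pivots=("ms",)))
```

In practice, `toy_translate` would be replaced by a call to an actual translation model, and a candidate would typically be checked for topic drift before being accepted as a refined query; that filtering step is not shown here.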

 

Keywords: Query Reformulation, Backtranslation, Gold-Standard Generation

 

Thesis Committee:

Internal Reader: Dr. Jianguo Lu  

External Reader: Dr. Tanja Collet-Najem

Advisor: Dr. Hossein Fani

Chair: Dr. Dan Wu
