The School of Computer Science at the University of Windsor is pleased to present…
Advancing Multimodal Vision-Language Learning

Date: Friday, March 21, 2025
Time: 11:00 am - 12:00 pm
Location: Erie Hall, Room 3123
Over the last decade, multimodal vision-language (VL) research has seen impressive progress. We can now automatically caption images in natural language, answer natural language questions about images, retrieve images using complex natural language queries, and even generate images from natural language descriptions. Despite this progress, current VL research faces several challenges that limit the applicability of state-of-the-art VL systems. Even large VL systems based on multimodal large language models (MLLMs), such as GPT-4V and Gemini, struggle with counting objects in images and understanding spatial relationships between objects, and they lack sufficient geo-diverse cultural understanding. In this talk, I will present our work on studying and tackling the following challenges in VL research: visio-linguistic compositional and fine-grained reasoning, geo-diverse cultural understanding, and building reliable and safe multimodal models.
Aishwarya Agrawal is an Assistant Professor in the Department of Computer Science and Operations Research at the University of Montreal. She is a Canada CIFAR AI Chair and a core academic member of Mila -- Quebec AI Institute. She also spends one day a week at Google DeepMind as a Research Scientist. From Aug 2019 to Dec 2020, Aishwarya was a full-time Research Scientist at DeepMind. Aishwarya completed her PhD at Georgia Tech in Aug 2019, working with Dhruv Batra and Devi Parikh.
Aishwarya's research focuses on multimodal AI, specifically vision-language learning, spanning themes such as image-to-text and text-to-image generative models, visio-linguistic representation learning, compositional and fine-grained reasoning, parameter- and data-efficient learning, robust automatic evaluation, geo-diverse cultural understanding, and reliable, explainable and safe multimodal AI.
Aishwarya is a recipient of a Canada CIFAR AI Chair Award, a Young Alumni Excellence Award from IIT Gandhinagar (her alma mater), a Georgia Tech Sigma Xi Best Ph.D. Thesis Award, a Georgia Tech College of Computing Dissertation Award, a Google Fellowship (declined), a Facebook Fellowship (declined) and an NVIDIA Graduate Fellowship. Aishwarya was one of the two runners-up for the 2019 AAAI / ACM SIGAI Dissertation Award. Aishwarya was also selected for Rising Stars in EECS 2018.