Speaker: Pedro Morgado, PhD, UW-Madison Dept of Electrical and Computer Engineering
Date: Wednesday, July 12, 2023
Time: 10:00 AM Central Time
Location: WIMR 2409 & https://uwmadison.webex.com/uwmadison/j.php?MTID=mdd2aa93545d6a8ca6714046094cf0947
Title: “Unifying Audio-Visual Machine Perception – Tasks & Architectures”
Abstract: Accurately recognizing, localizing, and separating sound sources is essential for effective audio-visual perception. Traditionally, these tasks have been approached independently, with separate methods developed for each. However, the interdependencies between source localization, separation, and recognition suggest that independent models may yield suboptimal performance. To address this, our research focuses on unifying audio-visual learning tasks and architectures to integrate audio and visual cues for joint localization, separation, and recognition. In this talk, I will present our recent progress in this field. I will introduce a unified pretraining framework that enables simultaneous learning of audio-visual recognition, localization, and separation. Additionally, I will showcase a novel early fusion architecture that incorporates local audio-visual interactions and can be efficiently pretrained using an audio-visual masked autoencoding framework. Unified pretraining of early fusion models aims to replicate human-like multimodal perception, promising a deeper and more sophisticated understanding of audio-visual interactions, which is crucial for truly multimodal applications. Throughout the talk, I will share a sequence of findings that demonstrate strong positive transfer between these tasks. Furthermore, I will highlight the substantial benefits that early audio-visual fusion can provide in enhancing model expressivity and, consequently, performance on challenging audio-visual applications.
Bio: Pedro is an Assistant Professor at the University of Wisconsin-Madison in the Department of Electrical and Computer Engineering and is affiliated with the Department of Computer Sciences. His research interests are at the intersection of computer vision and machine learning, focusing on developing systems that continuously learn to perceive the world through multiple sensory modalities without direct human supervision. Prior to joining UW-Madison, he was a postdoctoral fellow at Carnegie Mellon University, working with Abhinav Gupta. He earned his Ph.D. from the University of California San Diego, advised by Prof. Nuno Vasconcelos, and his B.Sc. and M.Sc. degrees from Universidade de Lisboa, Portugal.