Core Concepts
The FAME Challenge 2024 examines how the language being spoken affects the task of associating faces with voices, a question that matters because much real-world communication is multilingual.
Abstract
The FAME Challenge 2024 investigates the effect of language on face-voice association. The question matters because half of the world's population is bilingual and people often communicate in multilingual settings.
The challenge utilizes the Multilingual Audio-Visual (MAV-Celeb) dataset, which contains video and audio recordings of 154 celebrities speaking in three languages: English, Hindi, and Urdu. The dataset covers a wide range of unconstrained, challenging multi-speaker environments, including political debates, press conferences, outdoor interviews, and more.
The challenge is set up as a cross-modal verification task: given a face and a voice, the network decides whether they belong to the same person. It is evaluated both on a heard language (one seen during training) and on a completely unheard language. The baseline method uses a two-stream pipeline to obtain face and voice embeddings, followed by a fusion and orthogonal projection (FOP) mechanism that learns discriminative joint face-voice embeddings.
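The verification step can be illustrated with a minimal sketch. It is not the FOP baseline: the random linear projections below merely stand in for a learned joint embedding, and the dimensions, weight matrices, and function names are all hypothetical. It only shows how a face embedding and a voice embedding might be mapped to a shared space and scored with cosine similarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embedding sizes; the source does not specify them.
FACE_DIM, VOICE_DIM, JOINT_DIM = 512, 256, 128

# Random projections standing in for a trained joint-embedding model.
W_face = rng.standard_normal((FACE_DIM, JOINT_DIM)) / np.sqrt(FACE_DIM)
W_voice = rng.standard_normal((VOICE_DIM, JOINT_DIM)) / np.sqrt(VOICE_DIM)


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length, guarding against division by zero."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def verification_scores(face_emb, voice_emb):
    """Cosine similarity between paired face and voice embeddings.

    A higher score means the pair is more likely the same identity.
    """
    f = l2_normalize(face_emb @ W_face)
    v = l2_normalize(voice_emb @ W_voice)
    return np.sum(f * v, axis=-1)


# Four illustrative face-voice pairs (random data, not MAV-Celeb).
faces = rng.standard_normal((4, FACE_DIM))
voices = rng.standard_normal((4, VOICE_DIM))
scores = verification_scores(faces, voices)
```

Each score would then be compared against a threshold to accept or reject the pair. With a trained model, the projections would be learned so that matching pairs score higher than non-matching ones.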
The challenge encourages participants to explore novel ideas that improve performance on both heard and unheard languages. Performance is measured by the equal error rate (EER). The challenge runs in two phases, a progress phase and an evaluation phase, each with its own submission guidelines and important dates.
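EER is the operating point at which the false acceptance rate equals the false rejection rate, so a lower value is better. The sketch below shows one simple way to estimate it from a list of trial scores. It assumes higher scores indicate a match (a distance-based system would need its scores negated first), and the function name and threshold sweep are illustrative rather than the challenge's official scoring script.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """Estimate the EER and its threshold from verification trials.

    scores: similarity per trial (assumed: higher means more likely a match).
    labels: 1 for a same-identity pair, 0 for a different-identity pair.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = labels == 1
    neg = ~pos

    # Sweep every observed score as a candidate threshold (accept if >= t).
    thresholds = np.unique(scores)
    far = np.array([(scores[neg] >= t).mean() for t in thresholds])
    frr = np.array([(scores[pos] < t).mean() for t in thresholds])

    # Pick the threshold where the two error rates are closest.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2, thresholds[idx]


# Toy trials: one matching pair scores below one non-matching pair.
eer, threshold = equal_error_rate([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
```

On a finite trial list the two error rates rarely coincide exactly, so this sketch averages them at the closest threshold. Evaluation toolkits often interpolate along the ROC curve instead.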
The FAME Challenge 2024 aims to provide a common platform for academic and industrial researchers to develop and explore the impact of languages in face-voice association, which can be useful for various downstream tasks.
Stats
The MAV-Celeb dataset contains two splits, English-Urdu (V1-EU) and English-Hindi (V2-EH), covering 154 celebrities in total. V1-EU includes 957 videos and V2-EH includes 1,130 videos; together they span over 84 hours of audio-visual data.
Quotes
"As half of the population of world is bilingual and we are more often communicating in multilingual scenarios [11], therefore, it is essential to investigate the effect of language for associating faces with the voices."
"The FAME Challenge 2024 is planned with the primary objective to provide a common platform to academic and industrial researchers to develop and explore the impact of languages in face-voice association, which can be useful for various downstream tasks."