Ego-Exo4D is a dataset that captures skilled human activities from both first-person (egocentric) and third-person (exocentric) perspectives. It was collected by a consortium of 15 research institutions and features 740 participants performing 43 activities across 8 domains (e.g., cooking, sports, music, healthcare) in 123 unique real-world scenes across 13 cities worldwide.
The dataset offers 1,286 hours of synchronized ego and exo video, along with rich multimodal data including audio, eye gaze, 3D point clouds, camera poses, and IMU. It also provides three types of language annotations: expert commentary that critiques the performance, first-person narrations by the participants, and third-person action descriptions.
The key insights and highlights of the dataset are:
Multimodal and multiview capture: Ego-Exo4D provides time-synchronized first-person and third-person video, enabling research on relating and translating between these complementary viewpoints.
Diverse skilled activities: The dataset covers a wide range of physical and procedural skilled activities performed by real-world experts, from sports and dance to cooking and bike repair.
Extensive annotations: In addition to the video and multimodal data, Ego-Exo4D offers rich annotations, including fine-grained activity keysteps, procedural dependencies, proficiency ratings, and 3D body/hand pose.
Benchmark tasks: The dataset introduces four families of benchmark tasks to push the frontier of first-person video understanding of skilled human activity: ego-exo relation, ego(-exo) recognition, ego(-exo) proficiency estimation, and ego pose estimation.
Ego-Exo4D aims to fuel new research in areas such as egocentric perception, cross-view learning, multimodal activity understanding, and skill assessment. The dataset and all resources have been open-sourced to support the broader research community.
Key insights distilled from
by Kristen Grau... at arxiv.org, 04-30-2024
https://arxiv.org/pdf/2311.18259.pdf