The paper introduces Mimosa, a human-AI collaborative tool that helps amateur video creators generate and manipulate spatial audio effects for videos with conventional monaural or stereo audio.
Mimosa employs a multi-step audiovisual pipeline that produces useful intermediate results: separated soundtracks for the different sounding objects, the type of each object, and each object's estimated 3D position over time. These results are presented through an interactive direct-manipulation interface, where users can validate them, fix errors, and further customize the spatial audio effects.
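To make the intermediate results described above concrete, the following is a minimal sketch of how a per-object soundtrack with a time-varying 3D position could be represented and mixed down. Everything in it is an assumption made for illustration: the names `SoundingObject`, `pan_gains`, and `render_stereo` are hypothetical, the coordinate convention (listener at the origin, x to the right, z forward) is chosen here, and constant-power stereo panning is a deliberately simple stand-in, not the rendering method used by Mimosa.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SoundingObject:
    """Hypothetical container for one separated sound source."""
    label: str                                   # object type, e.g. "guitar"
    samples: List[float]                         # separated mono soundtrack
    positions: List[Tuple[float, float, float]]  # one (x, y, z) per sample


def pan_gains(x: float, z: float) -> Tuple[float, float]:
    """Return (left, right) constant-power gains for a horizontal position.

    Assumes the listener sits at the origin facing +z. Sources behind the
    listener are folded to the front, since plain stereo cannot express
    front/back differences.
    """
    azimuth = math.atan2(x, abs(z))        # -pi/2 (left) .. +pi/2 (right)
    theta = (azimuth + math.pi / 2) / 2    # 0 (left) .. pi/2 (right)
    return math.cos(theta), math.sin(theta)


def render_stereo(objects: List[SoundingObject]) -> Tuple[List[float], List[float]]:
    """Mix all objects into left/right channels, panning sample by sample."""
    n = max((len(o.samples) for o in objects), default=0)
    left = [0.0] * n
    right = [0.0] * n
    for obj in objects:
        for i, (sample, (x, _y, z)) in enumerate(zip(obj.samples, obj.positions)):
            gain_left, gain_right = pan_gains(x, z)
            left[i] += gain_left * sample
            right[i] += gain_right * sample
    return left, right
```

In a representation like this, a user correction in the interface (for example, dragging an object to a new location) would amount to editing the `positions` list before re-rendering. A real spatial audio renderer would also account for elevation, distance, and head-related filtering, which this sketch ignores.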
The key features of Mimosa include:
A subjective evaluation with 8 external evaluators shows that the spatial audio effects generated by Mimosa were more immersive than the original video sound while maintaining a high degree of realism. A user study with 15 participants further demonstrates Mimosa's usability, its usefulness, and its ability to help users create customized spatial audio effects.
Key insights distilled from: Zheng Ning, Z... on arxiv.org, 04-24-2024
https://arxiv.org/pdf/2404.15107.pdf