The paper introduces Mimosa, a human-AI collaborative tool that helps amateur video creators generate and manipulate spatial audio effects for videos with conventional monaural or stereo audio.
Mimosa employs a multi-step audiovisual pipeline to produce useful intermediate results, such as independent soundtracks for the different sounding objects, the type and position of each sounding object, and their estimated 3D positions over time. These results are presented through an interactive direct manipulation interface, allowing users to easily validate the results, fix errors, and further customize the spatial audio effects.
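The summary does not say how Mimosa renders spatial audio from these intermediate results. The sketch below is only a minimal illustration of the general idea, under stated assumptions. Each sounding object holds a type label, a mono soundtrack, and keyframed 3D positions over time. A user correction overwrites a keyframe. The mix is rendered with simple constant-power stereo panning plus inverse-distance attenuation. The names `SoundingObject`, `position_at`, `set_position`, `pan_gains`, and `render_stereo` are hypothetical. The coordinate convention (listener at the origin, +x right, +y up, +z forward) is also assumed. Panning is a stand-in for whatever binaural or multichannel renderer the tool actually uses.

```python
import math
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class SoundingObject:
    """Hypothetical record for one separated source: type label,
    mono soundtrack, and keyframes (t, x, y, z) in seconds and metres."""
    label: str
    samples: list[float]
    keyframes: list[tuple[float, float, float, float]] = field(default_factory=list)

    def position_at(self, t: float) -> tuple[float, float, float]:
        # Linearly interpolate between the keyframes surrounding time t;
        # outside the keyframed range, hold the first/last position.
        times = [k[0] for k in self.keyframes]
        i = bisect_right(times, t)
        if i == 0:
            return self.keyframes[0][1:]
        if i == len(self.keyframes):
            return self.keyframes[-1][1:]
        t0, *p0 = self.keyframes[i - 1]
        t1, *p1 = self.keyframes[i]
        a = (t - t0) / (t1 - t0)  # t1 > t0 is guaranteed by bisect_right
        return tuple(u + a * (v - u) for u, v in zip(p0, p1))

    def set_position(self, t: float, x: float, y: float, z: float) -> None:
        # Model a user fix in the direct manipulation interface:
        # replace any keyframe at time t, then keep keyframes sorted by time.
        self.keyframes = [k for k in self.keyframes if k[0] != t]
        self.keyframes.append((t, x, y, z))
        self.keyframes.sort()


def pan_gains(x: float, y: float, z: float) -> tuple[float, float]:
    """Constant-power stereo gains from the source's horizontal azimuth."""
    azimuth = math.atan2(x, z)  # 0 = straight ahead, positive = right
    azimuth = max(-math.pi / 2, min(math.pi / 2, azimuth))  # clamp to the frontal half-plane
    theta = (azimuth / (math.pi / 2) + 1) * math.pi / 4  # 0 = hard left, pi/2 = hard right
    distance = math.sqrt(x * x + y * y + z * z)
    attenuation = 1.0 / max(distance, 1.0)  # simple inverse-distance law
    return attenuation * math.cos(theta), attenuation * math.sin(theta)


def render_stereo(objects: list[SoundingObject], sample_rate: int) -> tuple[list[float], list[float]]:
    """Mix all objects into left/right channels, re-panning every sample."""
    n_samples = max(len(o.samples) for o in objects)
    left = [0.0] * n_samples
    right = [0.0] * n_samples
    for obj in objects:
        for n, s in enumerate(obj.samples):
            gl, gr = pan_gains(*obj.position_at(n / sample_rate))
            left[n] += gl * s
            right[n] += gr * s
    return left, right


# Usage (hypothetical data): a "car" moving from left to right, 2 m in front.
car = SoundingObject("car", [1.0] * 8, [(0.0, -2.0, 0.0, 2.0), (1.0, 2.0, 0.0, 2.0)])
left, right = render_stereo([car], sample_rate=8)
print(left[0] > right[0], right[-1] > left[-1])  # → True True
```

Keeping positions as sparse keyframes, not per-sample values, mirrors the kind of editable trajectory a direct manipulation interface would expose. A single corrected keyframe changes the interpolated path without re-running any upstream estimation step.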
The key features of Mimosa include:
A subjective evaluation with 8 external evaluators showed that the spatial audio effects generated by Mimosa were more immersive than the original video sound while maintaining a high degree of realism. A user study with 15 participants further demonstrated Mimosa's usability, its usefulness, and its ability to support users in creating customized spatial audio effects.
Key insights distilled from:
by Zheng Ning, Z... on arxiv.org, 04-24-2024
https://arxiv.org/pdf/2404.15107.pdf