Enhancing Audio Generation Diversity with Visual Information
The author aims to improve the diversity of audio generated within a given category by incorporating visual information through a clustering-based method. The approach is reported to significantly improve both the quality and the diversity of the generated audio.
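The summary does not spell out how the clustering is done, so the following is only a minimal sketch of one way the idea might look in practice. It assumes each audio clip in a category comes with a visual feature vector (for example, an image embedding), and it groups those vectors with plain k-means so that every clip receives a finer-grained cluster label. The function name `cluster_visual_features`, the choice of k-means, and the toy features are all illustrative assumptions, not the author's implementation.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(features, centroids):
    """Return, for each feature vector, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(f, centroids[j]))
        for f in features
    ]


def cluster_visual_features(features, k, n_iter=50, seed=0):
    """Plain k-means over visual feature vectors (hypothetical helper).

    features: list of equal-length lists of floats, one per audio clip.
    Returns (labels, centroids); labels[i] is the cluster of features[i].
    """
    rng = random.Random(seed)
    # Initialise centroids from k distinct samples.
    centroids = [list(f) for f in rng.sample(features, k)]
    for _ in range(n_iter):
        labels = assign(features, centroids)
        new_centroids = []
        for j in range(k):
            members = [f for f, lab in zip(features, labels) if lab == j]
            if members:
                # Move the centroid to the mean of its members.
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Recompute labels so they match the final centroids.
    return assign(features, centroids), centroids
```

Under this assumption, the resulting cluster labels could act as sub-category conditions for an audio generator, so that sampling across clusters yields more varied audio than conditioning on the category label alone. How the author actually uses the clusters is not stated in the summary.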