Regularization choices affect the performance of neural networks that process Ambisonics signals.
The study proposes a pipeline for detecting fake environmental sounds using CLAP audio embeddings, achieving high accuracy in identifying deepfake audio.
The study develops realistic digital models of dynamic range compressors using deep learning and state-space models.
The study enhances sound event detection with distance estimation, enabling more accurate source localization.
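For context on the device being modeled, here is a minimal sketch of a conventional feedforward digital compressor: the input level is converted to dB, passed through a hard-knee static gain curve, and the resulting gain reduction is smoothed with separate attack and release time constants. This illustrates the classic DSP structure only. It is not the deep-learning or state-space model from the study, and the function names and default parameters (threshold -20 dB, ratio 4:1, 5 ms attack, 50 ms release) are hypothetical choices for illustration.

```python
import math


def gain_computer_db(x_db: float, threshold_db: float, ratio: float) -> float:
    """Hard-knee static curve: output level (dB) for an input level x_db (dB)."""
    if x_db <= threshold_db:
        return x_db  # below threshold: unity gain
    # above threshold: level increases by only 1/ratio dB per input dB
    return threshold_db + (x_db - threshold_db) / ratio


def compress(samples, sample_rate, threshold_db=-20.0, ratio=4.0,
             attack_ms=5.0, release_ms=50.0, eps=1e-10):
    """Hypothetical feedforward compressor with smoothed gain reduction in dB."""
    # one-pole smoothing coefficients derived from the time constants
    alpha_a = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    alpha_r = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    gr_smooth = 0.0  # smoothed gain reduction in dB (always <= 0)
    out = []
    for x in samples:
        x_db = 20.0 * math.log10(max(abs(x), eps))  # instantaneous level in dB
        gr = gain_computer_db(x_db, threshold_db, ratio) - x_db  # target reduction
        # use the attack constant when reduction deepens, release when it recovers
        alpha = alpha_a if gr < gr_smooth else alpha_r
        gr_smooth = alpha * gr_smooth + (1.0 - alpha) * gr
        out.append(x * 10.0 ** (gr_smooth / 20.0))  # apply gain in linear scale
    return out
```

With a sustained full-scale input, the gain settles at -20 + 20/4 = -15 dB, so the output approaches about 0.178; signals below the threshold pass through unchanged. Smoothing the gain reduction rather than the raw signal level is one common design choice; the attack/release asymmetry and the memory it introduces are part of what makes compressors hard to model from data.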
The author introduces T2AV-BENCH, a benchmark for video-aligned text-to-audio generation, and proposes T2AV, a model that integrates visually aligned text embeddings to improve audio synthesis.
The author proposes ASiT, a self-supervised learning framework that combines local and global contextual information through group masked model learning and self-distillation; it learns stronger audio representations and achieves state-of-the-art performance on several audio classification tasks.
RFWave introduces a multi-band Rectified Flow approach for high-fidelity audio waveform reconstruction, with an emphasis on efficiency.
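As background on Rectified Flow, the following is a minimal single-band sketch under stated assumptions, not RFWave's multi-band model. A network is trained to regress the constant velocity x1 - x0 at points on the straight line x_t = (1 - t) x0 + t x1 between noise x0 and data x1, and samples are generated by integrating dx/dt = v(x, t) from t = 0 to t = 1. Here a hypothetical analytic velocity (the ideal field when all data equals a single point c) stands in for a trained network, and all function names are illustrative.

```python
import random
from typing import Callable, List

Vector = List[float]


def rf_training_pair(x0: Vector, x1: Vector, t: float):
    """Point on the straight noise-to-data path and its regression target velocity."""
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    target_v = [b - a for a, b in zip(x0, x1)]  # constant along a straight path
    return xt, target_v


def euler_sample(velocity: Callable[[Vector, float], Vector],
                 x0: Vector, num_steps: int) -> Vector:
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (data) with fixed-step Euler."""
    dt = 1.0 / num_steps
    x = list(x0)
    for k in range(num_steps):
        t = k * dt  # t stays below 1, so the toy velocity is well defined
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy stand-in for a trained network: if every data sample equals c,
# the ideal rectified-flow velocity at (x, t) is (c - x) / (1 - t).
c = [0.5, -0.25]


def ideal_velocity(x: Vector, t: float) -> Vector:
    return [(ci - xi) / (1.0 - t) for ci, xi in zip(c, x)]


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in c]
sample = euler_sample(ideal_velocity, noise, num_steps=4)  # lands on c
```

Because the learned paths are (ideally) straight, even a coarse Euler integrator with a handful of steps can reach the target, which is the general reason rectified-flow samplers can be efficient; real models only approximate straight paths, so more steps may still help.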
Audio Flamingo is a novel audio language model with strong audio understanding, in-context learning, and multi-turn dialogue abilities, and it sets new state-of-the-art results on several audio tasks.
The author investigates how different regularization methods affect Deep Neural Network (DNN) training and performance in Ambisonics networks, highlighting the importance of regularization information.