The proposed framework leverages bi-modal semantic similarity between audio and language modalities to generate weak supervision signals for single-source audio extraction, without requiring access to single-source audio samples during training.
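The core idea of deriving weak supervision from cross-modal similarity can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: it assumes pre-computed audio and text embeddings (e.g., from a pretrained audio-language model) and a hypothetical similarity threshold; the function name and shapes are illustrative.

```python
import numpy as np

def weak_labels(audio_emb, text_emb, threshold=0.5):
    """Assign weak presence labels to audio clips by cosine similarity
    between clip embeddings and text embeddings of candidate sound classes.

    audio_emb: (n_clips, d) array of audio embeddings
    text_emb:  (n_classes, d) array of class-description embeddings
    Returns (labels, sim): binary weak labels and the raw similarity matrix.
    """
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sim = a @ t.T                      # (n_clips, n_classes) cosine similarities
    return (sim > threshold).astype(int), sim
```

A clip whose embedding aligns with the text embedding of a class description receives that class as a weak label, so no isolated single-source recordings are needed to supervise the extractor.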
ACES introduces a novel metric for evaluating automated audio captioning systems based on the semantics of sounds.
A convolutional recurrent neural network with an attention module is proposed for continuous distance estimation from audio signals.
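In such architectures, the attention module typically scores each time frame of the recurrent features and pools them into a single vector before regressing the distance. The sketch below shows only this attention-pooling step, with hypothetical shapes and weights; it is not the paper's model.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(frames, w_att):
    """Temporal attention pooling over frame-level features.

    frames: (T, D) sequence of per-frame feature vectors (e.g., RNN outputs)
    w_att:  (D,) learned attention projection (assumed given here)
    Returns the (D,) attention-weighted mean feature for distance regression.
    """
    scores = frames @ w_att            # (T,) per-frame attention logits
    weights = softmax(scores)          # (T,) normalized frame weights
    return weights @ frames            # (D,) pooled feature vector
```

A final linear layer on the pooled vector would then produce the continuous distance estimate.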
Context-aware models improve real-time target sound extraction performance.
uaMix-MAE combines instance discrimination with masked autoencoders, improving downstream task performance when labeled data is limited.
The authors propose a new framework for first-shot unsupervised anomalous sound detection that uses metadata-assisted audio generation to estimate unknown anomalies, achieving competitive performance in DCASE 2023 Challenge Task 2.
The authors evaluate autoregressive audio inpainting methods, highlighting the importance of the AR model estimator and model order in achieving high-quality results.
CrossNet introduces a novel DNN architecture for speaker separation, leveraging global and local information to enhance performance in noisy-reverberant environments.