Key Concepts
SALMON is a new evaluation suite that tests whether speech language models (SLMs) capture acoustic aspects of speech beyond the spoken content, including background noise, emotion, speaker identity, and room acoustics.
Summary
The authors introduce SALMON, a comprehensive evaluation suite for assessing the acoustic awareness of speech language models (SLMs). SALMON consists of two main tasks:
Acoustic Consistency: Evaluating whether SLMs can detect unnatural acoustic changes within a recording, such as changes in speaker, sentiment, background noise, or room acoustics.
Acoustic-Semantic Alignment: Assessing whether SLMs can align the acoustic properties of a recording (e.g., background noise, sentiment) with the semantic content of the spoken text.
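To make the Acoustic Consistency task concrete, the sketch below builds one hypothetical pair of samples in Python: a recording with background noise throughout, and the same recording where the noise only appears halfway through. This is an illustration under stated assumptions, not SALMON's actual generation pipeline. The function name `make_noise_consistency_pair`, the use of Gaussian noise as "background noise", and the midpoint switch are all assumptions, and waveforms are plain lists of floats to keep it dependency-free.

```python
import random
from typing import List, Sequence, Tuple


def make_noise_consistency_pair(
    speech: Sequence[float], noise_level: float, seed: int = 0
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: return (consistent, inconsistent) waveforms.

    consistent:   Gaussian noise added over the whole recording.
    inconsistent: first half left clean, second half identical to the
                  consistent version, so the only difference is an abrupt
                  change in background noise at the midpoint.
    """
    rng = random.Random(seed)  # fixed seed so the pair is reproducible
    noise = [rng.gauss(0.0, noise_level) for _ in speech]
    consistent = [s + n for s, n in zip(speech, noise)]

    midpoint = len(speech) // 2
    # Splice: clean head + noisy tail creates an unnatural acoustic change.
    inconsistent = list(speech[:midpoint]) + consistent[midpoint:]
    return consistent, inconsistent
```

Sharing one noise draw between both outputs keeps the second halves identical, so a model that prefers the consistent sample can only be reacting to the mid-recording change.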
SALMON covers a wide range of acoustic elements, including speaker identity, sentiment, background noise, and room impulse response (RIR). The benchmark is modeling-based: each test compares a "real" sample against a modified, inconsistent version, and the SLM is scored on whether it assigns the higher likelihood to the real one.
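The likelihood comparison described above can be sketched in Python as a pairwise accuracy. This is a minimal sketch, assuming the SLM exposes a log-probability for each discrete speech token of a recording; the helper names `sequence_log_likelihood` and `pairwise_accuracy` are hypothetical, and summing log-probabilities (rather than length-normalizing them) and counting ties as misses are assumptions, not details confirmed by the source. A model that guesses at random lands near 50%.

```python
from typing import Sequence


def sequence_log_likelihood(token_log_probs: Sequence[float]) -> float:
    """Hypothetical helper: sum per-token log-probabilities into one score."""
    return float(sum(token_log_probs))


def pairwise_accuracy(
    real_scores: Sequence[float], modified_scores: Sequence[float]
) -> float:
    """Fraction of pairs where the real sample outscores its modified version.

    Ties count as misses (an assumption of this sketch), so a model that
    gives every input the same score gets 0.0 rather than chance level.
    """
    if len(real_scores) != len(modified_scores):
        raise ValueError("each real sample needs exactly one modified counterpart")
    if not real_scores:
        raise ValueError("no pairs to evaluate")
    wins = sum(r > m for r, m in zip(real_scores, modified_scores))
    return wins / len(real_scores)


if __name__ == "__main__":
    # Toy per-token log-probabilities for two (real, modified) pairs.
    real = [sequence_log_likelihood(lp) for lp in ([-1.2, -0.8, -1.0], [-0.5, -0.7])]
    fake = [sequence_log_likelihood(lp) for lp in ([-1.5, -1.1, -0.9], [-0.4, -0.6])]
    print(pairwise_accuracy(real, fake))  # 0.5: first pair correct, second wrong
```

Because only the relative order within each pair matters, this metric does not require the model's likelihoods to be calibrated across different recordings.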
The authors evaluate several popular SLMs, including TWIST, LAST, and pGSLM, on the SALMON benchmark. The results show that while humans easily achieve over 90% accuracy on most tasks, current SLMs struggle to model and identify basic acoustic inconsistencies, highlighting the need for further research in developing acoustically aware speech language models.
The authors release the SALMON evaluation suite and its sample-generation pipeline, aiming to guide future SLM development towards jointly modeling the semantic and acoustic aspects of speech.
Statistics
The following sentences contain key metrics or figures the authors use to support their main arguments:
"We evaluated several SLMs using SALMON and discuss the insights in Sec. V. We show that while humans easily achieve over 90% on most tasks, SLMs struggle in modelling and identifying basic acoustic inconsistencies."
"We evaluated the performance of popular SLMs on the different parts of SALMON. Through this we evaluate the impact of different model aspects, such as number of parameters and expressive modelling approaches."
Quotes
"SALMON, a Suite for Acoustic Language Model Evaluation"
"We show that while humans easily achieve over 90% on most tasks, SLMs struggle in modelling and identifying basic acoustic inconsistencies."
"We hope that publishing this benchmark and sample generation pipeline will progress the development of acoustic aware SLMs."