Huang, W.-C., Cooper, E., & Toda, T. (2024). MOS-Bench: Benchmarking Generalization Abilities of Subjective Speech Quality Assessment Models. arXiv preprint arXiv:2411.03715.
This paper addresses the challenge of limited generalization ability in deep neural network (DNN)-based subjective speech quality assessment (SSQA) models. The authors aim to provide a standardized, large-scale benchmark for evaluating and improving the performance of SSQA models on diverse, unseen datasets.
The authors introduce MOS-Bench, a collection of seven training and twelve test datasets covering a variety of speech types, languages, sampling frequencies, and distortion types. They also present SHEET, an open-source toolkit implementing several DNN-based SSQA models, including SSL-MOS and a modified AlignNet. The authors conduct experiments with single- and multi-dataset training, evaluating models with conventional metrics, namely mean squared error (MSE), linear correlation coefficient (LCC), and Spearman rank correlation coefficient (SRCC), as well as their proposed best score difference/ratio metric, which summarizes performance and generalization ability across test sets. Additionally, they visualize SSL embeddings to analyze model behavior.
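For concreteness, the conventional metrics can be computed with numpy and scipy as in the minimal sketch below. The `best_score_diff_ratio` helper is a hypothetical reading of the paper's proposed metric (the gap and ratio between a model's score on a test set and the best score observed on that set); consult the paper for the exact formulation before using it for benchmarking.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def conventional_metrics(pred_mos, true_mos):
    """Utterance-level SSQA metrics: MSE, LCC (Pearson), and SRCC (Spearman)."""
    pred = np.asarray(pred_mos, dtype=float)
    true = np.asarray(true_mos, dtype=float)
    return {
        "MSE": float(np.mean((pred - true) ** 2)),
        "LCC": float(pearsonr(pred, true)[0]),
        "SRCC": float(spearmanr(pred, true)[0]),
    }

def best_score_diff_ratio(model_score, best_score):
    """Hypothetical sketch of the proposed metric: difference and ratio
    between a model's score (e.g., SRCC) on a test set and the best
    score any benchmarked model achieved on that set."""
    return best_score - model_score, model_score / best_score
```

A high SRCC together with a best score ratio close to 1.0 indicates that a model tracks human rankings nearly as well as the strongest benchmarked system on that test set.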
The authors conclude that MOS-Bench and SHEET provide valuable resources for benchmarking and improving the generalization ability of SSQA models. They highlight the potential of training on diverse, non-synthetic datasets and of non-parametric inference methods for improving the faithfulness of predicted scores.
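The summary does not detail the non-parametric method; one common instance in the SSQA literature is k-nearest-neighbor regression over SSL embeddings, sketched below under the assumption that utterance-level embeddings (e.g., mean-pooled SSL features) and their MOS labels are already available. The function name and the use of scikit-learn are illustrative, not the toolkit's actual implementation.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def fit_knn_mos(train_embeddings, train_mos, k=5):
    """Non-parametric MOS inference: predict a test utterance's score from
    the labels of its k nearest neighbors in SSL embedding space."""
    knn = KNeighborsRegressor(n_neighbors=k, weights="distance")
    knn.fit(np.asarray(train_embeddings), np.asarray(train_mos))
    return knn  # knn.predict(test_embeddings) yields MOS estimates
```

Because predictions are anchored to real labeled utterances rather than a learned regression head, such methods can remain faithful to the training data's score distribution on unseen domains.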
This research significantly contributes to the field of SSQA by providing a standardized benchmark and toolkit for evaluating and enhancing the generalization ability of DNN-based models. The findings have implications for future research directions, including exploring the use of diverse, non-synthetic datasets and non-parametric inference methods.
The study is limited by the specific datasets and models used. Future research could explore the inclusion of more diverse datasets and investigate the effectiveness of other SSQA models and training techniques. Additionally, further investigation into the corpus effect and the impact of dataset size on generalization ability is warranted.