Key Concepts
A computationally efficient and distributed speaker diarization framework for networked IoT-style audio devices using Federated Learning.
Summary
The paper presents a federated learning-based speaker diarization mechanism for distributed audio-recording IoT devices. It proposes a novel client-device grouping method for federated model aggregation and uses two unsupervised, distance-based statistical methods, the Bayesian Information Criterion (BIC) and Hotelling's t-squared statistic (t²-statistic), for speaker segmentation and clustering.
The key highlights are:
- The use of t²-statistic for speaker segmentation reduces computational complexity compared to BIC, while maintaining similar accuracy.
- The segmentation searches for speaker changes at quasi-silences, which reduces false detections without increasing missed detections.
- An online update method for the federated learning model is employed based on cosine similarity of speaker embeddings.
- The proposed framework is evaluated on real-world audio conversations and performs comparably to centrally trained models, even when the audio data are not IID and the audio-recording IoT devices have had no a priori training.
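The two statistics named in the highlights can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes each audio window is already a matrix of feature frames (for example MFCCs), uses the textbook two-sample Hotelling statistic with a pooled covariance, and the function names `hotelling_t2` and `cosine_similarity` are hypothetical. How the paper thresholds these values or applies them to the federated update is not shown here.

```python
import numpy as np


def hotelling_t2(x, y):
    """Two-sample Hotelling T^2 between frame matrices x (n1, d) and y (n2, d).

    Assumption: textbook form with a pooled covariance estimate. A large
    value suggests the two windows have different mean feature vectors,
    i.e. a candidate speaker change between them.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    diff = x.mean(axis=0) - y.mean(axis=0)
    # Pooled covariance of the two windows (rows are frames).
    pooled = (
        (n1 - 1) * np.cov(x, rowvar=False) + (n2 - 1) * np.cov(y, rowvar=False)
    ) / (n1 + n2 - 2)
    pooled = np.atleast_2d(pooled)
    # Solve the linear system instead of forming an explicit inverse.
    return float(n1 * n2 / (n1 + n2) * diff @ np.linalg.solve(pooled, diff))


def cosine_similarity(a, b):
    """Cosine similarity between two speaker-embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

In a segmentation loop, `hotelling_t2` would be evaluated on the windows to the left and right of each candidate boundary (such as a quasi-silence), and only boundaries whose value exceeds a chosen threshold would be kept. The threshold itself is a tuning choice that this sketch does not fix.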
Statistics
The proposed diarization mechanism achieves an F-score of up to 85% for speaker change detection.
The t²-statistic-based segmentation method improves the F-score by 3-8% compared to the BIC-based method.
The t²-statistic-based segmentation achieves a coverage improvement of around 3% and a purity improvement of around 5% compared to the BIC-based method.
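For readers unfamiliar with these metrics, the sketch below shows one common way to compute them in Python. It assumes frame-level labels and the usual diarization definitions (purity: share of frames that belong to the dominant reference speaker of their hypothesized cluster; coverage: the same with the roles swapped). The paper may use segment-level or tolerance-based variants, and the function names are hypothetical.

```python
from collections import Counter


def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def purity(reference, hypothesis):
    """Frame-level purity of hypothesized clusters against reference speakers."""
    overlap = Counter(zip(hypothesis, reference))
    best = {}
    for (cluster, _speaker), count in overlap.items():
        # Keep the largest overlap each cluster has with any single speaker.
        best[cluster] = max(best.get(cluster, 0), count)
    return sum(best.values()) / len(reference)


def coverage(reference, hypothesis):
    """Frame-level coverage: purity with reference and hypothesis swapped."""
    return purity(hypothesis, reference)
```

As a small worked case, with reference labels `A A A B B B` and hypothesized clusters `1 1 2 3 3 3`, every cluster contains a single speaker, so purity is 1.0, while speaker A is split across two clusters, so coverage is 5/6.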
Quotes
"The proposed diarization mechanism deals with such unknown distributed processing environments using unsupervised segmentation and federated learning."
"The advantages of using t²-statistic as compared to other statistical methods in terms of segmentation accuracy and computational rigor is analyzed."
"The proposed framework is functionally verified and experimentally evaluated with real-world audio conversations from zoom meetings and online sources including podcasts, YouTube, etc."