Core Concepts
While computationally efficient and effective on in-domain tasks, the SLAM-ASR architecture is fragile under domain shifts and speech perturbations, and its speech-to-text alignment may be unreliable when the LLM is not fine-tuned.
Kumar, S., Thorbecke, I., Burdisso, S., Villatoro-Tello, E., E, M. K., Hacioglu, K., ... & Stolcke, A. (2024). Performance evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward. arXiv preprint arXiv:2411.03866.
This paper investigates the robustness and limitations of SLAM-ASR, a recent architecture for Large Language Model (LLM)-based Automatic Speech Recognition (ASR), to determine its suitability as a general-purpose ASR solution.
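Robustness comparisons of this kind are typically reported in terms of word error rate (WER). The sketch below is an assumption-laden illustration, not the paper's evaluation code. It computes WER as the word-level Levenshtein distance divided by the reference length. It then computes the relative WER increase of a perturbed condition over a clean baseline. The function names `wer` and `relative_degradation` and the example transcripts are hypothetical.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def relative_degradation(clean_wer: float, perturbed_wer: float) -> float:
    """Relative WER increase of a perturbed condition over the clean baseline."""
    if clean_wer == 0:
        raise ValueError("clean WER is zero; relative degradation is undefined")
    return (perturbed_wer - clean_wer) / clean_wer


if __name__ == "__main__":
    # Hypothetical transcripts for a clean and a perturbed version of one utterance.
    reference = "the cat sat on the mat"
    clean = wer(reference, "the cat sat on a mat")  # one substitution
    perturbed = wer(reference, "the cat sat mat")   # two deletions
    print(f"clean WER={clean:.3f}, perturbed WER={perturbed:.3f}")
    print(f"relative degradation={relative_degradation(clean, perturbed):.2f}")
```

Here the perturbed condition doubles the error rate (a relative degradation of 1.0). A relative measure like this helps compare fragility across systems whose clean-condition WERs differ.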