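This paper proposes a new approach that combines beamforming with speaker-attributed automatic speech recognition (SA-ASR) to improve speech recognition performance with distant microphones in real meeting environments.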


coremsg

joint-beamforming-and-speaker-attributed-automatic-speech-recognition-for-real-distant-microphone-meeting-transcription


Joint Beamforming and Speaker-Attributed Automatic Speech Recognition for Real Distant-Microphone Meeting Transcription


title_rewrite


The proposed methods, overlapped encoding separation (EncSep) and single-speaker information guidance for serialized output training (GEncSep), improve multi-speaker automatic speech recognition by making fuller use of the hybrid connectionist temporal classification (CTC) and attention loss.


improving-serialized-output-training-for-multi-speaker-automatic-speech-recognition-through-overlapped-encoding-separation-and-single-speaker-information-guidance


Improving Serialized Output Training for Multi-Speaker Automatic Speech Recognition through Overlapped Encoding Separation and Single-Speaker Information Guidance