The NURC-SP Audio Corpus is a new freely available dataset of spontaneous speech in Brazilian Portuguese, focusing on the paulistano (São Paulo city) accent. It contains 239.30 hours of transcribed audio recordings from 401 different speakers (204 females, 197 males).
The corpus was created by digitizing and transcribing audio recordings from the NURC-SP project, which documented the urban linguistic norm of educated speakers in São Paulo in the 1970s. The transcriptions were initially generated automatically using the WhisperX model and then manually revised by 14 native Brazilian Portuguese speakers.
Four automatic speech recognition (ASR) models were evaluated on the NURC-SP Audio Corpus.
The fine-tuned Distil-Whisper model achieved the best performance, with a word error rate (WER) of 24.22%, followed by the fine-tuned Wav2Vec2-XLSR-53 model with a WER of 33.73%. Even the best model misrecognizes roughly one word in four, which indicates that the NURC-SP Audio Corpus is a challenging dataset for ASR. The results also suggest that Distil-Whisper shows promise for low- and medium-resource languages like Brazilian Portuguese.
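For readers unfamiliar with the metric, the sketch below shows how WER is conventionally defined: the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the model output, divided by the number of reference words. This is only an illustration of the standard definition. The summary does not say which tool or text normalization was used for the reported figures, and the function name `word_error_rate` and the example sentences are made up for this sketch.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are split on whitespace with no normalization
    (case, punctuation); real evaluations usually normalize text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion over five words.
reference = "o corpus tem fala espontânea"
hypothesis = "o corpo tem fala"
print(word_error_rate(reference, hypothesis))  # 0.4
```

Under this definition, a WER of 24.22% corresponds to about 24 word-level errors per 100 reference words. Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.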
The NURC-SP Audio Corpus and the trained ASR models are publicly available to enable further research and development in this area.