Advancing Speech Translation: A New Corpus of Mandarin-English Conversational Telephone Speech for Improving Machine Translation Performance
High-quality, in-domain training data is essential for developing effective speech translation systems. This paper introduces a new corpus of 123.5 hours of Mandarin-English conversational telephone speech. Speech translation models trained on this corpus can perform significantly better than general-purpose translation models applied to the same domain.