This summary covers the role of non-verbal signals in speech, with a focus on prosody. The study presents an analytical schema and a classification process for interpreting multi-layered prosodic events. It aims to improve speech technologies by formalizing prosody and to shed light on theories of communication.
Non-verbal signals in speech are crucial and range from conversational actions to emotions. Because these signals occur simultaneously, the principles that govern prosodic structuring remain unclear. Recent developments in pattern recognition offer a way to analyze such complex prosodic structures.
The study proposes a schema that interprets surface representations of multi-layered prosodic events. By fine-tuning a pre-trained model, it disentangles different orders of prosodic phenomena simultaneously. This method performs comparably to, or better than, human annotation on various types of data.
Beyond formalizing prosody, an understanding of its patterns can contribute to communication theories and improve language technologies. Disentangling prosodic patterns can help identify the constraints that affect speech organization and reduce disparities among acoustic descriptions.
The research also demonstrates that prosodic labels can be added to aligned transcriptions using transfer learning. By fine-tuning pre-trained models such as Whisper, the study shows promise for decoding complex prosodic structures efficiently.
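To make the idea of adding prosodic labels to an aligned transcription concrete, here is a minimal Python sketch. It does not reproduce the study's model or its label set. It only illustrates one plausible data layout: time-aligned words, plus labeled events on several tiers, with each word receiving the label of the event that overlaps it most on each tier. The names `Word`, `Event`, `attach_labels`, the tier names, and the label values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    """A time-aligned word from a transcription (times in seconds)."""
    text: str
    start: float
    end: float
    labels: dict = field(default_factory=dict)  # tier name -> label


@dataclass
class Event:
    """A labeled interval on one prosodic tier (hypothetical label set)."""
    tier: str
    label: str
    start: float
    end: float


def overlap(a_start, a_end, b_start, b_end):
    """Return the duration shared by two intervals, or 0.0 if disjoint."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def attach_labels(words, events):
    """For each word and each tier, keep the label with the largest overlap."""
    for word in words:
        best = {}  # tier -> (overlap duration, label)
        for ev in events:
            ov = overlap(word.start, word.end, ev.start, ev.end)
            if ov > 0 and ov > best.get(ev.tier, (0.0, None))[0]:
                best[ev.tier] = (ov, ev.label)
        word.labels = {tier: label for tier, (_, label) in best.items()}
    return words


# Illustrative data only: three aligned words and events on two tiers.
words = [Word("so", 0.00, 0.20), Word("you", 0.20, 0.35), Word("came", 0.35, 0.80)]
events = [
    Event("prominence", "prominent", 0.35, 0.80),
    Event("boundary", "unit-final", 0.60, 0.80),
    Event("prominence", "non-prominent", 0.00, 0.36),
]
labeled = attach_labels(words, events)
for w in labeled:
    print(w.text, w.labels)
```

Keeping one label per tier on each word lets several layers of prosodic information coexist on the same stretch of speech, which is the multi-layered view the summary describes. In a real pipeline the events would come from a fine-tuned classifier rather than a hand-written list.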
Source: arxiv.org