
Phonetic Alignments and Measurements of the Diverse UCLA Phonetics Lab Archive


Key Concepts
The UCLA Phonetics Lab Archive, one of the earliest and most diverse multilingual speech corpora, has been phonetically aligned and augmented with acoustic-phonetic measurements to enhance its usability for research and applications in speech technologies and comparative linguistics.
Summary
The UCLA Phonetics Lab Archive is one of the earliest and most crosslinguistically diverse collections of speech data, containing recordings and phonetic transcriptions from 314 languages. However, the transcriptions were not previously time-aligned to the audio recordings, limiting the corpus's usability for many research questions.

The authors present VoxAngeles, an updated release of the UCLA Phonetics Lab Archive that includes (1) manually corrected phone-level alignments, using the original or adapted phonetic transcriptions, for 95 languages, and (2) phonetic measurements of phone and word durations, vowel formants, and vowel f0.

The manual alignment process involved addressing various issues, such as inconsistent representation of suprasegmental features, obsolete or nonstandard symbols, typographical errors, unclear segment boundaries, and mismatches between transcripts and audio. The current release of VoxAngeles spans 95 languages from 21 language families.

The authors demonstrate the utility of this corpus for phonetic typology through a case study on vowel intrinsic f0, which shows general support for the crosslinguistic effect but also some variability across individual speakers and languages. The VoxAngeles corpus is freely available under a CC-BY-NC 4.0 license and serves as a foundational resource for investigations in phonetic typology, as well as for low-resource and multilingual speech technologies.
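To make the intrinsic f0 case study more concrete, the following is a minimal sketch of how a per-speaker comparison of high and low vowels might be computed. Intrinsic f0 refers to the tendency for high vowels to have a higher f0 than low vowels. The token layout, the vowel groupings, and the function name `intrinsic_f0_semitones` are assumptions made for illustration, and the values are invented toy data. This is not the authors' analysis and does not reflect the corpus's actual file format.

```python
import math
from collections import defaultdict

# Assumed vowel groupings, for illustration only.
HIGH_VOWELS = {"i", "u"}
LOW_VOWELS = {"a"}

# Invented toy tokens: (speaker, vowel, mean f0 in Hz). Not corpus data.
tokens = [
    ("spk1", "i", 220.0), ("spk1", "u", 220.0), ("spk1", "a", 200.0),
    ("spk2", "i", 110.0), ("spk2", "a", 110.0), ("spk2", "e", 105.0),
]

def intrinsic_f0_semitones(tokens):
    """Return, per speaker, the high-minus-low vowel f0 difference in semitones."""
    f0 = defaultdict(lambda: {"high": [], "low": []})
    for speaker, vowel, hz in tokens:
        if vowel in HIGH_VOWELS:
            f0[speaker]["high"].append(hz)
        elif vowel in LOW_VOWELS:
            f0[speaker]["low"].append(hz)
        # Vowels in neither group (e.g. mid vowels) are ignored.
    result = {}
    for speaker, groups in f0.items():
        # Only speakers with tokens in both groups can be compared.
        if groups["high"] and groups["low"]:
            mean_high = sum(groups["high"]) / len(groups["high"])
            mean_low = sum(groups["low"]) / len(groups["low"])
            # Convert the frequency ratio to semitones.
            result[speaker] = 12 * math.log2(mean_high / mean_low)
    return result

diffs = intrinsic_f0_semitones(tokens)
```

A positive value means the speaker's high vowels have a higher mean f0 than their low vowels. Expressing the difference in semitones rather than Hz keeps speakers with different pitch ranges comparable.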
Statistics
The manual alignment process yielded a total of 5,445 word-level recordings from 95 languages, with a median of 49 recordings and a range of 20 to 162 recordings per language. Within these files, a total of 22,825 phone intervals were aligned, with a median of 228 and a range of 46 to 755 phone intervals per language. A total of 568 distinct phones were observed across the corpus; the number of distinct phone types per language ranged from 13 to 93, with a median of 35.
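Summary statistics of this kind (totals, per-language medians and ranges, and distinct phone counts) can be derived from a flat list of aligned intervals. The sketch below assumes a hypothetical layout of one `(language, phone)` pair per interval and uses invented toy records. The function name `summarize` is illustrative and the output does not reproduce the figures above.

```python
from collections import Counter
from statistics import median

# Invented toy records: one (language, phone) pair per aligned phone interval.
intervals = [
    ("lang_a", "a"), ("lang_a", "t"), ("lang_a", "a"),
    ("lang_b", "i"), ("lang_b", "k"), ("lang_b", "u"),
    ("lang_b", "k"), ("lang_b", "s"),
    ("lang_c", "m"), ("lang_c", "a"),
]

def summarize(intervals):
    """Compute corpus-level and per-language summary statistics."""
    # Number of phone intervals per language.
    per_language = Counter(lang for lang, _ in intervals)
    # Set of distinct phone types per language.
    phones_by_lang = {}
    for lang, phone in intervals:
        phones_by_lang.setdefault(lang, set()).add(phone)
    counts = list(per_language.values())
    inventory_sizes = [len(phones) for phones in phones_by_lang.values()]
    return {
        "total_intervals": len(intervals),
        "median_intervals": median(counts),
        "range_intervals": (min(counts), max(counts)),
        "distinct_phones": len({phone for _, phone in intervals}),
        "median_inventory": median(inventory_sizes),
        "range_inventory": (min(inventory_sizes), max(inventory_sizes)),
    }

summary = summarize(intervals)
```

Note that the corpus-wide count of distinct phones is taken over the union of all languages, so it is not the sum of the per-language inventory sizes.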
Quotes
"The usability of a speech corpus depends considerably on the research question. The mere existence of speech audio data may suffice for some research questions, whereas for others, metadata may be necessary for downstream analysis."

"Access to crosslinguistic speech corpora has risen dramatically in recent years, particularly with the release of several massively multilingual speech corpora."

"The UCLA Phonetics Lab Archive is one of the earliest, and to this day, one of the most crosslinguistically diverse collections of speech data."

Deeper Questions

How can the VoxAngeles corpus be further expanded and diversified to include more languages and speakers?

The VoxAngeles corpus could be expanded in several ways. First, additional data can be extracted from the UCLA Phonetics Lab Archive for languages that are currently underrepresented, which would require manually aligning and auditing the phonetic transcriptions for those languages, following the process outlined in the paper. Second, recordings from multiple speakers per language can be included to account for speaker variability. Third, longer passages of spoken data can be extracted to provide a more comprehensive dataset for analysis. Finally, collaborations with researchers and fieldworkers working on endangered or low-resource languages can help source new data and broaden the corpus's language coverage.

What other phonetic universals or typological patterns could be investigated using the acoustic-phonetic measurements provided in the VoxAngeles corpus?

The acoustic-phonetic measurements provided in the VoxAngeles corpus can be used to investigate various phonetic universals and typological patterns across languages. One area of study is consonant-vowel interaction, including coarticulatory effects and their crosslinguistic variability, as well as patterns of consonant lenition or fortition. The corpus can also be used to explore prosodic features, such as tone or stress patterns, across different language families. In addition, the measurements of vowel formants and f0 can be used to study vowel harmony systems or patterns of vowel reduction in different languages.

How can the VoxAngeles corpus be leveraged to improve multilingual and low-resource speech technologies, such as automatic speech recognition and text-to-speech synthesis?

The VoxAngeles corpus can support multilingual and low-resource speech technologies by providing a diverse, audited dataset for training and testing speech recognition and synthesis systems. For automatic speech recognition, the phonetic alignments and measurements can be used to develop language-specific acoustic models, which may improve accuracy for underrepresented languages, and to supply data for crosslinguistic phonetic analysis and modeling in multilingual systems. For text-to-speech synthesis, the corpus can be used to train models that generate natural-sounding speech in various languages, especially those with limited resources. In both cases, the corpus can help researchers and developers extend speech technologies to a wider range of languages and dialects.