This paper presents an approach that uses neural models to make linguistic fieldwork more efficient, focusing on the collection of morphological data.
The key highlights and insights are:
The authors introduce a framework that evaluates the effectiveness of various sampling strategies for obtaining morphological data and assesses the ability of state-of-the-art neural models to generalize morphological structures.
The experiments highlight two key strategies for improving the efficiency of the data collection process:
The results show that uniform random sampling across paradigm cells leads to more representative data and better generalization, outperforming strategies that prioritize the completion of full paradigms or focus on the most confident predictions.
The authors also introduce a new metric, the Normalized Efficiency Score, to better capture the efficiency of the elicitation process by considering the number of interactions with the speaker and the accuracy of the final model.
The study examines a range of typologically diverse languages, providing insights into the effectiveness of the proposed approach across different morphological systems and data availability conditions.
Overall, this work demonstrates how neural models can be leveraged to guide linguists during fieldwork, making the process of data collection more efficient and informative.
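To make the contrast between sampling strategies concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a paradigm table can be modeled as a grid of (lemma, feature tag) cells; the names `build_cells`, `sample_uniform`, `sample_full_paradigms`, and `lemma_coverage`, as well as the toy lemmas and tags, are hypothetical and chosen only for illustration. It compares drawing cells uniformly at random against filling whole paradigms one lemma at a time under the same elicitation budget.

```python
import random
from typing import List, Tuple

Cell = Tuple[str, str]  # (lemma, feature tag); both are hypothetical labels


def build_cells(lemmas: List[str], tags: List[str]) -> List[Cell]:
    """Enumerate every paradigm cell as a (lemma, tag) pair."""
    return [(lemma, tag) for lemma in lemmas for tag in tags]


def sample_uniform(cells: List[Cell], budget: int, seed: int = 0) -> List[Cell]:
    """Draw up to `budget` distinct cells uniformly at random across all paradigms."""
    rng = random.Random(seed)
    return rng.sample(cells, min(budget, len(cells)))


def sample_full_paradigms(
    lemmas: List[str], tags: List[str], budget: int, seed: int = 0
) -> List[Cell]:
    """Baseline: visit lemmas in random order and fill each paradigm completely
    before moving on, stopping once the budget is spent."""
    rng = random.Random(seed)
    order = list(lemmas)
    rng.shuffle(order)
    picked: List[Cell] = []
    for lemma in order:
        for tag in tags:
            if len(picked) == budget:
                return picked
            picked.append((lemma, tag))
    return picked


def lemma_coverage(sample: List[Cell]) -> int:
    """Count how many distinct lemmas appear in a sample of cells."""
    return len({lemma for lemma, _ in sample})


# Toy setup: 10 lemmas, 4 cells per paradigm, a budget of 8 elicited forms.
lemmas = [f"lemma{i}" for i in range(10)]
tags = ["SG.NOM", "SG.ACC", "PL.NOM", "PL.ACC"]
budget = 8

uniform_sample = sample_uniform(build_cells(lemmas, tags), budget)
paradigm_sample = sample_full_paradigms(lemmas, tags, budget)

print("lemmas covered (uniform):", lemma_coverage(uniform_sample))
print("lemmas covered (full paradigms):", lemma_coverage(paradigm_sample))
```

With this budget, the paradigm-completion baseline always touches exactly two lemmas, while uniform sampling typically spreads the same number of questions over more lemmas. That spread is one plausible reading of why the summary describes uniformly sampled data as more representative.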
Key Insights Distilled From
by Aso Mahmudi,... at arxiv.org 09-24-2024
https://arxiv.org/pdf/2409.14628.pdf