Early Slavic participle clauses (conjunct participles and dative absolutes) compete with finite temporal clauses (jegda-clauses) in expressing temporal relations and organizing discourse. The choice between these constructions is motivated by factors such as information structure, discourse relations, and cross-linguistic typological patterns.
Cognate data with synonyms can be effectively represented using probabilistic character matrices to maximize the phylogenetic signal during maximum likelihood tree inference.
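As a rough illustration of what a probabilistic character matrix for cognate data could look like, the sketch below turns (language, concept, cognate class) records into one binary character per cognate class. When a language attests several synonyms for a concept, the sketch spreads the probability mass uniformly over the attested classes. The uniform weighting, the function name `probabilistic_matrix`, and the toy records are assumptions made for illustration, not the encoding used in the work summarized above.

```python
from collections import defaultdict


def probabilistic_matrix(records):
    """Build a probabilistic binary character matrix.

    records: iterable of (language, concept, cognate_class) tuples.
    Returns (languages, characters, matrix), where matrix[language] is a list
    with one entry per (concept, cognate_class) character:
      - None if the language has no form for the concept (missing data),
      - 1/k if the class is one of the k classes attested for that concept,
      - 0.0 otherwise.
    The 1/k weighting is an illustrative assumption.
    """
    records = list(records)  # allow generators; we iterate more than once

    # Collect the set of cognate classes attested per (language, concept).
    attested = defaultdict(set)
    for lang, concept, cls in records:
        attested[(lang, concept)].add(cls)

    languages = sorted({lang for lang, _, _ in records})
    characters = sorted({(concept, cls) for _, concept, cls in records})

    matrix = {}
    for lang in languages:
        row = []
        for concept, cls in characters:
            classes = attested.get((lang, concept))
            if not classes:
                row.append(None)  # concept not attested in this language
            elif cls in classes:
                row.append(1.0 / len(classes))  # share mass among synonyms
            else:
                row.append(0.0)
        matrix[lang] = row
    return languages, characters, matrix


# Hypothetical toy data: language A has two synonyms for "hand".
toy_records = [
    ("A", "hand", "h1"),
    ("A", "hand", "h2"),
    ("B", "hand", "h1"),
    ("C", "foot", "f1"),
]
languages, characters, matrix = probabilistic_matrix(toy_records)
```

In this toy case the two synonyms of language A each receive probability 0.5, while concepts with no attested form are kept as missing data rather than coded as absent.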
Large multilingual language models can effectively represent orthographic and lexical variation across Occitan dialects without extensive data normalization.
Explainable machine learning approaches can provide valuable insights for geolinguistic authorship profiling in forensic linguistics, complementing traditional qualitative methods.
Large Language Models perform poorly on part-of-speech tagging for indigenous and low-resource Brazilian languages compared to high-resource languages, but language adaptation can improve cross-lingual transfer performance.
Morphological means of expressing generic temporal subordination ('when'-clauses) are prevalent in many Latin American and Caribbean languages, posing challenges for token-based typological approaches. This study incorporates character n-gram analysis to capture such morphological markers alongside lexical subordinators, generating probabilistic semantic maps that reveal systematic cross-linguistic variation in the region.
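The character n-gram step mentioned above can be sketched as follows. This is a minimal illustration, assuming that word boundaries are marked with a padding symbol so that prefixes and suffixes are distinguishable from word-internal sequences. The function names, the `#` boundary marker, and the placeholder tokens are assumptions, not details taken from the study.

```python
from collections import Counter


def char_ngrams(token, n_min=2, n_max=4, boundary="#"):
    """Return all character n-grams of a token for n in [n_min, n_max].

    The token is padded with a boundary symbol on both sides, so an n-gram
    such as 'xa#' can only come from the end of a word (a suffix-like unit).
    """
    padded = f"{boundary}{token}{boundary}"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def ngram_counts(tokens, **kwargs):
    """Count character n-grams over a collection of tokens."""
    counts = Counter()
    for token in tokens:
        counts.update(char_ngrams(token, **kwargs))
    return counts


# Placeholder tokens (not real language data) sharing a final sequence "xa".
counts = ngram_counts(["stemxa", "rootxa", "stem"], n_min=3, n_max=3)
```

Counts like these could then be compared between clauses that do and do not express 'when'-subordination, so that a recurring affix surfaces as a candidate marker alongside free-standing subordinators.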
Human conversations exhibit greater variability and authenticity, while ChatGPT demonstrates superior proficiency in social processes, analytical style, cognition, attentional focus, and positive emotional tone.
Voices in French audiovisual media have shown a tendency to lower in pitch over time, independent of gender, while female voices exhibit a decrease in pitch with age that is not observed in male voices.
This paper introduces the Proto-Italic to Latin (PILA) dataset, which contains approximately 3,000 pairs of forms from Proto-Italic and Latin, to assist historical linguists in the study of Italic sound change.
Computational models can automate proto-language reconstruction, a painstaking process for linguists, and make it more efficient. This work explores three approaches to improving on previous methods: data augmentation, a VAE-based Transformer model, and a Variational Neural Machine Translation model.