Language models exhibit structural priming effects that can be explained by inverse frequency effects, such as prime surprisal and verb preference, as well as lexical dependence between prime and target.
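The inverse frequency account predicts that a less expected (higher-surprisal) prime produces stronger priming. As a toy illustration of how prime surprisal can be computed, here is a minimal bigram language model with add-one smoothing; the corpus and the dative-alternation sentences are invented for illustration and are not drawn from the study:

```python
import math
from collections import Counter

# Toy corpus (invented sentences, not from the study).
corpus = [
    "the boy gave the dog a bone".split(),
    "the girl gave a book to the teacher".split(),
    "the boy sent the friend a letter".split(),
]

bigrams, unigrams = Counter(), Counter()
for sent in corpus:
    toks = ["<s>"] + sent
    unigrams.update(toks)
    bigrams.update(zip(toks, toks[1:]))
V = len(unigrams)

def surprisal(sentence):
    """Total surprisal: sum of -log2 P(w_i | w_{i-1}), add-one smoothed."""
    toks = ["<s>"] + sentence.split()
    return sum(-math.log2((bigrams[(prev, w)] + 1) / (unigrams[prev] + V))
               for prev, w in zip(toks, toks[1:]))

# Inverse frequency effect: the rarer, higher-surprisal variant of a
# prime is predicted to prime its structure more strongly.
do_prime = surprisal("the boy gave the dog a bone")     # double object
po_prime = surprisal("the boy gave a bone to the dog")  # prepositional
```

In a real experiment the surprisal would come from the language model under study rather than a toy bigram model, but the quantity being measured is the same.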
Online fan communities, such as fanfiction and forums, collaboratively reconstruct and renegotiate narrative elements like characters, leading to divergent representations from the original source material.
The realization of pitch contours of monosyllabic words in spontaneous Taiwan Mandarin speech is co-determined by tonal context, word identity, and word sense.
Pretrained multilingual language models do not encode abstract cross-linguistic syntactic structures, but rather rely on shallow language-specific cues.
Early Slavic participle clauses (conjunct participles and dative absolutes) compete with finite temporal jegda-clauses in expressing temporal relations and organizing discourse; the choice between these constructions is motivated by information structure, discourse relations, and cross-linguistically attested typological patterns.
Cognate data with synonyms can be effectively represented using probabilistic character matrices to maximize the phylogenetic signal during maximum likelihood tree inference.
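One way to picture this encoding: when a language has several synonyms for a meaning, a hard 0/1 cognate-class state is replaced by a fractional value, so no synonym has to be discarded before tree inference. A minimal sketch with invented languages and cognate-class labels (the study's actual matrix construction may differ):

```python
# Toy cognate data (invented): language -> meaning -> cognate classes of
# its words; synonyms give a language several classes for one meaning.
data = {
    "langA": {"water": ["w1"], "fire": ["f1", "f2"]},  # two 'fire' synonyms
    "langB": {"water": ["w1"], "fire": ["f1"]},
    "langC": {"water": ["w2"], "fire": ["f2"]},
}
# Each cognate class belongs to one meaning slot.
class_meaning = {"w1": "water", "w2": "water", "f1": "fire", "f2": "fire"}
classes = sorted(class_meaning)

def prob_matrix(data):
    """One row per language; each cell is the probability that a randomly
    chosen synonym for the class's meaning falls in that cognate class.
    None marks a meaning unattested in the language ('?' in the matrix)."""
    rows = {}
    for lang, by_meaning in data.items():
        row = {}
        for c in classes:
            syns = by_meaning.get(class_meaning[c])
            row[c] = None if not syns else syns.count(c) / len(syns)
        rows[lang] = row
    return rows

M = prob_matrix(data)
# langA splits 'fire' over two classes, so f1 and f2 each get 0.5
# instead of a hard presence/absence state.
```

Keeping the synonym ambiguity as a probability distribution over states, rather than forcing one class per meaning, is what preserves the phylogenetic signal the summary refers to.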
Large multilingual language models can effectively represent orthographic and lexical variation across Occitan dialects without the need for extensive data normalization.
Explainable machine learning approaches can provide valuable insights for geolinguistic authorship profiling in forensic linguistics, complementing traditional qualitative methods.
Large language models perform poorly on part-of-speech tagging for Indigenous and other low-resource Brazilian languages relative to high-resource languages, but language adaptation improves cross-lingual transfer performance.
Morphological means of expressing generic temporal subordination ('when'-clauses) are prevalent in many Latin American and Caribbean languages, posing challenges for token-based typological approaches. This study incorporates character n-gram analysis to capture such morphological markers alongside lexical subordinators, generating probabilistic semantic maps that reveal systematic cross-linguistic variation in the region.
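A rough sketch of the character n-gram side of such an analysis: score n-grams by their smoothed log-odds in 'when'-contexts versus other contexts, so that a bound temporal suffix surfaces just as a lexical subordinator would. The verse pairs, the invented -po suffix, and the scoring scheme are illustrative assumptions, not the study's actual pipeline:

```python
import math
from collections import Counter

def char_ngrams(word, n=3):
    padded = f"#{word}#"  # '#' marks word boundaries
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

# Invented verse pairs: the first set stands in for contexts translated
# with a generic 'when'-clause (marked by a bound -po suffix), the
# second for other contexts. All forms are illustrative only.
when_verses = ["afikapo nyumbani hulala", "kutembeapo alimwona"]
other_verses = ["alimwona jana", "hulala nyumbani"]

def ngram_scores(pos, neg, n=3):
    """Add-one smoothed log-odds of each character n-gram in pos vs neg;
    high scores flag n-grams tied to the 'when'-contexts."""
    cp = Counter(g for v in pos for w in v.split() for g in char_ngrams(w, n))
    cn = Counter(g for v in neg for w in v.split() for g in char_ngrams(w, n))
    tp, tn = sum(cp.values()), sum(cn.values())
    vocab = set(cp) | set(cn)
    return {g: math.log((cp[g] + 1) / (tp + len(vocab)))
              - math.log((cn[g] + 1) / (tn + len(vocab)))
            for g in vocab}

scores = ngram_scores(when_verses, other_verses)
# N-grams spanning the suffix (e.g. 'apo', 'po#') score higher than
# n-grams shared by both sets, so the morphological marker is
# recoverable without any free-standing subordinator token.
```

This is the sense in which character n-grams can stand in for tokens in a token-based typological method: the high-scoring n-grams feed the same probabilistic semantic maps that lexical subordinators do.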