Large language models exhibit significant limitations in handling simple linguistic inferences that are trivial for humans, including grammatically specified entailments, monotonicity entailments, and inferences involving evidential adverbs of uncertainty.
The study explores the impact of linguistic typology on the performance of cross-lingual transfer learning for event extraction tasks, using Basque as the target language.
Sarcasm detection models fine-tuned on specific datasets struggle to generalize to other datasets, highlighting the need for more diverse and representative sarcasm data to build robust sarcasm detection systems.
Foundation models can generate dictionary example sentences that outperform existing expert-curated examples, using a novel method to identify sentences that best exemplify the meaning of words.
This paper introduces the Sinhala Offensive Language Dataset (SOLD), the largest annotated dataset for detecting offensive content in the Sinhala language. The dataset contains 10,000 tweets annotated at both the sentence and token level, enabling the development of explainable models for offensive language identification.
Annotation guidelines for the MaiBaam corpus.
Pre-trained language models generalize effectively to code-switched text, offering insights into their multilingual capabilities.
A summary of the VLSP 2023 - LTER challenge on legal textual entailment recognition.
BLADE introduces a novel framework to enhance large language models with domain-specific models, significantly improving performance in vertical domains.
Developing effective systems for detecting offensive language in Chinese poses unique challenges due to cultural nuances and linguistic complexities.