We introduce mEdIT, a set of multilingual models for text editing tasks such as grammatical error correction, text simplification, and paraphrasing across multiple languages, built by instruction-tuning large pre-trained language models.
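Instruction tuning frames each editing task as an instruction plus an input, with the edited text as the target. A minimal sketch of such a training-example format (the template, task names, and language tag are illustrative assumptions, not the paper's verbatim schema):

```python
def format_edit_instruction(task, source_text, language):
    """Build an instruction-tuning prompt for multilingual text editing.

    task: one of the editing tasks; source_text: the text to edit;
    language: the language of the input. The model is trained to
    generate the edited text after "### Output:".
    NOTE: this template is a hypothetical illustration.
    """
    instructions = {
        "gec": "Fix grammatical errors in this text",
        "simplify": "Simplify this text",
        "paraphrase": "Paraphrase this text",
    }
    return (
        f"### Instruction:\n{instructions[task]} ({language})\n"
        f"### Input:\n{source_text}\n"
        f"### Output:\n"
    )
```

At fine-tuning time, the target edit is appended after the final header, so one model learns all tasks and languages from a shared format.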
Meta4XNLI is a parallel corpus in Spanish and English that provides metaphor annotations for both detection at the token level and interpretation through Natural Language Inference.
Cross-lingual alignment, the meaningful similarity of representations across languages in multilingual language models, is crucial for zero-shot cross-lingual transfer. This survey provides a comprehensive overview of techniques to improve cross-lingual alignment, including objectives using parallel data, contrastive learning, modified pre-training schemes, adapter tuning, and data augmentation.
MLaKE is a novel benchmark for evaluating the multilingual knowledge editing capabilities of large language models, comprising 5,360 single-hop and 4,072 multi-hop questions across five languages (English, Chinese, Japanese, French, German).
This paper presents a comprehensive survey of the recent progress and emerging trends in multilingual large language models (MLLMs), offering a unified perspective through a novel taxonomy based on alignment strategies.
Bias exists in text data across multiple languages, including benchmark datasets on the English GLUE/SuperGLUE leaderboards, as well as datasets in Italian, Dutch, German, and Swedish.
The proposed In-Context Cross-Lingual Transfer (IC-XLT) approach effectively leverages target-language demonstrations during inference to improve cross-lingual text classification performance, especially in scenarios with limited source-language data.
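The core mechanism is to prepend labeled target-language examples to the prompt at inference time, so a model adapted on the source language sees target-language evidence without any further training. A minimal sketch of such prompt assembly (the exact template wording is an illustrative assumption, not the paper's verbatim format):

```python
def build_icxlt_prompt(demos, query, label_names):
    """Assemble a classification prompt with target-language demonstrations.

    demos: list of (text, label) pairs in the target language.
    query: target-language input to classify.
    label_names: the allowed labels, listed to constrain the output.
    """
    lines = [f"Classify the text into one of: {', '.join(label_names)}."]
    for text, label in demos:
        lines.append(f"Text: {text}\nLabel: {label}")
    # The model completes the final "Label:" with its prediction.
    lines.append(f"Text: {query}\nLabel:")
    return "\n\n".join(lines)
```

Because adaptation happens purely in context, the same source-trained model can be reused across target languages by swapping the demonstration set.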
Our system AAdaM achieves competitive results in the SemEval-2024 Task 1 on Semantic Textual Relatedness for African and Asian languages, by leveraging data augmentation, task-adaptive pre-training, and adapter-based tuning.
This work leverages crowdsourcing and automatic translation to expand the coverage and scale of news framing analysis beyond English, and demonstrates the effectiveness of combining expert-annotated and crowd-sourced data.

Low-Rank Adaptation (LoRA) is a competitive alternative to full fine-tuning for multilingual summarization, particularly in low-data and cross-lingual transfer scenarios.
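LoRA freezes the pre-trained weight matrix W and learns only a low-rank update B @ A, so the adapted layer computes x @ (W + (alpha/r) · BA)ᵀ with far fewer trainable parameters than full fine-tuning. A minimal sketch of the forward pass (rank, scaling, and initialization values are illustrative assumptions):

```python
import numpy as np

class LoRALinear:
    """Minimal Low-Rank Adaptation of a linear layer.

    W stays frozen; only the low-rank factors A and B are trainable.
    B is zero-initialized, so the adapted layer starts out identical
    to the original pre-trained layer.
    """
    def __init__(self, W, r=4, alpha=8, rng=None):
        rng = rng or np.random.default_rng(0)
        out_dim, in_dim = W.shape
        self.W = W                                          # frozen
        self.A = rng.normal(scale=0.01, size=(r, in_dim))   # trainable
        self.B = np.zeros((out_dim, r))                     # trainable
        self.scale = alpha / r

    def forward(self, x):
        # Equivalent to x @ W.T plus the scaled low-rank correction.
        return x @ (self.W + self.scale * (self.B @ self.A)).T
```

Because only A and B (r · (in_dim + out_dim) parameters per layer) receive gradients, LoRA is especially attractive in the low-data and cross-lingual transfer settings the paper studies.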