This thesis presents a comprehensive evaluation of multilingual large language models (LLMs) on underrepresented languages, revealing limitations in their multilingual and multicultural generalization. It proposes data-efficient methods to improve the inclusivity and diversity of multilingual LLMs, enabling better performance on underrepresented languages without sacrificing high-resource language capabilities.
Multilingual neural machine translation models can be efficiently fine-tuned by isolating intrinsic language-specific subspaces, yielding significant performance gains while training far fewer parameters.
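To make the parameter-efficiency idea concrete, here is a minimal PyTorch sketch of subspace-restricted fine-tuning, assuming a gradient-magnitude criterion for selecting the language-specific parameters; the paper's actual subspace-identification procedure may differ, and the names `select_language_subspace`, `masked_update`, and `keep_ratio` are illustrative.

```python
import torch

def select_language_subspace(model, batches, loss_fn, keep_ratio=0.05):
    # Score every parameter by accumulated squared gradient on
    # language-specific batches (hypothetical selection criterion).
    scores = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for batch in batches:
        model.zero_grad()
        loss_fn(model, batch).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                scores[n] += p.grad.detach() ** 2
    flat = torch.cat([s.flatten() for s in scores.values()])
    k = max(1, int(keep_ratio * flat.numel()))
    threshold = torch.topk(flat, k).values.min()
    # Binary masks marking the selected language-specific subspace.
    return {n: (s >= threshold).float() for n, s in scores.items()}

def masked_update(model, masks, lr=1e-4):
    # Plain SGD step restricted to the selected subspace;
    # all other parameters stay frozen.
    with torch.no_grad():
        for n, p in model.named_parameters():
            if p.grad is not None:
                p.add_(masks[n] * p.grad, alpha=-lr)
```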
We introduce MEDIT, a set of multilingual models for text editing tasks such as grammatical error correction, text simplification, and paraphrasing, built by instruction-tuning large pre-trained language models.
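As an illustration of the instruction-tuning setup, the snippet below builds a hypothetical training example for a multilingual grammatical-error-correction edit; the prompt template and field names are assumptions, not MEDIT's published format.

```python
def build_edit_example(instruction, source, target):
    # Pack an editing task into an instruction-style prompt/completion pair
    # (hypothetical template, not the paper's exact layout).
    prompt = (
        f"### Instruction:\n{instruction}\n\n"
        f"### Input:\n{source}\n\n"
        f"### Response:\n"
    )
    return {"prompt": prompt, "completion": target}

# Example: Spanish grammatical error correction.
example = build_edit_example(
    instruction="Correct the grammatical errors in the following text.",
    source="Ella no sabe nada de que paso ayer.",
    target="Ella no sabe nada de lo que pasó ayer.",
)
```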
Meta4XNLI is a parallel corpus in Spanish and English that provides metaphor annotations for both detection at the token level and interpretation through Natural Language Inference.
Cross-lingual alignment, the meaningful similarity of representations across languages in multilingual language models, is crucial for zero-shot cross-lingual transfer. This survey provides a comprehensive overview of techniques to improve cross-lingual alignment, including objectives using parallel data, contrastive learning, modified pre-training schemes, adapter tuning, and data augmentation.
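Among the alignment objectives the survey covers, contrastive learning on parallel data is the easiest to state in code. The following is an illustrative InfoNCE-style loss over parallel sentence embeddings, not a specific method from the survey.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(src_emb, tgt_emb, temperature=0.05):
    # Pull parallel sentence pairs together and push non-parallel
    # pairs in the batch apart (illustrative formulation).
    src = F.normalize(src_emb, dim=-1)   # (batch, dim)
    tgt = F.normalize(tgt_emb, dim=-1)   # (batch, dim)
    logits = src @ tgt.T / temperature   # cosine similarities as logits
    labels = torch.arange(src.size(0), device=src.device)
    # Symmetric loss: source-to-target and target-to-source retrieval.
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.T, labels)) / 2
```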
MLaKE is a novel benchmark for evaluating the multilingual knowledge editing capabilities of large language models, comprising 5,360 single-hop and 4,072 multi-hop questions across five languages (English, Chinese, Japanese, French, German).
This paper presents a comprehensive survey of the recent progress and emerging trends in multilingual large language models (MLLMs), offering a unified perspective through a novel taxonomy based on alignment strategies.
Bias is present in text data across multiple languages, including benchmark datasets on the English GLUE/SuperGLUE leaderboards and datasets in Italian, Dutch, German, and Swedish.
The proposed In-Context Cross-Lingual Transfer (IC-XLT) approach effectively leverages target-language demonstrations during inference to improve cross-lingual text classification performance, especially in scenarios with limited source-language data.
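The core mechanism is prepending a handful of target-language demonstrations to the inference-time prompt of a model adapted on source-language data. The sketch below shows one possible prompt layout for sentiment classification; the template, label set, and German examples are hypothetical rather than taken from the paper.

```python
def build_icxlt_prompt(target_demos, query_text):
    # Prepend target-language demonstrations (text, label) to the query,
    # so a source-language-tuned model sees target-language examples
    # only in context at inference time.
    lines = []
    for text, label in target_demos:
        lines.append(f"Text: {text}\nLabel: {label}\n")
    lines.append(f"Text: {query_text}\nLabel:")
    return "\n".join(lines)

# Hypothetical German sentiment demonstrations for an English-tuned classifier.
prompt = build_icxlt_prompt(
    target_demos=[("Das Essen war hervorragend.", "positive"),
                  ("Der Service war leider sehr langsam.", "negative")],
    query_text="Ich komme gerne wieder.",
)
```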
Our system AAdaM achieves competitive results in the SemEval-2024 Task 1 on Semantic Textual Relatedness for African and Asian languages, by leveraging data augmentation, task-adaptive pre-training, and adapter-based tuning.
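For readers unfamiliar with adapter-based tuning, the module below is a standard bottleneck adapter (down-projection, nonlinearity, up-projection, residual connection) in PyTorch; the layer sizes are placeholders, and this is an illustrative sketch rather than AAdaM's exact configuration.

```python
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    # Small trainable module inserted into a frozen pre-trained model;
    # only the adapter parameters are updated during task adaptation.
    def __init__(self, hidden_dim=768, bottleneck_dim=64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, hidden_states):
        # Residual connection keeps the frozen model's representation intact.
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```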