The paper addresses the challenge of data imbalance in named entity recognition (NER), a core natural language processing task in which the majority class (non-entity tokens) vastly outnumbers the entity classes. It introduces a learning method called majority or minority (MoM) learning to tackle this imbalance: MoM learning adds the loss computed only for samples belonging to the majority class to the loss of the conventional model. This discourages misclassifying minority-class samples as the majority class, improving prediction performance on entity classes without sacrificing overall accuracy. The study evaluates MoM learning on four NER datasets in Japanese and English and reports consistent improvements across both languages and frameworks.
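In notation of this summary's choosing (the symbols below are illustrative, not taken from the paper), the idea can be written as the conventional cross-entropy loss plus the same loss restricted to majority-class samples:

```latex
% Hedged sketch: \ell_CE is token-level cross-entropy, D the training
% samples, and O the majority (non-entity) class; all symbols are
% assumptions of this summary, not the paper's notation.
\mathcal{L}_{\mathrm{MoM}}
  = \underbrace{\sum_{(x_i, y_i) \in D} \ell_{\mathrm{CE}}(x_i, y_i)}_{\text{conventional loss}}
  \; + \;
  \underbrace{\sum_{\substack{(x_i, y_i) \in D \\ y_i = \mathrm{O}}} \ell_{\mathrm{CE}}(x_i, y_i)}_{\text{majority-class term}}
```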
The paper then sets up notation for sequential labeling in NER and defines MoM learning as the conventional loss function plus the loss associated with the majority class. Because this extra term requires no per-class weights, it avoids the weight tuning needed by methods such as weighted cross-entropy and focal loss. In the reported experiments, MoM learning outperforms existing methods, including state-of-the-art techniques such as focal loss and dice loss, across the evaluated datasets and frameworks.
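A minimal sketch of this loss in PyTorch, assuming a token-classification setup; the function name `mom_loss`, the majority-class index, and the tensor shapes are assumptions of this summary, not the paper's code:

```python
import torch
import torch.nn.functional as F

O_CLASS = 0  # assumed index of the majority ('O') class in the tag set


def mom_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """logits: (num_tokens, num_classes); labels: (num_tokens,)."""
    # Conventional loss: cross-entropy over all tokens.
    conventional = F.cross_entropy(logits, labels)

    # MoM term: cross-entropy computed only over tokens whose gold label
    # is the majority class, discouraging minority->majority confusion.
    majority_mask = labels == O_CLASS
    if majority_mask.any():
        majority = F.cross_entropy(logits[majority_mask], labels[majority_mask])
    else:
        majority = logits.new_zeros(())  # batch has no majority-class tokens

    # No per-class weights to tune, unlike weighted CE or focal loss.
    return conventional + majority
```

In a sequential-labeling trainer this would simply replace the plain cross-entropy term; the only configuration needed is which label index corresponds to the majority ('O') class.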
Furthermore, the paper highlights that performance on the entity classes, rather than overall scores alone, is what carries practical significance. It also discusses why traditional weighting methods such as weighted cross-entropy struggle on multiclass NER tasks with long-tail label distributions. The results demonstrate MoM learning's effectiveness in both sequential labeling and machine reading comprehension frameworks, indicating that it adapts across frameworks rather than being tied to one.
Source: Sota Nemoto et al., arxiv.org, 03-19-2024, https://arxiv.org/pdf/2401.11431.pdf